Webdict Compilation
This text documents the compilation process of the web dictionaries.
For the moment, we use the Apertium dictionary format.
The files
The online files themselves are stored in the relevant catalogue for displaying
The conversion procedure
The files are converted from the original dictionary files,
You need the Apertium dixtools:
- download it from
wiki.apertium.org/wiki/Apertium-dixtools, and save it somewhere on your machine (outside the GTHOME tree) - install it via the following commands:
- cd apertium-dixtools/
- ant jar
- sudo ant install
- cd apertium-dixtools/
Then convert the Giellatekno xml format into Apertium xml format
(For an example of commands, see below)
- collect all relevant entries in a single file (this script does only that)
- in issue the following command (where INPUT_DIR is
../LANG1LANG2/src/): -
java -Xmx2048m -Dfile.encoding=UTF8 net.sf.saxon.Transform -it:main collect-dict-parts.xsl inDir=INPUT_DIR/ > PATH_TO_OUTPUT_FILE
- different filters for different language pair are possible/needed
- filtering takes place both in collect-dict-parts.xsl and
gtdict2simple-apertiumdix.xsl so before using these scripts check both your input and the code to assure the output you expect
- in issue the following command (where INPUT_DIR is
- convert the gt_format into the accepted apertium_xml format
- The output file from last command shall be the INPUT_FILE here.
Note that the lines between the first line and the <r> node must be deleted from the INPUT_FILE, otherwise the xsl script declares it "missing". - In the file gtdict2simple-apertiumdix.xsl, edit the variables inFile
and outDir, so that the former points to the input file and the latter to a directory we make for this. - Then, issue the command:
-
java -Xmx2048m -Dfile.encoding=UTF8 net.sf.saxon.Transform -it:main gtdict2simple-apertiumdix.xsl
- The output file from last command shall be the INPUT_FILE here.
- compile the file using the apertium tools (see above), with this command
( INPUT_FILE is last command's output):-
apertium-dixtools dix2trie INPUT_FILE lr LANG1-LANG2-lr-trie.xml
-
apertium-dixtools dix2trie INPUT_FILE lr LANG1-LANG2-lr-trie.xml
- update the file from the $GTHOME/apps/dicts/apertiumdict/ into the
relevant catalogue for displaying online files on gtweb.
Here comes an exampls, again assuming you stand in
java -Xmx2048m -Dfile.encoding=UTF8 net.sf.saxon.Transform -it:main collect-dict-parts.xsl inDir=../smefin/src > tull/out_simple-apertium/tull.xml
see tull/out_simple-apertium/tull.xml
THEN DELETE THE LINES BETWEEN THE FIRST LINE AND THE <r> NODE
java -Xmx2048m -Dfile.encoding=UTF8 net.sf.saxon.Transform -it:main gtdict2simple-apertiumdix.xsl
tail tull/ut/tull.xml
apertium-dixtools dix2trie tull/ut/tull.xml lr ../../../apps/dicts/apertium_dict/dics/fin-smn-lr-trie.xml

