How To UseXML Files As Lexc Sources
.xml to .lexc generation in the main/langs/LANG/src/morphology/stems directory allows for reusing of lemma: stem: continuation information with other important dimensions of a given language. The same xml file might be used as a source file for the NDS reader's assistant, enabling storage of source language to target language word pairs for multiple target languages. It might also be used, at least in the initial work, with: ICALL (Oahpa environment); Constraint Grammar; Rule-bassed translation (Apertium), and text-to-speech applications.
If you want to utilize .xml to .lexc generation in the main/langs/LANG/src/morphology/stems directory, there are few items to bare in mind:
- both source .xml and target .lexc files must be declared in the main/langs/LANG/src/morphology/Makefile.am file
- no work should be done in main/langs/LANG/src/morphology/stems/...lexc files.
In the main/langs/LANG/src/morphology/stems/ directory, you may wish to change from simple .lexc work to generating these .lexc files from analogous .xml files. This will mean that comments should be moved out of the main/langs/LANG/src/morphology/stems/...lexc files and into perhaps the main/langs/LANG/src/morphology/root.lexc or the main/langs/LANG/src/morphology/affixes/...lexc files.
Once comments have been moved out of the main/langs/LANG/src/morphology/stems/...lexc files, we are ready to declare generation targets and sources in the
START
# Set this to the names of all other source lexc files GT_LEXC_SRCS=\ stems/abbr.lexc \ stems/acro.lexc \ stems/exceptions.lexc \ stems/punct.lexc \ stems/adjectives.lexc \ stems/adverbs.lexc \ stems/nouns.lexc \ stems/verbs.lexc \ affixes/adjectives.lexc \ affixes/adverbs.lexc \ affixes/nouns.lexc \ affixes/verbs.lexc # Set this to the names of all generated lexc files, if any GENERATED_LEXC_SRCS= # Set this to the names of all source xml files, if any GT_XML_SRCS=
END
The changes you want to make in the declaration will include the removal of the files:
stems/adjectives.lexc \ stems/adverbs.lexc \ stems/nouns.lexc \ stems/verbs.lexc \ from the GT_LEXC_SRCS=\ declaration, and their declaration in GENERATED_LEXC_SRCS=\
Please, note the commented new lines.
Additionally, the xml files must also be declared, resulting in the following:
START
# Set this to the names of all other source lexc files GT_LEXC_SRCS=\ stems/abbr.lexc \ stems/acro.lexc \ stems/exceptions.lexc \ stems/punct.lexc \ affixes/adjectives.lexc \ affixes/adverbs.lexc \ affixes/nouns.lexc \ affixes/verbs.lexc # Set this to the names of all generated lexc files, if any GENERATED_LEXC_SRCS=\ $(srcdir)/stems/adjectives.lexc \ $(srcdir)/stems/adverbs.lexc \ $(srcdir)/stems/nouns.lexc \ $(srcdir)/stems/verbs.lexc # Set this to the names of all source xml files, if any GT_XML_SRCS=\ stems/adjectives.xml \ stems/adverbs.xml \ stems/nouns.xml \ stems/verbs.xml
END
A few notes:
- all but the last file should have newline escaping, and experience indicates that you do not want intermittent files that are commented out
- the names of the .lexc files are generated from the source .xml file names, i.e. be sure you name your .xml files with that in mind.
As the .xml files become more extended and specific to your every need, you may choose to establish feeder .lexc files with simple lexc content. If you want to do this, there are various solutions to choose from. One used in kpv is the main/langs/LANG/src/morphology/stems/nouns_newwords.lexc file.
Remember to declare the new file both in GT_LEXC_SRCS= of the main/langs/LANG/src/morphology/Makefile.am file and the new lexicon in the main/langs/LANG/src/morphology/root.lexc file, perhaps N_NEWWORDS would be a working lexicon name.
So far the xml format has only been used by the AKU project. The xml files used in AKU make use of a mutual xsd description of the xml structure, and the xsl conversion features simple Finnish glossing in the generated .lexc source files.
Here are a couple of further points:
- the .xml files are checked against an .xsd schema
<r xsi:noNamespaceSchemaLocation="$GTHOME/giella-core/schemas/fiu-dict-schema.xsd" xml:lang="mrj" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:hfp="http://www.w3.org/2001/XMLSchema-hasFacetAndProperty" xmlns:fn="http://www.w3.org/2005/02/xpath-functions">
- as long as the files follow this schema, and the declarations are completed as illustrated above, just dropping them in the dir
Use of the xml format outside the AKU project may require some
- Features of the AKU project that may require adaptation are:
The xsd provides a possibility for semantic classes to be added within