A flowchart of the sme files for morphological parsing
This flowchart gives an overview of how the sme (North Sami) source files are related. In principle, the files for the other languages are arranged in the same way.
 The main lex file            Separate lex files for different POS (parts of speech)

|----------------------|      |-------------------|
| sme-lex.txt          |      | noun-sme-lex.txt  |
|                      |      | viessu GOAHTI ;   |   From the Root lexicon, there
| Root ---------------------->| ...               |   are pointers to each POS.
|                      |      |                   |   The files for nouns, verbs and
| LEXICON GOAHTI <------------|                   |   adjectives point back to the
|  +N DEVNVCASE ;      |      |                   |   sme-lex.txt file, and are
| ...                  |      |-------------------|   directed to their respective
|                      |                              sublexica.
|                      |      |-------------------|
|                 ----------->| verb-sme-lex.txt  |   (The auxiliary verbs are
|                 <-----------| ...               |   also found in the verb file.)
|                      |      |-------------------|
|                      |
|                      |      |-------------------|
|                 ----------->| adj-sme-lex.txt   |
|                 <-----------| ...               |
|                      |      |-------------------|
|                      |
|                      |      |-------------------|   The other lex files contain
|                 ----------->| closed-sme-lex.txt|   closed classes. They are
|                 <- - - - - -| LEXICON Pronoun   |   smaller, and all their sublexica
|----------------------|      |  Personal ;       |   are in the same file, not in
                              |                   |   the sme-lex file (although some
                              | LEXICON Personal  |   point to sme-lex sublexica).
                              | ...               |   Other files are pp-lex.txt, etc.
                              |-------------------|   All in all there are about
                                                      10 lex files.

The lex files are compiled together       ||
with the twol rules. These rules          ||
contain the (morpho)phonological          ||
processes, consonant gradation, etc.      \/
|------------|    |------------|    |------------|   The sme.save file is
|twol-sme.txt| => |twol-sme.bin| => |  sme.save  |   compiled in lexc, and
|------------|    |------------|    |------------|   is the merger of the
                                                     lex files and the rule
Here are the      After compilation                  file twol-sme.bin.
rules             in twolc, they are      ||
themselves.       in this binary file.    ||
                                          ||
Then come the preprocessor files:         \/
|----------|    |------------|      ||=========||    This is the final
|case.regex| => |caseconv.fst| ===> || sme.fst ||    morphological parser
|----------|    |------------|      ||=========||    for North Sami.
The case.regex
file is compiled
in xfst.

The preprocessor itself, tok.fst, is not shown here.
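The dependencies in the flowchart can also be written down as a small build graph. The Python sketch below is only an illustration, not part of the actual build setup: the names LEX_FILES, BUILD_GRAPH and build_order are invented here, only the lex files mentioned in the text are listed, and it is assumed that sme.fst is built from both sme.save and caseconv.fst, as the arrows suggest. The sketch does not call lexc, twolc or xfst; it only derives an order in which the generated files could be built (Python 3.9 or later, for graphlib).

```python
from graphlib import TopologicalSorter

# Lex files named in the text; the real set is larger (about 10 files).
LEX_FILES = [
    "sme-lex.txt",
    "noun-sme-lex.txt",
    "verb-sme-lex.txt",
    "adj-sme-lex.txt",
    "closed-sme-lex.txt",
    "pp-lex.txt",
]

# Each generated file maps to the files it is built from,
# as read off the flowchart (hypothetical representation).
BUILD_GRAPH = {
    "twol-sme.bin": {"twol-sme.txt"},          # rules compiled in twolc
    "sme.save": {*LEX_FILES, "twol-sme.bin"},  # lex files + rules, compiled in lexc
    "caseconv.fst": {"case.regex"},            # compiled in xfst
    "sme.fst": {"sme.save", "caseconv.fst"},   # final parser (assumed inputs)
}


def build_order(graph):
    """Return the generated files in an order that respects the dependencies."""
    order = TopologicalSorter(graph).static_order()
    # Source files also appear in the sorted order; keep only the targets.
    return [node for node in order if node in graph]


if __name__ == "__main__":
    for target in build_order(BUILD_GRAPH):
        sources = ", ".join(sorted(BUILD_GRAPH[target]))
        print(f"{target} <- {sources}")
```

The exact order of independent targets (for example twol-sme.bin and caseconv.fst) may vary, but sme.save always comes after twol-sme.bin, and sme.fst always comes last.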