Transducer infrastructure

The main lex file            Separate lex files for different POS
                             (parts of speech)
|----------------------|    |------------------|
|     sme-lex.txt      |    | noun-sme-lex.txt |
|                      |    |  viessu GOAHTI ; |  From the Root lexicon, there
|     Root   -------------> |  ...             |  are pointers to each POS.
|                      |    |        |         |  The files for nouns, verbs and
|     LEXICON GOAHTI <---------------|         |  adjectives point back to the
|      +N DEVNVCASE ;  |    |                  |  sme-lex.txt file, and are di-
|      ...             |    |------------------|  rected to their respective
|                      |                          sublexica.
|                      |    |-------------------|
|                     --->  | verb-sme-lex.txt  | (the auxiliary verbs are
|                   <--------- ...              | also found in the verb file)
|                      |    |-------------------|
|                      |
|                      |    |-------------------|
|                     --->  | adj-sme-lex.txt   |
|                   <--------- ...              |
|                      |    |-------------------|
|                      |
|                      |
|                     --->  |-------------------| The other lex files contain
|                  <- - - - - closed-sme-lex.txt| closed classes. They are
|----------------------|    | LEXICON Pronoun   | smaller, and their sublexica
                            |  Personal ;       | are all in the same file,
                            |                   | not in sme-lex.txt (although
                            | LEXICON Personal  | some point to sublexica in
                            |  ...              | sme-lex.txt). Other files are
                            |-------------------| pp-lex.txt, etc. In all there
                                                  are about 10 lex files.
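
The pointer structure in the diagram above can be illustrated with a small Python sketch. It is a toy model, not lexc itself. Only the entries "viessu GOAHTI ;", "+N DEVNVCASE ;" and "Personal ;" are taken from the diagram. The lexicon name NounRoot is hypothetical, and the bodies of DEVNVCASE and Personal are stubs, since the diagram elides them.

```python
# Toy model of lexc continuation lexicons (an illustration, not lexc itself).
# Each lexicon maps to a list of (form, continuation) entries.
END = "#"  # lexc's end-of-word continuation

LEXICONS = {
    # Root points to one lexicon per part of speech.
    # "NounRoot" is a hypothetical name; the diagram does not give it.
    "Root": [("", "NounRoot"), ("", "Pronoun")],
    # noun-sme-lex.txt: stems point back to sublexica in sme-lex.txt.
    "NounRoot": [("viessu", "GOAHTI")],
    # sme-lex.txt: the sublexicon adds the tag and continues to the case lexicon.
    "GOAHTI": [("+N", "DEVNVCASE")],
    # Stub: the real contents of DEVNVCASE are not shown in the diagram.
    "DEVNVCASE": [("", END)],
    # closed-sme-lex.txt: sublexica live in the same file.
    "Pronoun": [("", "Personal")],
    # Stub: the entries of Personal are elided in the diagram.
    "Personal": [],
}


def expand(lexicon="Root", prefix=""):
    """Yield every string reachable from `lexicon` by following continuations."""
    if lexicon == END:
        yield prefix
        return
    for form, continuation in LEXICONS[lexicon]:
        yield from expand(continuation, prefix + form)


analyses = list(expand())
print(analyses)
```

With these stubs the only complete path is Root, NounRoot, GOAHTI, DEVNVCASE, which spells out the lexical string viessu+N. The Pronoun branch produces nothing because its entries are left empty here.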

The lex files are compiled together       ||
with the twol rules, which contain the    ||
(morpho)phonological processes such as    ||
consonant gradation.                      \/


|------------|    |------------|      |------------|   The sme.save file is
|twol-sme.txt| => |twol-sme.bin|  =>  |  sme.save  |   compiled in lexc, and
|------------|    |------------|      |------------|   is the merger of the
                                                       lex files and the rule
These are the     After compilation                    file twol-sme.bin.
rules             in twolc, the             ||
themselves.       rules are in this         ||
                  binary file.              ||
                                            ||
Then come the preprocessor files:           \/

|----------|    |------------|         ||=========||  This is the final morpho-
|case.regex| => |caseconv.fst| ======> || sme.fst ||  logical parser for
|----------|    |------------|         ||=========||  North Sami.

The case.regex file is com-
piled in xfst. The preprocessor
itself, tok.fst, is not shown here.
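
The build order implied by the diagram can be checked with a short Python sketch. The dependency table is read off the arrows above and lists only the lex files named in the diagram. The tool used to produce sme.fst is not stated there, so it is left unspecified rather than guessed. This is an illustration of the ordering, not the project's actual build script.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# target -> prerequisites, as drawn in the diagram.
# Only the lex files named in the diagram are listed; there are about 10 in all.
DEPENDS_ON = {
    "twol-sme.bin": {"twol-sme.txt"},
    "sme.save": {
        "sme-lex.txt",
        "noun-sme-lex.txt",
        "verb-sme-lex.txt",
        "adj-sme-lex.txt",
        "closed-sme-lex.txt",
        "pp-lex.txt",
        "twol-sme.bin",
    },
    "caseconv.fst": {"case.regex"},
    "sme.fst": {"sme.save", "caseconv.fst"},
}

# Tool named in the diagram for each target (None where it is not stated).
TOOL = {
    "twol-sme.bin": "twolc",
    "sme.save": "lexc",
    "caseconv.fst": "xfst",
    "sme.fst": None,
}

# static_order() lists prerequisites before the targets that need them;
# source files are filtered out so that only build targets remain.
build_order = [
    node for node in TopologicalSorter(DEPENDS_ON).static_order()
    if node in DEPENDS_ON
]

for target in build_order:
    tool = TOOL[target] or "tool not stated in the diagram"
    print(f"{target}: {tool}")
```

Any valid order builds twol-sme.bin before sme.save, and both sme.save and caseconv.fst before sme.fst. The relative order of caseconv.fst and the other two intermediate files is free, since the diagram shows no dependency between them.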