Russian Morphological Transducer

The Russian Wiktionary gives two types of information about its words:

  1. The Zaliznjak number.
  2. An inflectional template number.

In the experience of the Apertium developers, the Zaliznjak number by itself is not sufficient to generate all the paradigm forms. Instead, they use a script, ruwiktionaryextract.pl, whose output is shown in ruwiktionaryextractoutput.txt.

From the full paradigm given in the output, they find the maximal common suffix string and build the paradigm patterns, here for nominals and verbs:

Note that the HTML files from which the paradigms are built also contain stress information.
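
The pattern extraction described above can be sketched roughly as follows. This is not the Apertium script; it is a minimal illustration that assumes the paradigm pattern is obtained by taking the longest string shared at the start of all forms as the stem and treating the remainders as the endings. It also assumes that stress is marked with a combining acute accent (U+0301), which is removed before comparison. The function names and the tag labels are hypothetical.

```python
import os
import unicodedata

# Assumption: stress is marked with a combining acute accent (U+0301).
COMBINING_ACUTE = "\u0301"


def strip_stress(form):
    """Remove combining acute accents, keeping letters such as й and ё intact."""
    decomposed = unicodedata.normalize("NFD", form)
    return unicodedata.normalize("NFC", decomposed.replace(COMBINING_ACUTE, ""))


def split_paradigm(forms):
    """Split a full paradigm into a shared stem and per-cell endings.

    `forms` maps a (hypothetical) tag such as "gen.sg" to the inflected form.
    Returns (stem, endings), where endings maps each tag to what follows the stem.
    """
    plain = {tag: strip_stress(form) for tag, form in forms.items()}
    # os.path.commonprefix compares character by character, so it works on any strings.
    stem = os.path.commonprefix(list(plain.values()))
    endings = {tag: form[len(stem):] for tag, form in plain.items()}
    return stem, endings


if __name__ == "__main__":
    paradigm = {
        "nom.sg": "стол",
        "gen.sg": "стола\u0301",
        "dat.sg": "столу\u0301",
        "ins.sg": "столо\u0301м",
    }
    stem, endings = split_paradigm(paradigm)
    print(stem, endings)
```

Words whose endings come out identical can then share one paradigm pattern. Because this sketch discards the stress marks, two words with the same endings but different stress behaviour would be merged; keeping the marks in the endings is an alternative if stress must be preserved.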

For rusoahpa, the important requirement is an FST that is easy to maintain.

There is now an Apertium-style FST for Russian:

https://svn.code.sf.net/p/apertium/svn/incubator/apertium-rus/