docu-statusquo.eng

Status quo

This document gives a short status report of the Komi grammatical analyser  and its source files.

Juy 15th, 2016

Status for the source files:

  • Lexicon: The lexicon contains 35806 entries (19129 nouns, 12191 verbs, 4486 adjectives)
  • Morphology: The morphology files are 3494 lines long, and contain 479 continuation lexica. Compared to 8234 lines and 1309 lexica for Erzya, there is still work to do.
  • Morphophonology: The file kpv-phon.twolc is 253 lines long. Compared to the one for Erzya, 514 lines, the situation is not that bad for Komi.

Tasks ahead:

  1. Test and correct the morphology and morphophonology
  2. Integrate the Komi-Russian dictionary in the analyser (done)
  3. Add more words:
    1. Test the resulting analyser against text material, and add new words
    2. Systematically add Russian loanwords: names and technical terms (partly done)
  4. Work on the spellchecker
    1. TODO: Collect error corpus