120103

Komi dictionary

Present: Ciprian, Jack, Trond

Agenda

  • Status quo
  • Webdict
  • Fileformat

Status quo

  • All a), б) ... removed
  • komrus is online at apertiumdict

Webdict

Inversion

Do we need rus-kom, fin-kom and eng-kom? I don't think so.

src>grep '___processing___' y_errors.txt | sort | uniq -c | sort -nr
3468 ___processing___ i
1806 ___processing___ gov
 624 ___processing___ register
 575 ___processing___ com
  93 ___processing___ field
  47 ___processing___ clarif
  38 ___processing___ sns
  20 ___processing___ range
  11 ___processing___ val
  11 ___processing___ att

We will have to update the information about the dictionary files. Trond discusses this with Marina.

TODO

  • For fin-kom and eng-kom: yes, and both directions
  • For rus-kom: Cip will do a quick-and-dirty version on Thursday (if possible; if not, we let it wait)
  • The 10372 kom/src/Not-V_kvru-lex.xml entries should not be added
  • Linguistic deadline: Wednesday 20.00 (21.00 Finnish time)
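These notes do not say how the fin-kom and eng-kom directions will be produced. As a minimal sketch, assuming a kom-X dictionary can be reduced to a mapping from each Komi lemma to a list of translation strings (a hypothetical simplification that ignores elements such as `gov`, `register` or `com` from the error counts above), inversion could look like this. The function name and the sample data are placeholders.

```python
from collections import defaultdict


def invert(entries: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert a kom->X mapping into an X->kom mapping.

    Each translation becomes a headword. A translation shared by several
    Komi lemmas collects all of them, in first-seen order, without duplicates.
    """
    inverted: dict[str, list[str]] = defaultdict(list)
    for lemma, translations in entries.items():
        for tr in translations:
            if lemma not in inverted[tr]:
                inverted[tr].append(lemma)
    # Sort the new headwords so the output is stable.
    return dict(sorted(inverted.items()))


# Hypothetical kom->fin data; the strings are placeholders, not real entries.
kom_fin = {"kom_a": ["fin_x", "fin_y"], "kom_b": ["fin_x"]}
print(invert(kom_fin))  # {'fin_x': ['kom_a', 'kom_b'], 'fin_y': ['kom_a']}
```

A real inversion would also have to decide what happens to the extra information attached to each translation, which this sketch simply drops.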

Non-Russian Cyrillic Unicode characters

Do we need input help for the special characters of the Komi alphabet (cf. the webdicts for Kildin Sámi, North Sámi, etc.)? If so, I need a list of the characters to offer as help.

Yes, we need two letters in addition to the Russian repertoire:

  • Letters: ӧ, і (uppercase Ӧ, І)
  • Code points: ӧ U+04E7, Ӧ U+04E6, і U+0456, І U+0406
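To check which letters in a string fall outside the Russian repertoire (for example when testing what the input help must cover), a minimal Python sketch could look like this. The function name and the sample strings are hypothetical; the code points are the standard Unicode ones for Ӧ/ӧ and І/і. The text is NFC-normalised first, because ӧ can also be typed as Cyrillic о followed by a combining diaeresis.

```python
import unicodedata

# Russian alphabet: А..я (U+0410..U+044F) plus Ё (U+0401) and ё (U+0451).
RUSSIAN = {chr(cp) for cp in range(0x0410, 0x0450)} | {"\u0401", "\u0451"}

# The two extra Komi letters in both cases: Ӧ ӧ І і.
KOMI_EXTRA = {"\u04E6", "\u04E7", "\u0406", "\u0456"}


def extra_letters(text: str) -> list[str]:
    """Return the Cyrillic letters in `text` that are not in the Russian alphabet.

    `text` is NFC-normalised first, so Cyrillic о + combining diaeresis
    (U+043E U+0308) counts as the precomposed ӧ (U+04E7).
    """
    text = unicodedata.normalize("NFC", text)
    return [ch for ch in text
            if unicodedata.name(ch, "").startswith("CYRILLIC")
            and ch not in RUSSIAN]


if __name__ == "__main__":
    # Illustrative strings: precomposed ӧ, decomposed о + U+0308, and І.
    for sample in ["\u043a\u04e7\u0440", "\u043a\u043e\u0308\u0440", "\u0406\u043d\u044c"]:
        found = extra_letters(sample)
        print(sample, [f"U+{ord(ch):04X}" for ch in found],
              all(ch in KOMI_EXTRA for ch in found))
```

Normalising before lookup means the dictionary search and the input help agree on a single form of ӧ, whichever way the user typed it.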

Paula and komfin

Paula Kokkonen should edit Kom-Fin with XMLMind in $GTHOME/words/dicts/kom2X/src/

TODO

  • Add Paula to svn (Trond)
  • Set up the machine
  • Teach Paula to use relevant programs (Jack, Trond)
    • XMLMind, XMLEditor
    • svn (either command line or the Versions.app)

Fileformat

Files

24 Oct (???!!): lemma-translations-examples (dict) in private, lemma-stem-contlex in working_files

Jack: Everything™ is in working_files. Delete private.

Oct. Jack's recollection:

  1. lemma, stem, contlex, stem, content derived from komfineng,
  2. Derived content joined with kom-rus

kt/kom/src/Not-V_kvru-lex.xml

Principles

  1. All structure is XML structure
  2. Text and element nodes should not be sisters (i.e. no mixed content)
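Principle 2 (text and nodes should not be sisters) amounts to a ban on mixed content: an element that contains both text and child elements. A minimal checker using the standard library's `xml.etree.ElementTree` might look like this. The function name and the `<entry>`/`<trans>` element names in the usage are hypothetical, not the actual dictionary schema.

```python
import xml.etree.ElementTree as ET


def mixed_content(root: ET.Element) -> list[str]:
    """Return the tags of elements whose text is a sister of child elements.

    ElementTree stores text before the first child in `elem.text` and text
    after each child in that child's `tail`. Whitespace-only text (e.g.
    indentation) is ignored.
    """
    offenders = []
    for elem in root.iter():
        children = list(elem)
        if not children:
            continue  # leaf elements may freely contain text
        texts = [elem.text] + [child.tail for child in children]
        if any(t and t.strip() for t in texts):
            offenders.append(elem.tag)
    return offenders


# Hypothetical entries; the element names are placeholders, not the real schema.
ok = ET.fromstring("<entry><lemma>a</lemma><trans>b</trans></entry>")
bad = ET.fromstring("<entry><lemma>a</lemma>, cf. <trans>b</trans></entry>")
print(mixed_content(ok))   # []
print(mixed_content(bad))  # ['entry']
```

In the second entry the text ", cf. " becomes the tail of `<lemma>`, so `<entry>` is reported. The fix is to wrap such text in an element of its own.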

All underscores in the lemma field should be replaced by spaces.

<lemma>Войвывса кытшсайса</lemma> <!-- ws -->
<lemma>Войвывса_кытшсайса</lemma> <
<lemma>Войвывса% кытшсайса</lemma>
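Here is a minimal sketch of the underscore-to-space replacement with `xml.etree.ElementTree`, assuming the lemmas sit in `<lemma>` elements as in the examples above. The `<r>`/`<e>` wrapper elements are hypothetical. The `to_lexc` helper is also an assumption: it only shows how a space would be written with the lexc escape `% `, as in the third variant, and does not handle other lexc special characters.

```python
import xml.etree.ElementTree as ET


def underscores_to_spaces(root: ET.Element) -> int:
    """Replace every '_' with ' ' in the text of all <lemma> elements under root.

    Returns the number of <lemma> elements that were changed.
    """
    changed = 0
    for lemma in root.iter("lemma"):
        if lemma.text and "_" in lemma.text:
            lemma.text = lemma.text.replace("_", " ")
            changed += 1
    return changed


def to_lexc(lemma: str) -> str:
    """Hypothetical helper: escape spaces as '% ' for lexc.

    Only spaces are handled; other lexc special characters are left alone.
    """
    return lemma.replace(" ", "% ")


# <r> and <e> are placeholder wrapper elements, not the real dictionary schema.
doc = ET.fromstring(
    "<r><e><lemma>Войвывса_кытшсайса</lemma></e>"
    "<e><lemma>Войвывса кытшсайса</lemma></e></r>"
)
print(underscores_to_spaces(doc))  # 1
for lemma in doc.iter("lemma"):
    print(lemma.text, "->", to_lexc(lemma.text))
```

To save the result, `ET.ElementTree(root).write(path, encoding="utf-8")` can be used. ElementTree's default parser drops comments such as `<!-- ws -->`, so a file with comments worth keeping needs a parser built with `ET.TreeBuilder(insert_comments=True)`.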