150107
Samest meeting January 7th 2014
Present: Heiki, Heli, Jaak, Kadri, Sjur, Tiina
Agenda
- fst
- Things to be done before the physical meeting (19.-21.01.)
fst
Compounding
Lexicalised compounds
Jaak has included the compounds needed by Oahpa into the fst. Generation of forms of these lexicalised compounds works fine for Oahpa now.
Problem: if we want to use the same lexicon for hyphenation, we need the hash mark (#) as the separator between the parts of the compounds.
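To illustrate why the separator matters, here is a minimal sketch of how a hash mark in a lexicon entry could serve both word-form generation and hyphenation. The function names and the example entry (raamatu#kogu) are hypothetical and not taken from the actual lexicon.

```python
# Hypothetical sketch: one lexicon entry, two uses of the '#' separator.

def compound_parts(entry: str, sep: str = "#") -> list[str]:
    """Split a lexicon entry into its compound parts at the separator."""
    return [part for part in entry.split(sep) if part]

def surface_form(entry: str, sep: str = "#") -> str:
    """Drop the separator to get the plain word form (e.g. for Oahpa)."""
    return "".join(compound_parts(entry, sep))

def hyphenation_points(entry: str, sep: str = "#") -> str:
    """Mark each compound boundary as a possible hyphenation point."""
    return "-".join(compound_parts(entry, sep))

entry = "raamatu#kogu"  # illustrative entry, assumed for this sketch
plain = surface_form(entry)
hyphenated = hyphenation_points(entry)
```

Without the separator in the entry, the hyphenator would have no way to recover the boundary from the plain word form.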
Example of a lexicalised compound in other languages:
Dynamic compounds
Done by fst concatenation, with filters that weed out overgeneration in Estonian. The system of filters is complex.
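The concatenate-then-filter idea can be sketched outside the fst as follows. This is a toy model only: the word lists, the tag names and the single filter rule are invented for illustration and are not the Estonian filter set.

```python
from itertools import product

# Toy data, assumed for this sketch: first parts carry a case tag.
FIRST_PARTS = [("raamatu", "Gen"), ("raamatut", "Par")]
SECOND_PARTS = ["kogu", "pood"]

def concatenate(first_parts, second_parts):
    """Freely combine every first part with every second part (overgenerates)."""
    for (form, tag), second in product(first_parts, second_parts):
        yield {"surface": form + second, "first_tag": tag}

def allow_first_tag(allowed):
    """Build a filter that keeps candidates whose first part has an allowed tag."""
    def keep(candidate):
        return candidate["first_tag"] in allowed
    return keep

def apply_filters(candidates, filters):
    """Keep only the candidates that pass every filter."""
    return [c for c in candidates if all(f(c) for f in filters)]

candidates = list(concatenate(FIRST_PARTS, SECOND_PARTS))
# Invented rule for the sketch: only genitive first parts may form compounds.
kept = apply_filters(candidates, [allow_first_tag({"Gen"})])
```

Each additional restriction becomes one more filter in the list, which is also why such a system tends to grow complex.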
Problem with some illative forms
Some illative forms do not generate, e.g.
mees+N+Sg+Ill +?
This is a side effect of suppressing some short illative forms (previously tagged as additive), e.g. mees:*mehhe. Now the long form mehesse does not generate either.
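One possible mechanism for this kind of side effect, sketched under assumptions: if the suppression is keyed on the tag string rather than on the individual surface form, it removes every form with that analysis. The notes do not say how the suppression is actually implemented; the table and function below are hypothetical.

```python
# Hypothetical generation table: one analysis, two competing surface forms.
FORMS = {"mees+N+Sg+Ill": ["mehhe", "mehesse"]}

def generate(analysis, blocked_analyses=frozenset(), blocked_forms=frozenset()):
    """Return surface forms for an analysis, minus anything suppressed."""
    if analysis in blocked_analyses:
        # Blocking at the analysis level removes all forms at once.
        return []
    return [f for f in FORMS.get(analysis, []) if f not in blocked_forms]

# Too broad: the whole analysis is blocked, so mehesse disappears too.
too_broad = generate("mees+N+Sg+Ill", blocked_analyses={"mees+N+Sg+Ill"})

# Targeted: only the unwanted short form is blocked.
targeted = generate("mees+N+Sg+Ill", blocked_forms={"mehhe"})
```

An empty result corresponds to the "+?" output shown above.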
CG
Tiina has added the Estonian-specific scripts, which include lists of pronouns, intransitive verbs, partitive verbs, etc.
To do before the meeting on 19.-21.01.
Heiki: describe the results of the comparison of Estonian and Finnish tags. Article on a linguistically motivated tag system. Multilingual, broader view? Discuss with Tommi Pirinen and "the Saami people" in Tromsø.
Jaak: check the illative forms (some are missing now) and the ja- and mine-forms.
Heli: Make Oahpa Morfa-S with full coverage of the "E nagu Eesti" dictionary available online (generate a new database).
Tiina: take further steps towards an Estonian CG that works in the gt infrastructure.
Heiki, Heli, Trond: Put together an agenda for the physical meeting and e-mail it to all the participants.
One topic for discussion: preprocessing and the determination of sentence boundaries (and clause boundaries) for Estonian. Compare EstNLTK (Python) with the Giellatekno tools.