151117

SamEst meeting 17.11.2015

Participants: Heiki, Heli, Jaak, Jack, Kadri

Agenda

  1. Estonian FST
  2. Estonian Oahpa
  3. Võro FST and Võro Oahpa
  4. Treebank gold standard corpus
  5. MT
  6. Next meeting

Estonian FST

Jaak: work in progress. Meanwhile, new things have appeared in gt, e.g. for extracting lemmas from the dictionary. Some things in the Estonian FST work locally on Jaak's computer but have not been checked in yet.

Requirement from MT: we need a mechanism for choosing an arbitrary form out of existing parallel forms. Later on, we can assign selection tags.

Requirement from Oahpa: we need a tag for non-preferred forms. There is a general tag +Use/NG (already used in e.g. Skolt Saami and Võro). We tag with +Use/NG the forms that we do not want to appear as correct answers in Oahpa. This is sufficient for Oahpa as a first iteration.

But we also need some specific tags to distinguish the parallel forms for short illative vs -sse illative and i-plural vs -de-plural.

At the moment, we have a tag +Emph for pronouns like minule vs mulle, but this does not generalize to all words.
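
The MT and Oahpa requirements above can be sketched roughly as follows. This is only an illustration in Python, not the FST implementation: the function names, the tag strings other than +Use/NG, and the choice of which parallel form carries +Use/NG are all assumptions made for the example.

```python
NON_PREFERRED = "+Use/NG"  # the general tag named in the minutes


def arbitrary_form(analyses):
    """MT requirement: pick any one of the parallel forms.

    Here we simply take the first one (an assumption; selection
    tags could replace this later).
    """
    return analyses[0][0]


def preferred_forms(analyses):
    """Oahpa requirement: drop forms tagged +Use/NG."""
    return [(form, tags) for form, tags in analyses
            if NON_PREFERRED not in tags]


def pedagogical_form(analyses):
    """Return the single form left after filtering.

    Raises ValueError unless exactly one parallel form is
    without +Use/NG.
    """
    kept = preferred_forms(analyses)
    if len(kept) != 1:
        raise ValueError(
            "expected exactly one preferred form, got %d" % len(kept))
    return kept[0][0]


# Hypothetical tagging of two parallel forms; which form is marked
# +Use/NG here is arbitrary and only for illustration.
example = [
    ("koertele", "koer+N+Pl+All"),
    ("koeradele", "koer+N+Pl+All+Use/NG"),
]

mt_choice = arbitrary_form(example)       # "koertele"
oahpa_choice = pedagogical_form(example)  # "koertele"
```

The check in pedagogical_form also flags lexicon entries where no form, or more than one form, has been left untagged.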

Estonian Oahpa

Heli: Morfa-S verbs OK, first demo of Vasta-S working.

Morfa-C - a general improvement: no more repetitions within the same exercise set. (The same improvement has also been applied to Morfa-C in Võro Oahpa, and can be done for all other Oahpa languages.)
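
One way to avoid repetitions within an exercise set is to sample without replacement. The sketch below only illustrates that idea in Python; it is not the Oahpa code, and the function name and parameters are assumptions.

```python
import random


def build_exercise_set(items, size, rng=random):
    """Pick up to `size` distinct items for one exercise set.

    Duplicates in `items` are removed first (order preserved), then
    random.sample draws without replacement, so no item can occur
    twice in the same set.
    """
    unique_items = list(dict.fromkeys(items))
    size = min(size, len(unique_items))
    return rng.sample(unique_items, size)


# Hypothetical question pool with one duplicate entry.
pool = ["koer", "kass", "koer", "maja", "puu"]
exercise_set = build_exercise_set(pool, 3)
```

If the pool has fewer distinct items than requested, the set is simply shorter rather than padded with repeats.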

Links: Morfa-S verb, Vasta-S

Heiki will make a presentation about the status of the project on Thursday. He will also show the working modules of Estonian and Võro Oahpa.

Võro FST and Oahpa

Current tasks:

  • Define the word class for the adjectives and substantives that are not sorted into adjectives / substantives in the dictionary. These are the words ending in -nõ. (Sulev)
  • Add comparative forms for adjectives. (Jack)
  • User interface for Adjectives in Morfa-S. (Heli)
  • Extend some semantic classes for Morfa-C. (Heli, Sulev)
  • Work on testing and correcting the FST. (Sulev, Jack)

Treebank gold standard corpus

Francis' idea was to translate the Turku Dep Treebank (which is annotated in the Universal Dependencies formalism). Kadri has taken 1700 sentences out of the Turku Dep Treebank, and Tiina has made the transformations (there were some inconsistencies with the GT Finnish morphology). A PhD student is working on the translation into Estonian and on solving the inconsistencies. The Estonian text is analysed automatically, and the result is manually checked and corrected by two people in parallel.

MT

Francis gave an intensive course on rule-based machine translation at Tartu University.

Finnish-Estonian MT has been included in the Apertium incubator. Few errors remain in the Finnish and Estonian FSTs, the tag conversion and the bilingual dictionary. The example text translated by Jack is now being translated automatically without major lexical or grammatical errors.

Here is an example text that has been automatically translated from Finnish to Estonian:

1        MILLES JAAKKO ON?
2        Jaakko ja Mari on aias. Ilm on täna hea, on väga sooja. Aga eile oli väga külma! Siis nad ei tohtinud mängida väljas. Jaakko ja Mari peavad väga mängimisest, nad mängivad alati koos aias suure maja ees.
3        Jaakko on väike poiss ja ta on kuus aastat vana. Väike tüdruk on ta õe, ta on viis aastat vana. Jaakkol on väike koer, ka koer on nüüd aias. Koerast on meeldivat mängida nende kahe lapsega. Koer on väga õnnelik nüüd.
4        kas On ka Maril koer? Ei, Maril ei ole/pole koera, tal on kass. Aga kass on majas, kass on magama.
5        Nende emad on sees majas kassiga, ta vaatab aknast välja ja näeb Jaakko ja Mari mängima. Jaakko jookseb nobedalt suure, vana puu juurde, ta on minekus peitu Marilt. Kas tead milleks? Mari istub ja tal on käed ta silmade/silme ees. Ta ei näe midagi ja ta arvutab. Milleks ta teeb nii? Aga mida Jaakko teeb puu lähedal?
6        See on mängis. Kui Mari on lõpetanud arvutamise, ta vaatab ringi. Ta otsib Jaakkot: Kuhu ta läks? Kas nägesid/nägid sellena ta?
7        Mari ei tea kus Jaakko on. Ta küsib koeralt: "Kas nägesid/nägid sellena Jaakkot?" Aga koer ei ilmselt osa rääkida! #Need  Mari ei saa vastust ta küsimusesse/küsimusse. Inimesed ei saa kunagi vastust, kui nad räägivad koeradele/koertele!
8        Mari vaatab nende ema akna taga, ta ema naerab. Mari mõtleb, et ta näges/nägi kuhu Jaakko läks: "Ütle mulle kus ta on!" Ta ütleb ta emale. "Ei Mari, ei tohi öelda/ütelda sulle!" Ta kostab. Kuigi ta tõenäoliselt teab kus poiss on, ta ei taha öelda/ütelda.
9        Mari jalutab aeglaselt aia läbi. Ta üritab ikka otsida Jaakkot. Ta vaatab laua alla ja toolide alla, aga Jaakko ei ole/pole seal. Ta vaatab kõikjale, aga ei või leidma Jaakkot.
10        Siis ta kuuleb hääle, see saab suure, vana puu taga. Su Võid kas see olla Jaakko? Seal see hääl on jälle! Ta kuulatab hoolikalt. See ei ole/pole lind või muu loom. Nüüd ta #võie kuul selle hästi. Selle peab olla Jaakko!
11        Siis ta näeb ka väikese/väikse käe ja kui ta jalutab lähemale, ta näeb ka poisi pea! Mari naerab ja ütleb: "Leidsin su!" Nad mõlemad on õnnelikke ja lähevad/lähvad majja, on aeg süüa midagi ja juua napi vett!

Todo before the next meeting

  • FST (Jaak)
    • The e-mails from Heli and Tiina --> make corrections in the FST.
    • Filter out the non-preferred parallel forms using the tag +Use/NG (exactly one of the parallel forms should be without the +Use/NG tag, and that form should be included in the pedagogical FST).
  • Estonian Oahpa (Heli)
    • Comments on Vasta-S by Tiina and Kadri --> improvements.
    • Write some more question-answer templates for Morfa-C and Vasta-S and implement them in Oahpa.
  • MT fin-est (Heiki)
    • Write a rule for choosing initial uppercase/lowercase letter.
    • Maybe also some reordering rules or a rule for choosing between pole/ei ole.
  • Võro FST and Võro Oahpa
    • Sort words ending in -nõ into adjectives and substantives, add more words to semantic classes for Oahpa (Sulev)
    • Comparatives and more in FST (Jack)
    • User interface for Adjectives in Morfa-S. (Heli)
  • Gold standard corpus - work in progress (Kadri and students)
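
For the uppercase/lowercase item in the MT fin-est todo above, the intended behaviour can be sketched in Python. This is only an assumption about what the rule should do, based on output such as "kas On ka Maril koer?" in the example text; the real rule would be written in Apertium, and the function name and the proper-noun set are hypothetical.

```python
def fix_initial_case(tokens, proper_nouns=frozenset()):
    """Normalise letter case in ONE sentence given as a list of words.

    - The first word gets an initial uppercase letter.
    - Words listed in `proper_nouns` are left unchanged.
    - Every other word gets an initial lowercase letter, which undoes
      a capital carried over from the source sentence start.
    This is deliberately naive: acronyms etc. are not handled.
    """
    fixed = []
    for i, tok in enumerate(tokens):
        if i == 0:
            fixed.append(tok[:1].upper() + tok[1:])
        elif tok in proper_nouns:
            fixed.append(tok)
        else:
            fixed.append(tok[:1].lower() + tok[1:])
    return fixed


sentence = ["kas", "On", "ka", "Maril", "koer?"]
result = fix_initial_case(sentence, proper_nouns={"Maril"})
# result == ["Kas", "on", "ka", "Maril", "koer?"]
```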

Next meeting

Tue 1.12. 09.00 Swedish time, 10.00 Estonian time