Samest meeting 28.06.2016

Participants: Heiki, Heli, Jaak, Sjur, Tiina, Trond


  • Estonian FST
  • Fin2X MT
  • Next meetings

Estonian FST

Heiki-Jaan has worked in the experimental-langs directory: he has converted filolex to the giella format, found some irregularities that call for separate treatment, and made some changes to the two-level rules.

Jaak has worked on experimental-langs so that it also compiles with xfst.

The message is: things are moving and changing. We now have two approaches to the Estonian FST, and are trying out both.

Fin2X MT

The fin analyser

  • fin2X, X = est, sme.
  • We made sme2fin before Christmas:
    • fin dis was allegedly so bad
    • fin fst overgenerated
    • fin fst contains multiple path issues.

The situation now is:

  • fin fst overgenerates, but only to some degree (not much).
  • fin dis:
    • Development corpus of 2000 sentences, 15500 words. Result: 99.7%
    • Test corpus fiwiki of 61000 words: ambiguity: 1.14 readings/word
  • fin fst multiple tag paths: Trivial corrections
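
For reference, a readings/word figure like the one above can be computed directly from CG stream output. The following is only a minimal sketch, not the script actually used: it assumes the usual CG text format (a cohort line starting with "<, each reading on an indented line starting with a quoted lemma) and no sub-readings. The function name and the "Pekka" tags in the sample are made up for illustration.

```python
def readings_per_word(cg_text: str) -> float:
    """Average number of readings per cohort in CG stream-format text.

    Assumption: every indented line that starts with a quoted lemma is one
    reading; sub-readings (deeper indentation) are not treated specially.
    """
    cohorts = 0
    readings = 0
    for line in cg_text.splitlines():
        if line.startswith('"<'):
            # Cohort line, e.g. "<tietenkin>"
            cohorts += 1
        elif line[:1] in (" ", "\t") and line.strip().startswith('"'):
            # Indented reading line, e.g. <TAB>"tietenkin" Pcle
            readings += 1
    if cohorts == 0:
        raise ValueError("no cohorts found")
    return readings / cohorts


# Illustrative input only; the tags on "Pekka" are hypothetical.
sample = (
    '"<tietenkin>"\n'
    '\t"tietenkin" Pcle\n'
    '\t"tieten" Pcle Foc/kin\n'
    '"<Pekka>"\n'
    '\t"Pekka" N Prop Sg Nom\n'
)

if __name__ == "__main__":
    print(readings_per_word(sample))
```

Punctuation cohorts are counted as words here; whether the reported figure includes them is not stated in these notes.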

The positive conclusion:

  • Things are not that bad
  • We should remove banalities to face the real MT issues.

TODO (Tiina, Trond)

Finish the fin fst cleanup. Again, have a look at the CG output + tag unification. The next focus could then be on MT output.


Bachelor student Keit Mõisavald will attend the summer school in Alicante. Her thesis will be on translating fin clitics (-ko, -han, etc.) into Estonian.

Issues in sme2smX:

  • sme: ipmirdat go dan (norw) / ipmirdatgo dan (finn)
  • smj: norw only
  • smn: finn only

sme2smX has to deal with both "norw2finn" and "finn2norw".

The Saami MT projects have done quite a lot of work on clitics, and that work will be relevant (as will any future work on fin2sme or sme2fin).

Issues for MT:

  • Scope of positive/negative -kin / -kAAn
  • Clitic or adverbial (Pekkakin vs. myös Pekka)

Current "ambiguity":

"<tietenkin>" Pcle Pcle Foc/kin
        "tietenkin" Pcle
        "tieten" Pcle Foc/kin  <== of course plain wrong

Do we want to lexicalise the common light words + clitics or not? The answer that meets the demand of the long tail is the latter. Go through the presentations of 30 years of Virsu seminars?

Next meetings

  • MT: 30.6. 0900 Norwegian time
  • Samest: August 4th at 14.00 EET, 13.00 CET.

Sjur will probably still be on vacation on August 4.