160628
Samest meeting 28.06.2016
Participants: Heiki, Heli, Jaak, Sjur, Tiina, Trond
Agenda
- Estonian FST
- Fin2X MT
- Next meetings
Estonian FST
Heiki-Jaan has worked on the experimental-langs directory.
Jaak has worked on experimental-langs so that it now compiles with xfst.
The message is: Something is moving and changing.
Fin2X MT
The fin analyser
- fin2X, x = est, sme.
- We made sme2fin before Christmas:
- fin dis was allegedly so bad
- fin fst overgenerated
- fin fst contains multiple path issues.
The situation now is:
- fin fst overgenerates, but only to some degree (not much).
- fin dis:
- Development corpus of 2000 sentences, 15500 words. Result: 99.7%
- Test corpus fiwiki of 61000 words: ambiguity: 1.14 readings/word
- fin fst multiple tag paths: Trivial corrections
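The readings/word figure above is the total number of readings divided by the number of word forms. A minimal sketch of that arithmetic follows, assuming the disambiguated output is in the Constraint Grammar stream format (one `"<wordform>"` cohort line followed by indented reading lines). The function name, the sample sentence and its tags are hypothetical illustrations, not data from the meeting.

```python
def readings_per_word(cg_text: str) -> float:
    """Return the average number of readings per cohort (word form).

    Assumes CG stream format: a cohort line starts with '"<',
    and each reading is an indented line starting with a quoted lemma.
    """
    cohorts = 0
    readings = 0
    for line in cg_text.splitlines():
        if line.startswith('"<'):
            # New word form (cohort)
            cohorts += 1
        elif line[:1] in (" ", "\t") and line.strip().startswith('"'):
            # Indented reading belonging to the current cohort
            readings += 1
    if cohorts == 0:
        raise ValueError("no cohorts found in input")
    return readings / cohorts


# Hypothetical two-word sample: one unambiguous word, one with two readings.
sample = '''"<Pekkakin>"
\t"Pekka" N Prop Sg Nom Foc/kin
"<tuli>"
\t"tulla" V Act Ind Prt Sg3
\t"tuli" N Sg Nom
'''

print(readings_per_word(sample))
```

On a fully disambiguated text the value would be 1.0; anything above that is remaining ambiguity.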
The positive conclusion:
- Things are not that bad
- We should remove banalities to face the real MT issues.
TODO (Tiina, Trond)
Finish the fin fst cleanup.
Fin2est
Bachelor student Keit Mõisavald will attend the summer school
Issues in sme2smX:
- sme: ipmirdat go dan (norw) / ipmirdatgo dan (finn)
- smj: norw only
- smn: finn only
sme2smX has to deal with both "norw2finn" and "finn2norw".
The Saami MTs have worked quite a lot with clitics, and will be
Issues for MT:
- Scope of positive/negative -kin / -kAAn
- Clitic or adverbial (Pekkakin vs. myös Pekka)
Current "ambiguity":
4 "<tietenkin>" Pcle Pcle Foc/kin "tietenkin" Pcle "tieten" Pcle Foc/kin <== of course plain wrong
Do we want to lexicalise the common light words + clitics or not?
Next meetings
- MT: 30.6. 0900 Norwegian time
- Samest: August 4th at 14.00 EET, 13.00 CET.
Sjur will probably still be on vacation on August 4.