160105
Samest meeting Jan 5 2016
Participants: Heiki-Jaan, Heli, Jaak, Tiina, Trond
Last meeting: /lang/est/meetings/151215.html
Agenda
- FST
- Oahpa
- MT
- Subgroups
- Other business: bachelor's thesis on Saami-Estonian MT; samest report due March 1
FST
The priority-union problem in hfst was fixed in Helsinki (Hki). This gives correct analyses for the previously problematic nägema, etc. Priority union already worked in xfst, but xfst has some other problems.
Everything in the overriding exceptions file is probably fixed now.
Next step: Jaak to fix the issues in Heli's letter.
- Vokaalmitmus (vowel plural) -- there seems to be overgeneration (when does a word have it?)
- Short illative -- HJK has said that he has classes where it is possible and classes where it is preferred? Oahpa would like the two forms to be tagged distinctly when both are possible (or even when the pupil uses a form that is obtained "regularly" but is not actually used, e.g. poesse).
- Lexicalized compounds -- some twolc rules assume that # is present; should we add it? (töökojas)
Oahpa
Recently, Heli has worked on Morfa-C for both vro and est. The new solution for avoiding repeated exercises should be implemented for other languages as well, but it is still not general enough. Heli is working on this.
Heli will regenerate the lexicon (after the priority-union fix) and have a look.
MT
Heiki-Jaan has used the old hfst compiler, the one without the priority union problem.
echo 'ole' | hfst-lookup tools/mt/apertium/analyser-mt-apertium-desc.est.hfstol
> ole olla<vblex><actv><imp><p2><sg> 0,000000
ole olla<vblex> <imp><prs><conneg><p2><sg> 0,000000
ole olla<vblex> <ind><prs><conneg> 0,000000
try 'ollut' and 'olleet' - don't have conneg reading for 'ei ollut'
"olla" V Act Ind Prt ConNeg Sg - is missing
"olla" V Act Ind Prt ConNeg Pl - is missing
incubator/apertium-fin-est/gt2apertium.cg3r
The ConNeg analysis is in place, both for the fin and the est CG:
"<Hän>"
    "hän" Pron Pers Sg Nom
"<ei>"
    "ei" V Neg Act Sg3
"<tullut>"
    "tulla" V Act Ind Prt ConNeg Sg
"<.>"
    "." Punct
"<tema>"
    "tema" Pron Emph Sg Nom
"<ei>"
    "ei" V Neg
"<tulnud>"
    "tulema" V Pers Prt Ind Neg <==
echo 'Hän ei tullut.' | apertium -d . fin-est
Ta ei saanud.
echo 'Hän ei ostanut.' | apertium -d . fin-est
Ta ei ostnud.
echo 'Hän ei koskaan ostanut.' | apertium -d . fin-est
Ta ei ostnud kunagi.
Task: Add the V Act Ind Prt ConNeg Sg reading to olla.
<e><p><l>tulla<s n="vblex"/></l><r>tulema<s n="vblex"/></r></p></e>
<e><p><l>tulla<s n="vblex"/></l><r>saama<s n="vblex"/></r></p></e>
<e><p><l>tulla<s n="vblex"/></l><r>sattuma<s n="vblex"/></r></p></e>
tf-hsl-m0016:~ ttr000$ ufin
ole
ole olla+V+Act+Imprt+Sg2
ole olla+V+Act+Imprt+Prs+ConNeg+Sg2
ole olla+V+Act+Ind+Prs+ConNeg
ollut
ollut olla+V+Pss+PrfPrc+Pl+Nom
ollut olla+V+Act+PrfPrc+Sg+Nom
ollut olla+V+Pss+PrfPrc+Pl+Nom
ollut olla+V+Act+Ind+Prt+ConNeg+Sg <=== add this
olleet olla+V+Act+Ind+Prt+ConNeg+Pl <=== add this
tullut
tullut tulla+V+Pss+PrfPrc+Pl+Nom
tullut tulla+V+Act+PrfPrc+Sg+Nom
tullut tulla+V+Act+Ind+Prt+ConNeg+Sg
tullut tulla+V+Pss+PrfPrc+Pl+Nom
tullut tulla+V+Act+PrfPrc+Sg+Nom
tullut tulla+V+Act+Ind+Prt+ConNeg+Sg
tf-hsl-m0016:~ ttr000$ usme
leat
leat leat+V+IV+Ind+Prs+Pl1
leat leat+V+IV+Ind+Prs+Pl3
leat leat+V+IV+Ind+Prs+Sg2
leat leat+V+IV+Ind+Prs+ConNeg
leat leat+V+IV+Inf
lean
lean leat+V+IV+Ind+Prs+Sg1
lean leat+V+IV+Ind+Prt+ConNeg
boahtán
boahtán boahtit+V+IV+PrfPrc
boahtán boahtit+V+IV+Ind+Prt+ConNeg
ii boahtán = ei tullut
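The gap noted in the ufin session above (ollut lacks the ConNeg reading that tullut has) could be checked mechanically once analyser output is saved to a file. A minimal sketch in Python, assuming whitespace-separated "form analysis" pairs with +-separated tags as in that session; the function name forms_missing_reading and the sample list are illustrative and not part of the giella infrastructure:

```python
# Illustrative helper (hypothetical, not part of the giella infrastructure):
# report which word forms lack an analysis containing all required tags.

def forms_missing_reading(lookup_lines, required_tags):
    """Return the sorted forms that have no single analysis with all required tags.

    lookup_lines: iterable of "form<whitespace>analysis" strings.
    required_tags: tags that must all occur together in one analysis.
    """
    has_reading = {}
    for line in lookup_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank lines and bare input echoes
        form, analysis = parts[0], parts[1]
        tags = set(analysis.split("+")[1:])  # drop the lemma, keep the tags
        ok = all(tag in tags for tag in required_tags)
        has_reading[form] = has_reading.get(form, False) or ok
    return sorted(form for form, ok in has_reading.items() if not ok)


# Analyses copied from the session above, without the lines marked "add this".
sample = [
    "ollut olla+V+Pss+PrfPrc+Pl+Nom",
    "ollut olla+V+Act+PrfPrc+Sg+Nom",
    "tullut tulla+V+Pss+PrfPrc+Pl+Nom",
    "tullut tulla+V+Act+PrfPrc+Sg+Nom",
    "tullut tulla+V+Act+Ind+Prt+ConNeg+Sg",
]

missing = forms_missing_reading(sample, ["V", "Act", "Ind", "Prt", "ConNeg"])
print(missing)  # ['ollut']
```

Run over a list of past participles, a check like this would show whether olla is the only verb out of line or whether other verbs have the same gap.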
Summing up tags
- Conversion
- There is a wiki.apertium.org page for documenting tag conversion
- There are scripts both on the giella and the apertium side to govern the conversion
- Goal: Harmonising tags across languages when possible
- Linguistics:
- The Finnish verb olla should be brought into line with the other verbs (ollut = tullut)
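To make the conversion point concrete, here is a rough sketch in Python of what a giella-to-Apertium tag mapping does. It is not the actual conversion script: the table GIELLA_TO_APERTIUM and the function giella_to_apertium are hypothetical names, and the tag pairs are only those that can be read off the first 'ole' analysis quoted earlier in these notes (olla+V+Act+Imprt+Sg2 and olla<vblex><actv><imp><p2><sg>), plus Ind, Prs and ConNeg from the other readings.

```python
# Illustrative only: the real conversion is governed by the scripts on the
# giella and apertium sides. Tag pairs are read off the 'ole' analyses quoted
# in these notes; the table is deliberately incomplete.
GIELLA_TO_APERTIUM = {
    "V": ["vblex"],
    "Act": ["actv"],
    "Ind": ["ind"],
    "Imprt": ["imp"],
    "Prs": ["prs"],
    "ConNeg": ["conneg"],
    "Sg2": ["p2", "sg"],  # person and number are separate tags on the Apertium side
}


def giella_to_apertium(analysis):
    """Convert 'lemma+Tag+Tag' into 'lemma<tag><tag>' using the table above."""
    lemma, *tags = analysis.split("+")
    converted = []
    for tag in tags:
        if tag not in GIELLA_TO_APERTIUM:
            # Fail loudly so undocumented tags are noticed instead of passed through.
            raise KeyError(f"no conversion defined for tag {tag!r}")
        converted.extend(GIELLA_TO_APERTIUM[tag])
    return lemma + "".join(f"<{t}>" for t in converted)


print(giella_to_apertium("olla+V+Act+Imprt+Sg2"))  # olla<vblex><actv><imp><p2><sg>
```

Raising an error on unknown tags is a design choice for the sketch: it turns every undocumented tag into a visible item for the wiki page rather than a silent mismatch in transfer.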
fin2X
sme2fin
Needed: a gold corpus. One is forthcoming, based on a Finnish textbook.
- Qnt
- Qu
The documentation says:
+Qu !!= * {{@CODE@}}: Quantor
+Qnt !!= * {{@CODE@}}: Quantor?????
- Find out why there are two tags
- If no good reason: Choose one of them
- If a good reason: Make transfer rules
Work has started on a kind of gold text based on a Finnish textbook. Some minor issues:
echo -e "ettei\neikä" | hfst-lookup -q -p $GTHOME/langs/fin/src/analyser-gt-desc.hfst | cut -f1-2 | cg-conv -f
"<ettei>"
    "ettei" Pcle CS
    "ei" CS preferred, wrong form
    "että" missing POS
    "että" CSei Neg Act Sg3
"<eikä>"
    "ei" V Neg Act Sg3 Foc_kA
    "ei" V Neg Act Sg3 Foc/ka @synfuction <== preferred
    "eikä" CC <== problem with enkä
    "eikä" Pcle CC
    "ja" CCei Neg Act Sg3 wrong
- 46: Hän ei puhu viroa eikä venäjää. <-- eikä = ega = ja mitte
- 209: Olutkin on edullista eikä niin kallista kuin muualla. <-- eikä = ja/aga mitte
- 331: Sanna on juuri nyt vaikeassa iässä eikä halua puhua äidin kanssa. <-- eikä = ega
- olen norjalainen enkä virolainen <-- eikä = aga mitte
echo 'olen norjalainen enkä virolainen' | apertium -d . fin-est
Olen *norjalainen ja ei eestlane
Foc_kA is plain wrong; it should be Foc/ka.
Cliticized forms:
etten
etten etten+Pcle+CS
etten että+CSei+Neg+Act+Sg1
etten että#en+CS
ettet
ettet ettet+Pcle+CS
ettet että+CSei+Neg+Act+Sg2
ettet että#et+CS
eikä
eikä ei+V+Neg+Act+Sg3+Foc_kA
eikä eikä+CC
eikä eikä+Pcle+CC
eikä ei+V+Neg+Act+Sg3+Foc/ka
eikä ja+CCei+Neg+Act+Sg3
Double forms for translating into Finnish:
Added +Use/NG (omenien, omenoiden, omenoitten, omenain, ..)
There may be some double forms left.
Example: echo 'pääteos' | hfst-lookup gives 45 analyses.
Bachelor's thesis on Saami-Estonian MT
This will be the sixth active MT pair for sme2X.
TODO
- Look at the tag conversion issue: both the technical side and the documentation
- Discuss in the MT group
Subgroups
Oahpa: Heli, Tiina, Kadri
- Trond to send timeslots for CG/MT
- Heli to send timeslots for Oahpa, FST
samest report due March 1
Unni Norum@uit.no + trond.trosterud@uit.no
Next meeting
Tuesday, Feb 9th, 0900 (Trond perhaps to ask for change of time)