130828
Apertium sme-sma 28.8.
Present: Francis, Lene, Trond.
Agenda:
- Tag reording
- Other issues
Tag reordering
bidix fst result --------------------------------------------------------------- skuvla<n> .INTERSECT. skuvla<sem_org><sem_plc><n> = () skuvla<n> .INTERSECT. skuvla<n><sem_org><sem_plc> = (skuvla<n>) skuvla<n> .INTERSECT. skuvla<n> = (skuvla<n>) ---------------------------------------------------------------
But:
<e><p><l>Stockholm<s n="n"/><s n="prop"/><s n="plc"/></l><r>Stuehkie<s n="n"/><s n="prop"/></r></p></e>
solutions:
- move tags
- delete tags
So:
We do the following
- Reshuffle tags in fst:
- prop tags are as they are from fst
- noun are reordered by a script in sme/src/filters
- adverbs are as they are from fst
- prop tags are as they are from fst
- the tags are then matched for the pruning of the fst in apertium
- The bidix does not contain semtags
- The cg in Apertium removes the tags
here is the explicit description of what we want:
--------------------------------------------------------------- fst: <n><sem_org><sem_plc><sg>... after reshuffle <n><prop><sem_plc><sg>... from the fst <adv><sem_time> cg: semtags in, nosemtags out bidix: no sem-tags ---------------------------------------------------------------
Other issues
jagi 1974
Son lea riegádan "jagi 1974" ja bajásšaddan "Romssas". Dihte "jaepien 1974" reakedi jih "Tromsene" byjjesovvi. ^Son/Son<pron><pers><sg3><nom><@SUBJ→>$ ^lea/leat<v><iv><ind><prs><sg3><@+FAUXV>$ ^riegádan/riegádit<v><iv><prfprc><@-FMAINV>$ ^jagi/jahki<n><sg><gen><@←ADVL>$ ^1974/1974<num><sg><nom><@N←>$ ^ja/ja<cc><@CNP>$ ^bajásšaddan/bajásšaddat<v><iv><prfprc><@-FMAINV>$ ^Romssas/Romsa<n><prop><sg><loc><@←ADVL>$^../..<clb>$ $ echo "Son lea riegádan jagi 1974 ja bajásšaddan Romssas." | apertium -d . sme-sma Dïhte jaepien 1974 reakadamme jïh Tromsøesne byjjenamme. ^pron-pers<SN><@SUBJ→><PX>{^Dïhte<pron><pers><sg3><nom>$}$ ^v<SV><@+FMAINV>{^reakadidh<v><iv><prfprc>$}$ ^n-num<SN><@←ADVL><PX>{^jaepie<n><sg><gen>$ ^1974<num><sg><nom>$}$ ^cc<CC><@CNP>{^jïh<cc>$}$ ^prfprc<SV><@-FMAINV>{^byjjenidh<v><iv><prfprc>$}$ ^n<SN><@←ADVL><PX>{^Tromsø<n><prop><sg><ine>$}$^sent<SENT><@X>{^..<clb>$}$
"<Son>" "son" Pron Pers Sg3 Nom @SUBJ> "<lea>" "leat" V IV Ind Prs Sg3 @+FAUXV "<riegádan>" "riegádit" V IV PrfPrc @-FMAINV "<jagi>" "jahki" N Sg Gen @<ADVL "<1974>" "1974" Num Sg Nom @N< "<ja>" "ja" CC @CNP <==================== @CVP would tell that there is coming a new finite verb "<bajásšaddan>" "bajásšaddat" V IV PrfPrc @-FMAINV "<Romssas>" "Romsa" N Prop Sg Loc @<ADVL "<.>" "." CLB
Links:
-
http://wiki.apertium.org/wiki/North_Saami_-_South_Saami_syntactic_issues
-
http://wiki.apertium.org/wiki/North_Saami_-_South_Saami_morphological_issues
- http://wiki.apertium.org/wiki/North_Saami_-_South_Saami_bilingual_lexicon
Before: $ echo "Nieiddat leat čeahpit. " | apertium -d . sme-sma Nïejth leah væjkelh. After: $ echo "Nieiddat leat čeahpit. " | apertium -d . sme-sma Nïejth leah væjkele.
^buot/buot<pron><indef><attr><@→N>$ ^gielaid/giella<n><pl><acc><@←OBJ>$ ^attr-n<SN><@←OBJ><PX>{^gaajhke<pron><indef><pl><acc>$ ^gïele<n><pl><acc>$}$ gaajhkide gïelide
echo "Mun boađán boahtte jagi." | apertium -d . sme-sma Manne båetije jaepien båatam. båetije båetedh+V+IV+PrsPrc båetije båetedh+V+IV+Der/NomAg+N+Sg+Nom
Here is a solution
<e><p><l>boahtte<s n="a"/></l><r>båetije<s n="a"/></r></p></e>
- ND = number to be determined (take it from the noun)
- CD = case to be determined (take it from the noun)
Lexicon entry form:
<e r="LR"><p><l>boahtte<s n="a"/><s n="attr"/></l><r>båetije<s n="a"/><s n="ND"/><s n="CD"/></r></p></e> <e r="RL"><p><l>boahtte<s n="a"/><s n="attr"/></l><r>båetije<s n="a"/></r></p></e>
Now:
^pron-pers<SN><@SUBJ→><PX>{^Manne<pron><pers><sg1><nom>$}$ ^v<SV><@+FMAINV>{^båetedh<v><iv><ind><prs><sg1>$}$ ^a-n<SN><@←ADVL><PX>{^båetije<a><sg><gen>$ ^jaepie<n><sg><gen>$}$^sent<SENT><@X>{^..<clb>$}$ $ echo "Mun boađán boahtte jagi." | apertium -d . sme-sma Manne #båetije jaepien båatam.
So, what is the role of båetijen?
boahtte jagis -> båetijen jaepesne attr loc gen.attr ine båetije båetije+A+Attr <=== remove Attr and give cases båetijen båetije+A+Gen+Attr båetije båetije+A+Sg+Nom båetijen båetije+A+Sg+Acc Use/MT båetijen båetije+A+Sg+Gen båetijen båetije+A+Sg+Ine båetijen båetije+A+Sg+Ela båetijen båetije+A+Sg+... sme: buori buorre+A+Sg+Gen buori buorre+A+Sg+Acc buorre buorre buorre+A+Sg+Nom boahtte boahtte+A+Attr => båetije/båetijen Pron+Dem @→N = +Det+Dem (?) Pron+Dem +Attr @→N = +Det+Dem (?)
We then have four types:
indecl sme -> decl sma
<e r="LR"><p><l>boahtte<s n="a"/><s n="attr"/></l><r>båetije<s n="a"/><s n="ND"/><s n="CD"/></r></p></e> <e r="RL"><p><l>boahtte<s n="a"/><s n="attr"/></l><r>båetije<s n="a"/></r></p></e>
indecl sme -> indecl sma
-
decl sme -> decl sma
-
decl sme -> indecl sma
<e r="LR"><p><l>dehálaš<s n="a"/></l><r>vihkeles<s n="a"/><s n="indecl"/></r></p></e> <e r="RL"><p><l>dehálaš<s n="a"/></l><r>vihkeles<s n="a"/></r></p></e>
- vihkeles+A+Superl+Sg+Nom vihkelommes
- vihkeles+A+Sg+Acc vihkelem
He sees the X cat.
in langs/sma: ./configure --with-hfst --enable-apertium --enable-oahpa the file is sma/src/morphology/*/adjectives-oahpa.lexc
Lexical selection
boahtit oidnosii -> våajnoes sjïdtedh
Default translation for boahtit is båetedh, alternative translation is sjïdtedh
<e slr="1"><p><l>boahtit<s n="v"/><s n="iv"/></l><r>båetedh<s n="v"/><s n="iv"/></r></p></e> <e slr="2"><p><l>boahtit<s n="v"/><s n="iv"/></l><r>sjïdtedh<s n="v"/><s n="iv"/></r></p></e>
apertium-sme-sma.sme-sma.lrx
<rule> <match lemma="boahtit" tags="v.*"><select lemma="båetedh" tags="v.*"/></match> </rule> <rule> <match lemma="boahtit" tags="v.*"><select lemma="sjïdtedh" tags="v.*"/></match> <match lemma="oidnosii"/> </rule>
^boahtit<v><iv><ind><prs><pl3><@+FMAINV>/båetedh<v><iv><ind><prs><pl3><@+FMAINV>/sjïdtedh<v><iv><ind><prs><pl3><@+FMAINV>$ ^oidnosii<adv><@←ADVL>/våajnoes<adv><@←ADVL>$
^Dat/dat<pron><dem><pl><nom>/dat<pron><dem><sg><nom>$ ^boahtá/boahtit<v><iv><ind><prs><sg3>$ ^oidnosii/oidnosii<adv>$^./.<clb>$
- Artihkele 7: m *geatnegahttá nasjonaalestaatide tjïrrehtidh konkreetide råajvarimmide vaarjeleminie åvteste unnebelåhkoengïeli nimhtie ahte dah våajnoes sjidtieh dovne politihkesne, laakine jïh åtnosne.
^pron-dem<SN><@SUBJ→><PX>{^dïhte<pron><dem><pl><nom>$}$ ^v<SV><@+FMAINV>{^sjïdtedh<v><iv><ind><prs><pl3>$}$ ^adv<ADV><@←ADVL>{^våajnoes<adv>$}$
<ADV><@←ADVL>
$ bash dev/test-prefix.sh "juoga<pron><indef>"
error test report
sh generation-errors.sh | grep '#' | grep '<pron' 10 ^jïjnje<pron><indef><pl><pl><nom>$ #jïjnje\<pron\>\<indef\>\<pl\>\<pl\>\<nom\> ^jïjnje/jïjnje<a><attr>/jïjnje<a><sg><nom>/jïjnje<adv>/jïjnje<pron><indef><sg><nom>$^./.<clb>$ 2 ^måedtie<pron><indef><sg><ine>$ #måedtie\<pron\>\<indef\>\<sg\>\<ine\> ^måedtie/måedtie<pron><indef><sg><nom>$^./.<clb>$ 2 ^aktege<pron><indef>$ #aktege\<pron\>\<indef\> ^aktege/akte<num><sg><nom><foc_ge>/aktege<pron><indef><attr>$^./.<clb>$ 1 ^naan<pron><indef><sg><nom>$ #naan\<pron\>\<indef\>\<sg\>\<nom\> ^naan/naan<pron><indef>$^./.<clb>$ 1 ^naan<pron><indef><pl><acc>$ #naan\<pron\>\<indef\>\<pl\>\<acc\> ^naan/naan<pron><indef>$^./.<clb>$ 1 ^dagkeres<pron><dem><pl><com><attr>$ #dagkeres\<pron\>\<dem\>\<pl\>\<com\>\<attr\> ^dagkeres/dagkeres<a><attr>/dagkeres<pron><dem><attr>$^./.<clb>$
SVO - SOV with adjectives
SVO to SOV, but also S V AN to S AN V
- Son oaidná skuvlla. Son oaidná ođđa skuvlla.
- Before the fix: Satne skuvlem vuajna. Satne vuajna orre skuvlem.
- After the fix: Son oaidná ođđa skuvlla. Dïhte orre skuvlem vuajna.
- Before the fix: Satne skuvlem vuajna. Satne vuajna orre skuvlem.
^pron-pers<SN><@SUBJ→><PX>{^Dïhte<pron><pers><sg3><nom>$}$ <rule comment="REGLA: pron-pers"> ^v<SV><@+FMAINV>{^vuejnedh<v><tv><ind><prs><sg3>$}$ <rule comment="REGLA: verb-fin"> ^a-n<SN><@←OBJ><PX>{^orre<a><attr>$ ^skuvle<n><sg><acc>$}$ <rule comment="REGLA: adj-rdep n"> ^sent<SENT><@X>{^..<clb>$}$
The trick was to define a phrase (SN) in the t1x file, and then refer to that phrase later.