140408
Contents:
Meeting on smenob transition
April 8th, 2014
Fran, Lene, Sjur, Trond
LEXICAL SELECTION
ny fil:
apertium-sme-nob.sme-nob.lex
gammel fil:
dev/archive/apertium-sme-nob.sme-nob.lex
Now we use SELECT/REMOVE rules instead of SUBSTITUTE rules.
The input looks like:
$ echo "biebmu" | apertium -d . sme-nob-biltrans ^biebmu<n><sg><nom><@HNOUN>/føde<n><m><unc><sg><nom><@HNOUN>/mat<n><m><unc><sg><nom><@HNOUN>$^.<clb>/.<sent><clb>$ And the output: $ echo "biebmu" | apertium -d . sme-nob-lex ^biebmu<n><sg><nom><@HNOUN>/mat<n><m><unc><sg><nom><@HNOUN>$^.<clb>/.<sent><clb>$
It is much easier to debug this way. Although it means rewriting the old rules, and specifically writing default rules.
SELECT ("mat"i) IF (0 ("<biebmu>"i)) ;
the source language lemma ( "biebmu") is the WORDFORM of the biltrans output.
word form = biebmu<n><sg><nom><@HNOUN> (tags are invisible)
readings:
- "føde" n m unc sg nom @HNOUN
- "mat" n m unc sg nom @HNOUN
$ echo "biebmu" | apertium -d . sme-nob-biltrans ^biebmu<n><sg><nom><@HNOUN>/føde<n><m><unc><sg><nom><@HNOUN>/mat<n><m><unc><sg><nom><@HNOUN>$^.<clb>/.<sent><clb>$ Output of biltrans in CG-style output: "<biebmu>" n sg nom @HNOUN "føde" n m unc sg nom @HNOUN "mat" n m unc sg nom @HNOUN "<.>" clb "." sent clb
Lene vil oppdatere lex-fila frå dev/archive til gjeldande fil.
REGRESSIONS
sme-nob boradišgohten. - jeg begynte å spise. + #spise. $ echo "boradišgohten" | apertium -d . sme-nob-biltrans ^boradit<v><tv><der_goahti><ind><prt><sg1><@+FMAINV>/spise<vblex><pers><der_goahti><ind><prt><sg1><@+FMAINV>$^.<clb>/.<sent><clb>$ Original: ^boradit<v><tv>+goahti<v><ind><prt><sg1><@+FMAINV>$
So, how do we change der_goahti to +goahti ?
boradišgohten boradišgohten boradit+V+TV+Der/goahti+V+Ind+Prt+Sg1 borakeahttá borakeahttá borrat+V+TV+VAbess $ echo "boradišgohten" | apertium -d . sme-nob-morph ^boradišgohten/boradit<v><tv><der_goahti><v><ind><prt><sg1>/borrat<v><tv><der_d><v><der_goahti><v><ind><prt><sg1>$^./.<clb>$ $ echo "^boradit<v><tv>+goahti<v><ind><prt><sg1><@+FMAINV>$" | apertium-pretransfer | lt-proc -b sme-nob.autobil.bin ^boradit<v><tv>/spise<vblex><pers>$ ^goahti<v><ind><prt><sg1><@+FMAINV>/begynne<vblex><pret><sg><p1><@+FMAINV>$ $ echo "^boradit<v><tv>+goahti<v><ind><prt><sg1><@+FMAINV>$" | apertium-pretransfer | lt-proc -b sme-nob.autobil.bin | apertium-transfer -b apertium-sme-nob.sme-nob.t1x sme-nob.t1x.bin ^verb<SV><@+FMAINV><ind><pret><p1><sg><pers>{^begynne<vblex><pret>$}$ ^part<part>{^å<part>$}$ ^verb<SV><inf><pers><NC>{^spise<vblex><inf>$}$ -> begynte å spise
Focus words
+Foc/ge +ge<pcle> +Foc/gen +gen<pcle> +Foc/ges +ges<pcle> +Foc/gis +gis<pcle> +Foc/naj +naj<pcle> +Foc/ba +ba<pcle> +Foc/be +be<pcle> +Foc/hal +hal<pcle> +Foc/han +han<pcle> +Foc/bat +bat<pcle> +Foc/son +son<pcle> +Foc/naj+Qst <qst>+naj<pcle> +Qst+Foc/son <qst>+son<pcle>
The + makes them into separate words (for the lexical transfer?)
- translate tags to plus notation
- disambiguate
- split words
- send to biltrans and get nob words for the + "words"
$ echo "^boađátge/boahtit<v><iv><ind><prs><sg2>+ge<pcle>$" | cg-conv -a -l "<boađátge>" "boahtit" v iv ind prs sg2 "ge" pcle <spectre> Unhammer, did you make any changes to the CG in sme-nob in order to deal with subreadings ? <Unhammer> can't remember …
From SMA:
LEXICON FINAL1 ENDLEX ; +Foc/ge+Use/Circ:#ge ENDLEX ; ! +Foc/gan+Use/Circ:#gan ENDLEX ; ! +Foc/gih+Use/Circ:#gih ENDLEX ; ! +Foc/gænnah+Use/Circ:#gænnah ENDLEX ; !
Todo
Sjur@Apertium:
- fjern derivasjons-strengar m. språkparspesifikke tilpassingar
- endra visse taggar for visse språkpar
- pkgconfig-skript for GTD-språka
Legge til ord frå nobsme/inc/False* til smenob
- Dele smenob-katalogen i <e src vs. <e>
- Ha <e> i separat katalog loansrc
- Legge til alle smenob/src til bidix
- Legge atil alle smenob/loansrc til bidix, men merka r="LR" i sme-nob.dix
Propernouns
- Legge til sme-nob (og sme-fin og sme-swe) frå geo/xml_src
- smi-propernouns.lexc
- commons smi