161116
MT-Apertium
Lene, Fran, Trond
Agenda
- Create reverse-direction versions of the language pairs
- Bug 2201
- Harmonizing categories in the t1x files
Reverse directions
- sma->sme, smj->sme, smn->sme work
$ echo "Månnoeh aehtjieh gåetide båetiejimen." | apertium -d . sma-sme
Moai áhčit dáluide bođiime.
$ echo "Sámij årromguovllo gåhtjuduvvá Sábmen." | apertium -d . smj-sme
Sámiiguin ássanguovlu #gohčodit Sápmin.
Bug 2201
Francis will add to the dix files a definition of numerals as adjectives, covering the numbers 1-39:
<e><re>([0-9]|[0-3][0-9]+)</re><p><l><s n="adj"/><s n="ord"/></l><r><s n="adj"/><s n="ord"/></r></p></e>
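The pattern in the entry above can be checked against the intended range 1-39. The following is a minimal sketch, assuming that Python's `re` module behaves like the matcher used for `<re>` entries on a pattern this simple (not verified against lttoolbox). `STRICT` is a hypothetical tighter alternative that was not discussed in the meeting.

```python
import re

# Pattern copied verbatim from the <re> element in the dix entry.
ORIGINAL = re.compile(r"([0-9]|[0-3][0-9]+)")
# Hypothetical tighter alternative (an assumption, not from the notes).
STRICT = re.compile(r"([1-9]|[1-3][0-9])")


def accepted(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [c for c in candidates if pattern.fullmatch(c)]


candidates = ["0", "1", "9", "10", "39", "40", "100", "399"]
original_hits = accepted(ORIGINAL, candidates)
strict_hits = accepted(STRICT, candidates)

print("original:", original_hits)
print("strict:  ", strict_hits)
```

Under these assumptions the pattern as written also accepts "0" and longer digit strings such as "100", because of the trailing `+`. Whether that matters depends on what else in the dictionary handles those numbers.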
We will look into the possibility of disambiguating the sentence delimiter in the before-section, and raise it with Tino if needed.
Harmonizing categories in the t1x files
Overview of the work:
- Harmonize the names of those that have identical content - OK (in -sma, -smj, -smn)
- Add to language pair 3 those that are in language pairs 1 and 2 - OK
- Those that are in only one language pair are placed in a separate block
- Remove those that are not in use
- Harmonize: use underscores instead of hyphens in names
- Change the names so that they describe the content
- We want North Sámi as the metalanguage
- Names for categories for functions with an arrow, H = head and D = dependent: obj_H, H_obj, D_po
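The underscore-for-hyphen item in the list above could be done mechanically. The following is a minimal sketch, assuming category names occur only in the `n` attribute of `<def-cat>` and `<pattern-item>` elements. The function name `underscore_names` and the sample `cat-item` tags are invented for illustration.

```python
import re

# Matches n="..." on <def-cat> and <pattern-item>, the two element types
# assumed here to carry category names in a t1x file.
NAME_ATTR = re.compile(r'(<(?:def-cat|pattern-item)\s+n=")([^"]*)(")')


def underscore_names(t1x_text):
    """Replace hyphens with underscores inside category names only."""
    return NAME_ATTR.sub(
        lambda m: m.group(1) + m.group(2).replace("-", "_") + m.group(3),
        t1x_text,
    )


# Toy fragment; the tags value is illustrative, not taken from a real file.
sample = '''<def-cat n="prn-rel">
  <cat-item tags="prn.rel.*"/>
</def-cat>
<pattern-item n="prn-rel"/>'''

renamed = underscore_names(sample)
print(renamed)
```

Element names such as `def-cat` keep their hyphens, since only the attribute value is rewritten. Any other place that refers to a category by name would need the same treatment.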
Action items
- Lene looks at the names in the groups and sends suggestions by email
These are in all three files, with the same content:
x 3 <def-cat n="váikkuhitverb"> váikkuhit_vblex
x 3 <def-cat n="sent">
x 3 <def-cat n="semyear">
x 3 <def-cat n="prosent">
x 3 <def-cat n="prn-rel">
x 3 <def-cat n="prn-res">
x 3 <def-cat n="prn-pers">
x 3 <def-cat n="prn-dem">
x 3 <def-cat n="prn">
x 3 <def-cat n="pr">
x 3 <def-cat n="post">
x 3 <def-cat n="mielde-adv">
x 3 <def-cat n="miehtá"> miehtá_adp
x 3 <def-cat n="liikotverb"> liikot_v
x 3 <def-cat n="jagis"> jahki_loc
x 3 <def-cat n="jagi"> jahki_gen
x 3 <def-cat n="geahččatverb"> geahččat_vblex
x 3 <def-cat n="ge">
x 3 <def-cat n="foc">
x 3 <def-cat n="dihte"> dihte_post
x 3 <def-cat n="default">
x 3 <def-cat n="comma">
x 3 <def-cat n="cnjsub">
x 3 <def-cat n="cnjcoo">
x 3 <def-cat n="buorre"> buorre_adj
x 3 <def-cat n="boaris"> boaris_adj
x 3 <def-cat n="adpos">
x 3 <def-cat n="adj">
x 3 <def-cat n="váste"> váste_post
x 3 <def-cat n="prn-refl">
x 3 <def-cat n="pcle">
x 3 <def-cat n="birra-po"> birra_post
x 3 <def-cat n="birra-adv"> birra_Adv
x 3 <def-cat n="mánnu"> months
x 3 <def-cat n="rdep">
x 3 <def-cat n="prn-itg">
x 3 <def-cat n="prn-ind">
x 3 <def-cat n="guhkki"> guhkki_Adj
x 3 <def-cat n="prop-nom">
x 3 <def-cat n="prop-attr">
x 3 <def-cat n="áigi">
x 3 <def-cat n="prn-attr">
x 3 <def-cat n="geatnegahtton"> geatnegahtton_adj
x 3 <def-cat n="verb-prfprc">
x 3 <def-cat n="mod_num">
x 3 <def-cat n="vuollai"> vuollai_post
x 3 <def-cat n="leat-aux">
x 3 <def-cat n="vp-boundary">
x 3 <def-cat n="numcmp">
x 3 <def-cat n="prn-pers-gen">
x 3 <def-cat n="prn-pers-acc">
x 3 <def-cat n="prn-dem-sg">
x 3 <def-cat n="cmp_splitr">
x 3 <def-cat n="num-ldep">
x 3 <def-cat n="gen">
x 3 <def-cat n="boahtit"> boahtit_vblex
x 3 <def-cat n="cnp">
x 3 <def-cat n="dat-pcle">
x 3 <def-cat n="bealis-px">
x 3 <def-cat n="fápmui"> fápmu_n_ill
x 3 <def-cat n="fárrui"> fárrui_post
x 3 <def-cat n="abbr">
x 3 <def-cat n="go-qst">
x 3 <def-cat n="go-cnjsub">
x 3 <def-cat n="beassatverb"> beassat_vblex
x 3 <def-cat n="geahčen"> geahčen_post
x 3 <def-cat n="sisa"> sisa_post
x 3 <def-cat n="mielde-post">
x 3 <def-cat n="bokte"> bokte_post
x 3 <def-cat n="beallai"> beallai_post
x 3 <def-cat n="hupmatverb"> hupmat_verbs
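A grouping like the one in these notes (in all files with the same content, in all files with different content, not in every file) could be reproduced with a small script. The following is a minimal sketch, assuming each t1x file parses as XML and that two categories count as the same when their sets of `cat-item` attributes are equal. The helper names and the toy inputs are invented for illustration and are not taken from the real files.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def read_cats(t1x_text):
    """Map each def-cat name to the frozenset of its cat-item attributes."""
    root = ET.fromstring(t1x_text)
    cats = {}
    for def_cat in root.iter("def-cat"):
        items = frozenset(
            tuple(sorted(item.attrib.items()))
            for item in def_cat.iter("cat-item")
        )
        cats[def_cat.get("n")] = items
    return cats


def classify(files):
    """Group category names by file coverage and content agreement."""
    seen = defaultdict(dict)
    for label, text in files.items():
        for name, items in read_cats(text).items():
            seen[name][label] = items
    groups = {"same": [], "different": [], "not_in_all": []}
    for name, per_file in sorted(seen.items()):
        if len(per_file) < len(files):
            groups["not_in_all"].append(name)
        elif len(set(per_file.values())) == 1:
            groups["same"].append(name)
        else:
            groups["different"].append(name)
    return groups


def wrap(body):
    """Wrap def-cat fragments in a minimal transfer skeleton."""
    return ("<transfer><section-def-cats>" + body +
            "</section-def-cats></transfer>")


# Toy inputs: names echo the notes, tag values are illustrative only.
SENT = '<def-cat n="sent"><cat-item tags="sent"/></def-cat>'
NUM_ALL = '<def-cat n="numeral"><cat-item tags="num.*"/></def-cat>'
NUM_SG = '<def-cat n="numeral"><cat-item tags="num.sg.*"/></def-cat>'
II = '<def-cat n="ii"><cat-item lemma="ii" tags="vblex.*"/></def-cat>'

toy = {
    "sma": wrap(SENT + NUM_SG + II),
    "smn": wrap(SENT + NUM_ALL),
    "smj": wrap(SENT + NUM_ALL),
}

groups = classify(toy)
print(groups)
```

Comparing attribute sets ignores the order of `cat-item` elements. If a file defines the same name twice, only the last definition is kept.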
These are in all three files, but with different content:
3 <def-cat n="adv"> | different tags - harmonize
3 <def-cat n="ja"> | different lemmas - max
3 <def-cat n="noun"> | extra tag - max
3 <def-cat n="word"> | different tags - max
3 <def-cat n="numeral"> | different tags -
change the name of the set in sma to "numeral_not_year",
and add "numeral" to sma as the same cat as in the other two
sma:
<def-cat n="numeral">
<cat-item tags="num.*"/>
</def-cat>
<def-cat n="numeral-not-year">
<cat-item tags="num.rom.*"/>
<cat-item tags="num.arab.*"/>
<cat-item tags="num.sg.*"/>
<cat-item tags="num.pl.*"/>
<cat-item tags="num.ess.*"/>
</def-cat>
smn:
<def-cat n="numeral">
<cat-item tags="num.*"/>
</def-cat>
smj:
<def-cat n="numeral">
<cat-item tags="num.*"/>
</def-cat>
3 <def-cat n="verb-fin"> | different tags
=> "verb-fin-not-imp" in all three languages; for smj, "verb-cond"
is included in the verb-fin rule, with its own chunking
3 <def-cat n="negverb"> | wildly different tags
sma = smn; the pattern is copied to smj, but with a new name: negverb
We will continue working on these by email:
3 <def-cat n="prop-cmp"> | different tags
3 <def-cat n="n-cmp"> | different tags
3 <def-cat n="num-nomacc"> | different tags
3 <def-cat n="nom-gen"> | different tags
3 <def-cat n="attr"> | different tags
3 <def-cat n="leat-main-fin"> | different tags
3 <def-cat n="n-sg-nom"> | different tags
3 <def-cat n="n-not-cmp"> | different tags
3 <def-cat n="verb-inf"> | different tags
These are in only one file:
Lexicalized - Lene proposes other names - email
1 <def-cat n="Laarain"> pron_D_com
1 <def-cat n="aahka"> ahkku_addja
1 <def-cat n="aehtjie"> áhčči_eadni
1 <def-cat n="ahte"> ahte_cnjsub
1 <def-cat n="ala"> ala_post
1 <def-cat n="almmá"> almmá_adv
1 <def-cat n="atnit"> atnit_vblex
1 <def-cat n="atnu"> atnu_n_ill
1 <def-cat n="beaivi"> beaivi_n
1 <def-cat n="buot"> buot_prn
1 <def-cat n="coggat"> coggat_vblex
1 <def-cat n="dat"> dat_prn
1 <def-cat n="dattetge"> dattetge_maiddai_adv
1 <def-cat n="dieđu"> diehtu_n_acc
1 <def-cat n="dihto-adj"> dihto_adj
1 <def-cat n="duohken"> duohken_post
1 <def-cat n="dálá"> dálá_adj
1 <def-cat n="dárbu"> dárbu_n
1 <def-cat n="eanet"> adj-adj_comp
1 <def-cat n="eará"> eará_seammá_prn
1 <def-cat n="eatnat"> eatnat_adv
1 <def-cat n="eret"> eret_adv
1 <def-cat n="giittos"> giittos_n
1 <def-cat n="goiku"> thirst_hunger_n
1 <def-cat n="gosage">
1 <def-cat n="guokte"> guokte_num
1 <def-cat n="haga"> haga_post
1 <def-cat n="ii"> neg_indic
1 <def-cat n="ii-imp"> neg_imp
1 <def-cat n="ipmárdus"> understanding_n
1 <def-cat n="jahki"> jahki_n
1 <def-cat n="juohke"> juohke_prn
1 <def-cat n="lassin"> lassi_n_ess
1 <def-cat n="ládje"> ládje_post
1 <def-cat n="láhki"> láhki_n
1 <def-cat n="maake"> uncle_aunt_n
1 <def-cat n="maiddái"> maiddái_adv
1 <def-cat n="mearkkašit"> mearkkašit_vblex
1 <def-cat n="mihkkege"> ind_prn_neg
1 <def-cat n="moai"> prn_pers_du
1 <def-cat n="nu"> nu_adv
1 <def-cat n="nubbi"> nubbi_prn
1 <def-cat n="oaivvildit"> oaivvildit_vblex
1 <def-cat n="oassálastit"> oassálastit_vblex
1 <def-cat n="oažžut"> oažžut_vblex
1 <def-cat n="oktavuohta"> oktavuohta_n
1 <def-cat n="olu"> olu_adv
1 <def-cat n="ovdamearka"> ovdamearka_n
1 <def-cat n="ovttasráđiid"> ovttasráđiid_adv
1 <def-cat n="ovttâst-verb"> ovttasbargat_vblex
1 <def-cat n="uđđâsist-verb"> ođđasisorganiseret_vblex
1 <def-cat n="vejolaš"> vejolaš_adj
1 <def-cat n="vuhtii"> vuhtii_adv
1 <def-cat n="vuástá-verb"> vuostáiváldit_vblex
1 <def-cat n="váldit"> váldit_vblex
1 <def-cat n="árvvus"> árvu_n_loc
1 <def-cat n="älkkeebin-verb"> álkidahttit_vblex
1 <def-cat n="čájehit"> čájehit_vblex
1 <def-cat n="shoes"> shoe_n
Grammatical
1 <def-cat n="a-adv-comp">
1 <def-cat n="actio-sem_act">
1 <def-cat n="adj-rdep">
1 <def-cat n="adj-sup">
1 <def-cat n="adj2verb">
1 <def-cat n="adjattr-all">
1 <def-cat n="adjattr-pos">
1 <def-cat n="adv-empty">
1 <def-cat n="adv_go">
1 <def-cat n="adv_r"> => "l_advl"
1 <def-cat n="agreem-pl">
1 <def-cat n="agreem-pl-gen">
1 <def-cat n="agreem-sg">
1 <def-cat n="agreem-sg-gen">
1 <def-cat n="agreement-attr">
1 <def-cat n="agreement-attr">
1 <def-cat n="agreement-buot">
1 <def-cat n="agreement-half">
1 <def-cat n="anon-verb">
1 <def-cat n="auxverb">
1 <def-cat n="com-ess-verb">
1 <def-cat n="comitative">
1 <def-cat n="conneg">
1 <def-cat n="connegverb">
1 <def-cat n="dem-gen">
1 <def-cat n="gencompl">
1 <def-cat n="genmod">
1 <def-cat n="habitive">
1 <def-cat n="illadvl">
1 <def-cat n="inessive">
1 <def-cat n="interj">
1 <def-cat n="l_spred">
1 <def-cat n="l_subj">
1 <def-cat n="leat-aux-main">
1 <def-cat n="leat-conneg">
1 <def-cat n="leat-drop">
1 <def-cat n="leat-main">
1 <def-cat n="leat-main-infin">
1 <def-cat n="leat-main-pret">
1 <def-cat n="leat-qst">
1 <def-cat n="leat_conneg_prt">
1 <def-cat n="leat_prfprc">
1 <def-cat n="mainverb">
1 <def-cat n="mainverbqst">
1 <def-cat n="n-pl-com">
1 <def-cat n="n-prop-pers">
1 <def-cat n="n-sg-accgen">
1 <def-cat n="n-sg-com">
1 <def-cat n="n-sg-gen">
1 <def-cat n="n-sg-pl-gen">
1 <def-cat n="negverb23">
1 <def-cat n="negverbqst">
1 <def-cat n="nomact-com">
1 <def-cat n="nomact-ill">
1 <def-cat n="nomact-ill-modn">
1 <def-cat n="nomact-nom">
1 <def-cat n="nominal">
1 <def-cat n="not-pcle">
1 <def-cat n="noun-ill">
1 <def-cat n="noun-loc">
1 <def-cat n="noun-loc-num">
1 <def-cat n="noun-nom">
1 <def-cat n="noun-nom">
1 <def-cat n="noun-nom-acc-gen">
1 <def-cat n="noun-not-np">
1 <def-cat n="nouncmp">
1 <def-cat n="nounpx">
1 <def-cat n="num-sg-all">
1 <def-cat n="numattr">
1 <def-cat n="numgen">
1 <def-cat n="numsem">
1 <def-cat n="numsg-acc-nom">
1 <def-cat n="obj_l">
1 <def-cat n="ordinal">
1 <def-cat n="pl-words">
1 <def-cat n="po_l">
1 <def-cat n="prfprcword">
1 <def-cat n="prn-adj-attr">
1 <def-cat n="prn-dem-loc">
1 <def-cat n="prn-ind-dem">
1 <def-cat n="prn-num">
1 <def-cat n="prn-pl-com">
1 <def-cat n="prn-ref_PRONl">
1 <def-cat n="prn-refl-sg-nom">
1 <def-cat n="prn-refl_PRONl">
1 <def-cat n="prnrefgen">
1 <def-cat n="prsprc">
1 <def-cat n="refcom">
1 <def-cat n="sem_measr_time">
1 <def-cat n="semdate">
1 <def-cat n="semyear-loc">
1 <def-cat n="semyear-loc-gen">
1 <def-cat n="semyearPl">
1 <def-cat n="subpred">
1 <def-cat n="verb">
1 <def-cat n="verb-actio">
1 <def-cat n="verb-actio-ess">
1 <def-cat n="verb-actio-gen">
1 <def-cat n="verb-actio-inf">
1 <def-cat n="verb-cond">
1 <def-cat n="verb-for-cmp">
1 <def-cat n="verb-imp">
1 <def-cat n="verb_to_adv">
1 <def-cat n="word-not-verb">
1 <def-cat n="word_qst-not-verb">

