2015-11-18

Grammatikkontroll - oppstartsmøte 18.11.2015

Status for arbeidet som er gjort til no:

Linda:

  • real-word errors
  • syntaktiske feil:
    • kongruens
    • kasusfeil med passive verb
    • kasusfeil med adposisjoner
    • kasusfeil etter numeraler
    • adjektivfeil (attributiv vs. predikativ form, feil ved komparering)
    • bruk av pronomen (refleksiv vs. personlig pronomen)
    • numerus feil (numeraler + substantiv)
    • leksikalske feil
  • lister med flaue feil? (norsk: 'driften' feilstava som 'dritten', osb.; du vil typisk alltid flagga det ordet)
  • valensfeil:
    • har testa dei verba som er annoterte
    • 0,77 % presisjon (kan vera betre no)
    • må annotera fleire verb
    • potensielt ei stor gruppe feil (kasusfeil og andre feil)
  • oppdagar heile tida nye feil
  • grammatikkontrollen finn feil som Duomma ikkje ser med ein gong : )

Vi har (nesten) ikkje jobba med:

  • sammensettingsfeil: særskriving og samskriving:
    «Hui ollusat leat dušše geavahan luohte oasi ja Sverre Kjelsberg oasi leat guođđán, dadjá Simona Máhtte ja lohká son ii dieđe manne Sverre Kjelsberg ii lean mielde Gylne Tider joavkkus.» -- "luohteoasi"
    «Dattetge ballá son ráđđehusa guolástan politihkka, gos eriid sáhttá gávppašit, bágge smávvaguolásteaddjiid gáddái.» --- "guolástanpolitihkka"
    • lister med unntak
    • slå opp to og to ord som eitt ord
  • tegnsettingsfeil

Punktar til ein arbeidsplan

  • workshop med språkbrukarar, som kan gje synspunkt på feiltypar
    • kva er det folk ventar seg?
    • kva er det dei opplever at dei treng hjelp med?
  • kontakt med språknormeringsorgan
  • forslagsmekanisme
  • webdemo (via Google Code-In (Fran, Kevin)?) -- [FMT] Lik som LanguageTools webdemo fungerer? -- https: //www.softcatala.org/corrector (i januar?)
  • distribuerbar binærfil for LO (alfa) til hausten?
  • GC-arkitektur
  • hfst-proc2
  • byta ut lookup2cg (perl) med cgconv + konvensjonar for cg-subreadings

Program for dagane

Tysdag:

  • 13-14 - Francis presenterer koden sin for Kevin (og Sjur) [på IRC]

Onsdag:

  • 9.00 - Kevin presenterer dei norske grammatikkontrollane
  • 10.00 - Linda presenterer delar av det nordsamiske arbeidet
  • 11.00 - Diskutera feiltypar vi vil satsa på ut i frå det vi har sett til no

Onsdag kl 9 - Kevin presenterer nn/nb

Pipeline for nn/nb:

  teiknsetijngshack -- ka er det? 
  -- «(kake )» vanlegvis tokenisert som "(" "kake" ")" – men då forsvinn feilmellomrommet før «)»
  -- hack: gjer om " )" → "\ )"
| morf.analyse (tokenisering inkludert) -- fst, deskriptiv med feiltaggar
| særskrivingsoppslag (burde sikkert vore i steg 1)
| stavekontroll (burde sikkert òg vore kombinert med steg 1 …) -- hadde den tilgang til morf. kontekst? ja, men ikkje brukt -- kosjn fungerer det praktisk sett, kjøres den to ganger?
| morf.disambiguering (OBT)
| syntaksdis (OBT)
| enkel dep/chunking --- kosjn elementer la du dep til? berre NP-chunking
| grammatikkfeilreglar
| forslagsgenerering

brukte du statistikk?

  • helst offline, listegenerering
  • særskriving online: frekvens(ananasringer) > frekvens(ananas ringer)
  • burde skje: full dokumentfrekvensliste før køyring, for å oppdaga korrekte, men ukjende ord
  • kor sjekka du tegnsetting og formateringsfeil (dvs. linjeshift som gjør at to setninger ser ut som ett osv.)
    • teiknsetjingshack la til spesialteikn
«kort setning her§

meir»

«kort setning her</h?>§meir»

Og:

«Setning her.<br/>Ny paragraf»
«Setning her.
Ny paragraf»
  • input: ^ananas/ananas<n>$ ^ringer/ring<n>/ringe<vblex>$
  • slå opp ananasringer
  • output: ^ananas/ananas<n><!SÆR>$ ^ringer/ring<n>/ringe<vblex>$
  • input: «vi riner i telefonen»
  • før morf.dis, etter stavekontroll:
"<riner>"
    "ringe" vblex pres !stavefeil
    "ring" n pl !stavefeil
  • etter dis:
"<riner>"
    "ringe" vblex pres !stavefeil
    ;"ring" n pl !stavefeil

Pipeline for sme

morf. analyse (deskriptiv, noen feilanalyser)
| legge til valenstagger til verb via CG (SUBSTITUTE)
| disambiguering + syntaktisk analyse
| grammatikkontroll
  • feilfinningsregler+korrigeringsregler for real word errors - lokale feil (refererer ikkje til dep/sem.rolle)
  • et sett med dependensregler for
    • innafor adposisjonsfraser
    • korrekte argumenter av verb
  • et sett med SUBSITUTE regler som legger til semantiske roller for argumenter av verb
  • globale feilfinningsregler+korrigeringsregler
    • valensfeil
    • kongruensfeil
    • osv.

REAL WORD ERROR:

"<dego>"
        "dego" CS @CVP MAP:7549:r10 #17->17
;       "de" Adv Qst REMOVE:5693:r1076
;       "dego" CS @CNP MAP:7549:r10 REMOVE:7726:r1459
"<livččii>"
        "leat" <aux> V <copula> <TH-Nom-Any> <mielde> <OR-Loc-HumGroup> <OR-eret-Plc> <LO-Loc-johtu><DE-Ill-Plc> <AT-Abe-Any> <AT-Nom-Any> <AT-Nom-Adj><EX-Ill-Ani> <PO-Gen-Hum> <MA-mielde-Any> <MA-Adv-Manner> <XT-Gen-Measr> <LO-maŋŋil-Time> <LO-Acc-Time> <LO-Loc-Time> <CO-Com-Ani> <ID-Nom-Any> <TH-Nom-Any><RO-Ess-Any><EX-Ill-Any> <EX-Ill-Ani><TH-Nom-Adj> <EX-Ill-Ani> <TH-Nom-Obj><RE-Ill-Ani> <LO-Loc-Any> <AktioEss> <RO-Ess-Any><PU-Ill-Act> <RO-Ess-Any> IV Cond Prs Sg3 @+FAUXV MAP:9983 #18->18 SUBSTITUTE:5846:SubV=aux SUBSTITUTE:7318 SUBSTITUTE:5846:SubV=aux SUBSTITUTE:7318 SUBSTITUTE:7911
        "leat" <aux> V <copula> <TH-Nom-Any> <mielde> <OR-Loc-HumGroup> <OR-eret-Plc> <LO-Loc-johtu><DE-Ill-Plc> <AT-Abe-Any> <AT-Nom-Any> <AT-Nom-Adj><EX-Ill-Ani> <PO-Gen-Hum> <MA-mielde-Any> <MA-Adv-Manner> <XT-Gen-Measr> <LO-maŋŋil-Time> <LO-Acc-Time> <LO-Loc-Time> <CO-Com-Ani> <ID-Nom-Any> <TH-Nom-Any><RO-Ess-Any><EX-Ill-Any> <EX-Ill-Ani><TH-Nom-Adj> <EX-Ill-Ani> <TH-Nom-Obj><RE-Ill-Ani> <LO-Loc-Any> <AktioEss> <RO-Ess-Any><PU-Ill-Act> <RO-Ess-Any> IV Cond Prs Err/Orth Sg3 @+FAUXV MAP:9983 #18->18 SUBSTITUTE:5846:SubV=aux SUBSTITUTE:7318 SUBSTITUTE:5846:SubV=aux SUBSTITUTE:7318 SUBSTITUTE:7911
"<ballan>"
        "ballat" <mv> V <EX-Nom-Ani> <heaggabeallái> <jámas> <RS-dihte-Any> <RS-Acc-Reason> <TH-AktioLoc> <TH-Acc-Any><TH-Inf> <TH-Acc-Any><TH-PrfPrc> <TH-Acc-Any><TH-AktioEss> <TH-Com-Impers> <TH-Acc-Impers> <TH-Loc-Any> <TH-jus> <TH-go> <TH-FS-Qst> <TH-ahte> <TH-0> TV PrfPrc @-FMAINV SELECT:9474:r1852 MAP:10044:r409 &real-ballán #19->19 ADD:4953:real-ballán SUBSTITUTE:5845:SubV=mv ADD:4953:real-ballán SUBSTITUTE:5845:SubV=mv
        "ballat" <EX-Nom-Ani> <heaggabeallái> <jámas> <RS-dihte-Any> <RS-Acc-Reason> <TH-AktioLoc> <TH-Acc-Any><TH-Inf> <TH-Acc-Any><TH-PrfPrc> <TH-Acc-Any><TH-AktioEss> <TH-Com-Impers> <TH-Acc-Impers> <TH-Loc-Any> <TH-jus> <TH-go> <TH-FS-Qst> <TH-ahte> <TH-0> TV PrfPrc @-FMAINV SELECT:9474:r1852 MAP:10044:r409 <mv> V TV &SUGGEST #19->19 COPY:4954:real-ballán SUBSTITUTE:5845:SubV=mv SUBSTITUTE:5845:SubV=mv
        "ballat" <mv> <EX-Nom-Ani> <heaggabeallái> <jámas> <RS-dihte-Any> <RS-Acc-Reason> <TH-AktioLoc> <TH-Acc-Any><TH-Inf> <TH-Acc-Any><TH-PrfPrc> <TH-Acc-Any><TH-AktioEss> <TH-Com-Impers> <TH-Acc-Impers> <TH-Loc-Any> <TH-jus> <TH-go> <TH-FS-Qst> <TH-ahte> <TH-0> TV PrfPrc @-FMAINV SELECT:9474:r1852 MAP:10044:r409 V TV &SUGGEST #19->19 COPY:4954:real-ballán SUBSTITUTE:5845:SubV=mv
;       "ballat" V <EX-Nom-Ani> <heaggabeallái> <jámas> <RS-dihte-Any> <RS-Acc-Reason> <TH-Com-Impers> <TH-Acc-Impers> <TH-Loc-Any> <TH-jus> <TH-go> <TH-FS-Qst> <TH-ahte> <TH-0> TV Actio Gen SELECT:9474:r1852
;       "ballat" V <heaggabeallái> <jámas> <RS-dihte-Any> <RS-Acc-Reason> <TH-Com-Impers> <TH-Acc-Impers> <TH-FS-Qst> <TH-0> TV Actio Nom SELECT:9474:r1852
;       "ballat" V* TV* Der/NomAct N Sg Gen SELECT:9474:r1852
;       "ballat" V* TV* Der/NomAct N Sg Nom SELECT:9474:r1852
;       "ballat" V <EX-Nom-Ani> <heaggabeallái> <jámas> <RS-dihte-Any> <RS-Acc-Reason> <TH-AktioLoc> <TH-Acc-Any><TH-Inf> <TH-Acc-Any><TH-PrfPrc> <TH-Acc-Any><TH-AktioEss> <TH-Com-Impers> <TH-Acc-Impers> <TH-Loc-Any> <TH-jus> <TH-go> <TH-FS-Qst> <TH-ahte> <TH-0> TV Ind Prt ConNeg SELECT:9474:r1852
"<eret>"
        "eret" Adv @<ADVL MAP:16555 #20->20
"<doppe>"
        "doppe" Adv Sem/Plc @<ADVL MAP:16555 #21->21
"<.>"
        "." CLB #22->22
"<Ruošša>" 
    "ruošša" G3 N Sem/Hum Sg Nom @SUBJ> MAP:17047 &msyn-valency-loc-acc #1->1 ADD:8336:wrong-valency-loc-acc
    "ruošša" G3 N Sem/Hum Sg Acc @OBJ> MAP:17169:IfNoTransV< &msyn-valency-loc-acc #1->1 ADD:8336:wrong-valency-loc-acc
    "Ruošša" N Prop Sem/Plc Sg Acc @OBJ> MAP:17169:IfNoTransV< &msyn-valency-loc-acc #1->1 ADD:8336:wrong-valency-loc-acc
    "Ruošša" N Prop Sem/Plc Sg Nom @SUBJ> MAP:17047 &msyn-valency-loc-acc #1->1 ADD:8336:wrong-valency-loc-acc
    "ruošša" G3 N Sem/Hum Sg @OBJ> MAP:17169:IfNoTransV< Loc &SUGGEST #1->1 COPY:8350:wrong-valency-loc-acc
    "Ruošša" N Prop Sem/Plc Sg @OBJ> MAP:17169:IfNoTransV< Loc &SUGGEST #1->1 COPY:8350:wrong-valency-loc-acc
;    "ruošša" A Attr REMOVE:5438:r1016
;    "ruošša" G3 N Sem/Hum Sg Gen REMOVE:11494:r2269
;    "Ruošša" N Prop Sem/Plc Sg Gen REMOVE:11494:r2269
"<bealde>"
    "bealde" Po @ADVL> MAP:16517 #2->2
    "bealde" Adv @ADVL> MAP:16517 #2->2
;    "bealde" Pr REMOVE:5888:r1123
"<ballet>"
    "ballat" <mv> V <EX-Nom-Ani> <heaggabeallái> <jámas> <RS-dihte-Any> <RS-Acc-Reason> <TH-AktioLoc> <TH-Acc-Any><TH-Inf> <TH-Acc-Any><TH-PrfPrc> <TH-Acc-Any><TH-AktioEss> <TH-Com-Impers> <TH-Acc-Impers> <TH-Loc-Any> <TH-jus> <TH-go> <TH-FS-Qst> <TH-ahte> <TH-0> TV Ind Prs Pl3 @+FMAINV MAP:9988:r406 #3->3 SUBSTITUTE:5728:SubV=mv SUBSTITUTE:5728:SubV=mv
;    "ballat" V <EX-Nom-Ani> <heaggabeallái> <jámas> <RS-dihte-Any> <RS-Acc-Reason> <TH-AktioLoc> <TH-Acc-Any><TH-Inf> <TH-Acc-Any><TH-PrfPrc> <TH-Acc-Any><TH-AktioEss> <TH-Com-Impers> <TH-Acc-Impers> <TH-Loc-Any> <TH-jus> <TH-go> <TH-FS-Qst> <TH-ahte> <TH-0> TV Imprt Pl2 REMOVE:9256:NotImprtN
;    "ballat" V <EX-Nom-Ani> <heaggabeallái> <jámas> <RS-dihte-Any> <RS-Acc-Reason> <TH-AktioLoc> <TH-Acc-Any><TH-Inf> <TH-Acc-Any><TH-PrfPrc> <TH-Acc-Any><TH-AktioEss> <TH-Com-Impers> <TH-Acc-Impers> <TH-Loc-Any> <TH-jus> <TH-go> <TH-FS-Qst> <TH-ahte> <TH-0> TV Ind Prt Sg2 @+FMAINV MAP:9988:r406 REMOVE:14227:allPrtSg2
"<mánáid>"
    "mánná" N Sem/Hum Err/Orth Pl Gen #4->4
    "mánná" N Sem/Hum Pl Gen #4->4
    "mánná" N Sem/Hum Err/Orth Pl Acc @<OBJ MAP:17164:IfNoTransV> #4->4
    "mánná" N Sem/Hum Pl Acc @<OBJ MAP:17164:IfNoTransV> #4->4
"<ruoššaiduvvot>"
    "ruoššaiduvvot" ? #5->5
"<.>"
    "." CLB #6->6
"<báhpain>"
        "báhppa" §RE N Sem/Hum Pl Loc @ADVL> SELECT:13357:GRAMval MAP:16562 #11->12 SUBSTITUTE:7172 SUBSTITUTE:7172
        "báhppa" §RE N Sem/Hum Err/Orth Pl Loc @ADVL> SELECT:13357:GRAMval MAP:16562 #11->12 SUBSTITUTE:7172 SUBSTITUTE:7172
;       "báhppa" N Sem/Hum Err/Orth Sg Com SELECT:13357:GRAMval
;       "báhppa" N Sem/Hum Sg Com SELECT:13357:GRAMval
"<jearran>"
        "jearrat" V <TH-Acc-*Ani><RE-Loc-Ani> <RE-Loc-Ani> <TO-birra-Any> <TH-Acc-Any><RE-Loc-Ani> <TH-FS-Qpron> <TH-FS-Qst> TV Actio Nom @S
UBJ> SELECT:11813:r2340 MAP:17092 #12->12 SETCHILD:6767 SETCHILD:6767
;       "jearrat" V <TH-Acc-*Ani><RE-Loc-Ani> <RE-Loc-Ani> <TO-birra-Any> <TH-Acc-Any><RE-Loc-Ani> <TH-FS-Qpron> <TH-FS-Qst> <TH-ahte> TV In
d Prt ConNeg REMOVE:9256:NotConNegIfNotNeg
;       "jearrat" V <TH-Acc-*Ani><RE-Loc-Ani> <RE-Loc-Ani> <TO-birra-Any> <TH-Acc-Any><RE-Loc-Ani> <TH-FS-Qpron> <TH-FS-Qst> <TH-ahte> TV Pr
fPrc @-FMAINV MAP:10036:r407 SELECT:11813:r2340
;       "jearrat" V* TV* Der/NomAct N Sg Nom SELECT:11813:r2340
;       "jearrat" V* TV* Der/NomAct N Sg Gen SELECT:11813:r2340
;       "jearrat" V <TH-Acc-*Ani><RE-Loc-Ani> <RE-Loc-Ani> <TO-birra-Any> <TH-Acc-Any><RE-Loc-Ani> <TH-FS-Qpron> <TH-FS-Qst> <TH-ahte> TV Ac
tio Gen SELECT:11813:r2340
"<manne>"
        "mannat" <mv> V <rastá-V> <badjel-V> <ala-V> <AG-Nom-Any> <gitta> <eret> <rasta> <badjel> <birra> <oktii><RF-Com-Any> <oktii> <TH-Nom-*Ani><MA-Adv-Manner> <MA-Adv-Manner> <IN-Com-Veh> <XT-Acc-Measure> <SO-luhtte-Ani> <DE-Ill-Plc> <DE-sisa-Build> <DE-lusa-Ani> <PT-Gen-Plc><DE-Ill-Any> <PT-Gen-Plc> <PT-rastá-Plc> <PT-meaddel-Plc> <PT-čađa-Plc> <PT-bokte-Plc> <SO-Loc-*Ani><DE-Ill-*Ani> <SO-Loc-*Ani> <DE-Ill-*Ani> <CO-mielde-Ani> <LO-luhtte-Any> <LO-Loc-Plc> <DE-Ill-Plc><PU-Inf> <PU-Inf> <PU-AktioEss> <RO-Ess-Any> IV Ind Prt Pl3 @+FMAINV SUBSTITUTE:6445 SUBSTITUTE:6478 SUBSTITUTE:6479 SUBSTITUTE:6445 SUBSTITUTE:6478 SUBSTITUTE:6479 SUBSTITUTE:6445 SUBSTITUTE:6478 SUBSTITUTE:6479 SUBSTITUTE:6445 SUBSTITUTE:6478 SUBSTITUTE:6479 SUBSTITUTE:6445 SUBSTITUTE:6478 SUBSTITUTE:6479 SUBSTITUTE:6445 SUBSTITUTE:6478 SUBSTITUTE:6479 SUBSTITUTE:6445 SUBSTITUTE:6478 SUBSTITUTE:6479 SUBSTITUTE:6445 SUBSTITUTE:6478 SUBSTITUTE:6479 MAP:10010:r406 SUBSTITUTE:6445 SUBSTITUTE:6478 SUBSTITUTE:6479 SUBSTITUTE:6445 SUBSTITUTE:6478 SUBSTITUTE:6479 SUBSTITUTE:6445 SUBSTITUTE:6478 SUBSTITUTE:6479 SUBSTITUTE:6445 SUBSTITUTE:6478 SUBSTITUTE:6479 SUBSTITUTE:6445 SUBSTITUTE:6478 SUBSTITUTE:6479 SUBSTITUTE:6445 SUBSTITUTE:6478 SUBSTITUTE:6479 SUBSTITUTE:6445 SUBSTITUTE:6478 SUBSTITUTE:6479 SUBSTITUTE:6445 SUBSTITUTE:6478 SUBSTITUTE:6479 SELECT:14285:r2976 SUBSTITUTE:6445 SUBSTITUTE:6478 SUBSTITUTE:6479 SUBSTITUTE:6445 SUBSTITUTE:6478 SUBSTITUTE:6479 SUBSTITUTE:6445 SUBSTITUTE:6478 SUBSTITUTE:6479 SUBSTITUTE:6445 SUBSTITUTE:6478 SUBSTITUTE:6479 SUBSTITUTE:6445 SUBSTITUTE:6478 SUBSTITUTE:6479 SUBSTITUTE:6445 SUBSTITUTE:6478 SUBSTITUTE:6479 &syn-add-go #13->13 SUBSTITUTE:5845:SubV=mv SUBSTITUTE:5845:SubV=mv ADD:7742:syn-add-go
        "mannat" <mv> V <rastá-V> <badjel-V> <ala-V> <AG-Nom-Any> <gitta> <eret> <rasta> <badjel> <birra> <oktii><RF-Com-Any> <oktii> <TH-Nom-*Ani><MA-Adv-Manner> <MA-Adv-Manner> <IN-Com-Veh> <XT-Acc-Measure> <SO-luhtte-Ani> <DE-Ill-Plc> <DE-sisa-Build> <DE-lusa-Ani> <PT-Gen-Plc><DE-Ill-Any> <PT-Gen-Plc> <PT-rastá-Plc> <PT-meaddel-Plc> <PT-čađa-Plc> <PT-bokte-Plc> <SO-Loc-*Ani><DE-Ill-*Ani> <SO-Loc-*Ani> <DE-Ill-*Ani> <CO-mielde-Ani> <LO-luhtte-Any> <LO-Loc-Plc> <DE-Ill-Plc><PU-Inf> <PU-Inf> <PU-AktioEss> <RO-Ess-Any> IV Ind Prt Pl3 @+FMAINV SUBSTITUTE:6445 SUBSTITUTE:6478 SUBSTITUTE:6479 SUBSTITUTE:6445 SUBSTITUTE:6478 SUBSTITUTE:6479 SUBSTITUTE:6445 SUBSTITUTE:6478 SUBSTITUTE:6479 SUBSTITU: