2013-09-26-plan-gramchk
26.09.2013
present: 
- Sjur 
- Linda
grammar checker project plan
0 intro 
- working definition: errors that cannot be resolved by the spellchecker 
- Excluding real word errors by default
1 done until now:
- error type classification
  - lexical errors (&lex-majuscule)
  - morphosyntactic errors (&msyn-inf_not_actio)
  - syntactic errors (&syn-case_congruence)
  - real-word errors (&real-vuosttaš)
  - correct tags (&corr-not-compound)
 
- additional error types
  - punctuation errors
  - number formatting errors
  - capitalisation errors
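
The tag scheme above lends itself to a simple frequency count per error class. The following is only a sketch: it assumes that tags occur in running text in the form `&<class>-<name>`, as in the examples above, and the function name `count_error_tags` and the sample string are made up for illustration. The real markup in the error corpus may differ.

```python
import re
from collections import Counter

# Assumption: tags look like "&<class>-<name>", with the five class
# prefixes taken from the examples in the notes.
TAG_RE = re.compile(r"&(lex|msyn|syn|real|corr)-([\w-]+)")


def count_error_tags(text):
    """Return (counts per class, counts per full tag) for the tags in text."""
    by_class = Counter()
    by_tag = Counter()
    for match in TAG_RE.finditer(text):
        by_class[match.group(1)] += 1   # e.g. "lex"
        by_tag[match.group(0)] += 1     # e.g. "&lex-majuscule"
    return by_class, by_tag


# Hypothetical sample text, not taken from the corpus.
sample = "a &lex-majuscule b &syn-case_congruence c &lex-majuscule d &real-vuosttaš"
by_class, by_tag = count_error_tags(sample)
print(by_class.most_common())
```

Counting both per class and per full tag keeps the coarse classification and the individual error types visible at the same time.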
 
2 todo:
- practical things:
  - move SME (and GC) from old to new infrastructure
  - meetings with Francis
 
- maintenance:
  - add/change/update semantic/syntactic tags
 
- work on things started:
  - Duommá's 250 word list (compounds that lead to real word errors)
    - excluding real word errors by default
  - rules for valency example sentences collected in gramchkcorpus.txt
 
- errors:
  - find out which types of errors are most frequent
  - error corpus: size? other sources?
    - $GTFREE/goldstandard/orig/sme (xserve)
    - main/gt/sme/src/gramchk/gramchkcorpus.txt
 
- possible classes?
- presentation:
  - sponsor-demonstrations
  - release early/often (Open Source principles)
  - we cannot make a Microsoft Office grammar checker: prohibited by MS; users can protest by writing to them ;) (we can only deliver to LibreOffice)
  - look at a graphical grammar checker (voikko - Finnish)
  - http://wiki.apertium.org/wiki/Spellchecking
 
- rules:
  - for real word errors: which semantic tags can be combined?
    - dálkkádat + rap + poarta
  - bigrams and statistics for compounds?
  - fix/annotate grammatical errors (compounds) already in
    - hfst-proc probably needs to be updated to give all analyses of potential
 
Compounding errors - split writing (compound parts written as separate words):
[N Nom] [N ...]            ===== case error (Gen not Nom) / compounding error
[N Nom/N Gen] [N ...]      =====
[N Gen] [N ...]            =====
[N Nom+VR] [N ...]         ===== with vowel reduction (VR) - always an error
[N Nom/N Gen+VR] [N ...]   ===== --"--
[N Gen+VR] [N ...]         ===== --"--
VR = vowel reduction
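
The patterns listed above can be read as a small decision procedure over the analysis of the first noun. The sketch below is illustrative only: the function name `classify_noun_pair`, its parameters, and the labels "no verdict recorded" and "not covered" are hypothetical and are not part of the grammar checker. Patterns for which the notes record no verdict are returned as such rather than guessed.

```python
def classify_noun_pair(first_cases, vowel_reduced, second_is_noun=True):
    """Map a [N ...] [N ...] pair to the verdict recorded in the notes.

    first_cases: set of case readings of the first noun,
                 e.g. {"Nom"}, {"Gen"} or {"Nom", "Gen"} (ambiguous).
    vowel_reduced: True if the first noun shows vowel reduction (VR).
    """
    # Only noun + noun pairs with Nom/Gen readings are listed in the notes.
    if not second_is_noun or not first_cases or not first_cases <= {"Nom", "Gen"}:
        return "not covered"
    # Any reading with vowel reduction is listed as always an error.
    if vowel_reduced:
        return "always an error"
    # Unambiguous nominative without VR.
    if first_cases == {"Nom"}:
        return "case error (Gen not Nom) / compounding error"
    # [N Nom/N Gen] and [N Gen] without VR: the notes leave the verdict open.
    return "no verdict recorded"


print(classify_noun_pair({"Nom"}, vowel_reduced=False))
print(classify_noun_pair({"Nom", "Gen"}, vowel_reduced=True))
```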
what is one word? 
- spellchecker: space before and after
- tokenizer:
  - space as a possible sign in a compound (in the case:
    - CG needs to clean up - disambiguate
 
- tools to be used:
  - dependencies
  - valencies
  - semantic roles
  - semantic prototypes
 

