2013-11-11
Grammar checker meeting
Nov. 11, 2013, Fran & Sjur
- Sjur to test that the code in gramchk compiles
Pending things:
- Stabilise the layout of the data. What we have works; do we need any more files?
- a bit early, we need to experiment more with the actual features we need
- at least we need a normative generator for the suggestions, possibly with dialect tags
- Should we use a zip file or just a flat directory layout? What should the file names be?
- zip file for ease of distribution
- directly read the zip file as for the speller
- suffix: .zgram
- use an index file as for the speller
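A minimal sketch of reading such an archive directly, without unpacking it to disk. Only the .zgram suffix, the index file idea and error.xml come from the notes; the member name index.xml and the function names are assumptions made for illustration, and this is not the speller's actual loading code.

```python
import io
import zipfile

# Assumption: the index file is called "index.xml"; the notes do not fix its name.
INDEX_NAME = "index.xml"


def read_archive(source):
    """Read the index and list the members of a .zgram-style zip archive.

    `source` may be a file path or a file-like object; nothing is unpacked to disk.
    """
    with zipfile.ZipFile(source) as archive:
        names = archive.namelist()
        if INDEX_NAME not in names:
            raise ValueError(f"{INDEX_NAME} missing from archive")
        index_text = archive.read(INDEX_NAME).decode("utf-8")
    return index_text, names


def build_demo_archive():
    """Build a tiny in-memory archive so the sketch can be tried without real data."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(INDEX_NAME, "<index/>")
        archive.writestr("error.xml", "<errors/>")
    buffer.seek(0)
    return buffer
```

Calling read_archive(build_demo_archive()) returns the index text together with the member names. A real file would be opened the same way by passing its path, for example a hypothetical se.zgram.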
- Try and develop the error.xml file a bit more:
- Add translations
- Check if it is sufficient
- Tokenisation
- move the sentence splitting code from voikko core to language code
- do sentence splitting in CG, for language-dependent capitalisation errors and sentence boundary definitions
- also word-level tokenisation would need to be moved to the language code, preferably by the lookup tool
- either: extract the list of punctuation from the FST (Fran, if LexC doesn't work out)
- or: encode optional punctuation in LexC (Sjur)
- Code note: sentence/Sentence.cpp
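To illustrate why sentence splitting belongs with the language data, here is a rough sketch of a splitter whose behaviour is driven entirely by per-language rules. It is not the code in sentence/Sentence.cpp and it does not use CG. The class LanguageRules, its fields and the example abbreviation are all hypothetical, and in the plan above the punctuation list would come from the FST or from LexC rather than being hard-coded.

```python
from dataclasses import dataclass


@dataclass
class LanguageRules:
    """Hypothetical per-language tokenisation data."""
    sentence_final: frozenset = frozenset(".!?")
    abbreviations: frozenset = frozenset()


def split_sentences(text, rules):
    """Split text on whitespace and close a sentence after sentence-final punctuation.

    A word listed in rules.abbreviations never closes a sentence, even if it
    ends in a sentence-final character.
    """
    sentences, current = [], []
    for word in text.split():
        current.append(word)
        ends_sentence = word[-1] in rules.sentence_final
        if ends_sentence and word.lower() not in rules.abbreviations:
            sentences.append(" ".join(current))
            current = []
    if current:
        # Keep trailing text that has no closing punctuation.
        sentences.append(" ".join(current))
    return sentences
```

With the default rules the abbreviation "e.g." wrongly ends a sentence. Passing LanguageRules(abbreviations=frozenset({"e.g."})) avoids that, which is the kind of language-dependent decision the core should not have to make.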
- Suggestions.
Some existing tools and resources:
- http://extensions.libreoffice.org/extension-center/lightproof-editor
- http://libreoffice.hu/2011/12/08/grammar-checking-in-libreoffice/
- http://www.techrepublic.com/blog/five-apps/five-libreoffice-extensions-to-help-you-catch-grammar-problems/
- On/off interface for individual error checks
- talk to Harri about this -- probably wants to do the same for Finnish
- grouping errors into code - type
- perhaps use the settings dialog box for the extension to manage this.
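One possible shape for the on/off settings, sketched under the assumption that every error check has a code and belongs to one group (type). The class, the error codes and the group names are invented for illustration; the real interface would depend on what the extension's settings dialog can offer.

```python
from dataclasses import dataclass, field


@dataclass
class CheckSettings:
    """Tracks which error checks are switched off, individually or by group."""
    # Maps an error code to its group; the codes and groups used here are hypothetical.
    groups: dict
    disabled_codes: set = field(default_factory=set)
    disabled_groups: set = field(default_factory=set)

    def is_enabled(self, code):
        """A check runs unless its own code or its whole group is disabled."""
        if code in self.disabled_codes:
            return False
        return self.groups.get(code) not in self.disabled_groups

    def filter(self, codes):
        """Keep only the reported error codes whose checks are enabled."""
        return [code for code in codes if self.is_enabled(code)]


settings = CheckSettings(groups={
    "agr-1": "agreement",
    "agr-2": "agreement",
    "cap-1": "capitalisation",
})
settings.disabled_groups.add("agreement")  # switch off a whole group
```

Switching off a group hides all of its checks at once, while disabled_codes allows a single check to be turned off without touching the rest of its group.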
Integration plan:
- LibreOffice (OpenOffice?)
- MacOSX grammar/speller API
- MS Office - probably through a VB or .Net wrapper (true Office integration is blocked)
Try to lobby MS to allow grammar checkers etc for Greenlandic, Sámi (and others?)