2013-11-11

Grammar checker meeting

Nov. 11, 2013, Fran & Sjur

  • Sjur to test that the code in gramchk compiles

Pending things:

  • Stabilise the layout of the data. What we have works, do we need any more files ?
    • a bit early, we need to experiment more with the actual features we need
    • at least we need a normative generator for the suggestions, possibly with dialect tags
  • Should we use a zip file, or just a flat directory layout. What should be the file names ?
    • zip file for ease of distribution
    • directly read the zip file as for the speller
    • suffix: .zgram
    • use an index file as for the speller
  • Try and develop the error.xml file a bit more:
    • Add translations
    • Check if it is sufficient
  • Tokenisation
    • move the sentence splitting code from voikko core to language code
    • do sentence splitting in CG, for language-dependent capitalisation errors and sentence boundary definitions
    • also word-level tokenisation would need to be moved to the language code, preferably by the lookup tool
      • either: extract list of punctuation from fst - Fran if lexc doesn't work out
      • or: encode optional punctuation in LexC - Sjur
  • Code note: sentence/Sentence.cpp
  • Suggestions.

Some existing tools and resources:

  • On/off interface for individual error checks
    • talk to Harri about this -- probably wants to do the same for Finnish
    • grouping errors into code - type
    • perhaps use the settings dialog box for the extension to manage this.

Integration plan:

  • LibreOffice (OpenOffice?)
  • MacOSX grammar/speller API
  • MS Office - probably through a VB or .Net wrapper (true Office integration is blocked)

Try to lobby MS to allow grammar checkers etc for Greenlandic, Sámi (and others?)