Ghent Workshop 2018

Ghent workshop, June 5-9, 2018

Present: Anna, Jack, Jeremy, Sasha, Trond

The workshop


  • Topic for the workshop: Improving the morphology and syntax of Meadow Mari (mhr).
  • Result: Much done, much remaining.


Morphology and syntax must be better coordinated before next time


  • Run a benchmark at three points: after this week, before the next workshop week, and after that week.

Planning ahead

Linguistic issues

  • Improving morphology
  • Improving syntax
  • Getting the corpus in place
    • Realistic goal: 50 million words
    • Further goal:

Corpus issues

Move from 4 to 40 million words

This will happen in the summer / early autumn

  1. Trond, Jeremy: Read the corpus documentation, collect texts, and schedule them for addition to Rusbound.
  2. Discuss CorpusTools with Børre
  3. Add the texts to the corpus.

Put things into use

First and foremost the spellcheckers

Things to do before the next meeting

  1. Jack to fix the lexicon issues on his table
  2. Jeremy to look at the result
  3. Adjust the FST and CG (which tags do we want?)
  4. All available corpus texts to be collected
  5. Trond to look at non-linguistic tagging issues
  6. Us all to look at the CG
  7. Us all to look at the corpus

Evaluation to be done:

  • What is the coverage of the FST?
  • What is the disambiguation rate?
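The notes do not define either metric, so the sketch below rests on working assumptions: FST coverage is the share of running tokens that get at least one analysis, and the disambiguation rate is the share of analysed tokens left with exactly one reading after the CG has run. The input shape (a list of hypothetical `(token, readings)` pairs, with an empty list meaning "no analysis") is also an assumption; real analyser or CG output would first have to be parsed into it. Tokens and tags in the example are invented placeholders, not real mhr analyses.

```python
from typing import List, Sequence, Tuple

# Hypothetical input format: one (token, readings) pair per running word.
# An empty readings list means the FST produced no analysis for the token.
Cohort = Tuple[str, Sequence[str]]


def fst_coverage(cohorts: List[Cohort]) -> float:
    """Share of running tokens that receive at least one FST analysis."""
    if not cohorts:
        return 0.0
    analysed = sum(1 for _, readings in cohorts if readings)
    return analysed / len(cohorts)


def disambiguation_rate(cohorts: List[Cohort]) -> float:
    """Share of analysed tokens left with exactly one reading (after CG)."""
    analysed = [readings for _, readings in cohorts if readings]
    if not analysed:
        return 0.0
    unique = sum(1 for readings in analysed if len(readings) == 1)
    return unique / len(analysed)


# Placeholder data: FST output before disambiguation ...
fst_output = [
    ("w1", ["w1+N+Sg+Nom", "w1+N+Sg+Gen"]),
    ("w2", ["w2+V+Pres", "w2+N+Sg"]),
    ("w3", []),  # unknown to the FST
    ("w4", ["w4+Adv"]),
]
# ... and the same tokens after a (hypothetical) CG pass.
cg_output = [
    ("w1", ["w1+N+Sg+Nom"]),
    ("w2", ["w2+V+Pres", "w2+N+Sg"]),
    ("w3", []),
    ("w4", ["w4+Adv"]),
]

if __name__ == "__main__":
    print(f"FST coverage: {fst_coverage(fst_output):.2f}")
    print(f"Disambiguation rate: {disambiguation_rate(cg_output):.2f}")
```

Here coverage is 3 of 4 tokens (0.75), and after the CG pass 2 of the 3 analysed tokens are fully disambiguated (about 0.67). Running the same two functions on each benchmark snapshot would give numbers that can be compared across workshops.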

Next meeting time

Possible time: last week of September

TODO (all): Check calendars.