Corpus maintenance

This document tracks measures to improve the corpus collection and conversion process. See also the sentence alignment page, which covers that specific part of corpus maintenance.

Corpus improvement work

Directory structure etc.

  • news: is it possible to move these files from bound to free?
  • science: we have sme/science files in both free and bound, with no clear division between them
  • sma: a separate option for classical texts

Tasks

Meetings in the corpus improvement project

OCR and conversion errors leftover from spring 2011