Corpus maintenance
This document keeps track of measures to improve the corpus collection and conversion process.
Corpus improvement work
Folder structure, etc.
- news: could these files be moved from bound to free?
- science: we have sme/science files in both free and bound, with no clear division between them
- sma: a separate option for classical texts
Tasks
Meetings in the corpus improvement project
- 2019:
- 2017:
- 2016:
- 2014:
- 2012:
- 2011:
OCR and conversion errors left over from spring 2011
-
OCR error overview, May 2011 (still open issues here)
-
Conversion errors (open issues may be listed here?)