Divvuns & Giellateknos techdoc
UiT Norgga árktalaš universitehta
Copyright © 2004-2019 UiT Norgga árktalaš universitehta
giellalt@uit.no

Documentation common to all languages

Contents:

  • Tutorials
  • Linguistic issues
  • Testing
  • Outdated documentation

Tutorials

  • Tutorials for working with Lexc/Twolc, CG, and UNIX commands
  • How to use the morphological parsers
  • A flowchart of the analysis pipeline

Linguistic issues

  • Preprocessing of text
    • How to split text into tokens
    • Documentation of the preprocessor files
  • Morphological tagging of text, with LEXC and TWOLC
    • Principles for common (language-independent) lexicon entries
    • Handling variation in LEXC
  • Documentation of tags
    • Compound tags
    • Morphological tags
    • How the different tags interact with the FSTs
    • Syntax tags
    • Dependency tags
    • Semantic tags
  • Disambiguation of morphological analysis
    • Morphological disambiguation

Testing

  • LEXC/TWOLC work: test scripts
  • Checking analysis regressions (to be written)
  • Testing the disambiguation

Outdated documentation

  • Analyzed corpus
  • Correct corpus
  • Corpus plan
  • The tools in the testing directory
  • The original sme flowchart of the old infrastructure
  • The makefile setup in our old infrastructure
  • Our old infrastructure system for flag diacritics
  • Some old discussions with colleagues