Divvuns & Giellateknos techdoc
UiT Norgga árktalaš universitehta
Copyright © 2004-2019 UiT Norgga árktalaš universitehta
giellalt@uit.no

tca2

Contents:

  • Overview
  • Alignment
  • Evaluations and notes
  • Original Bergen documents
  • Alternatives to tca2
  • Meetings

TCA2 is a sentence alignment program developed by Knut Hofland and Øystein Reigem at the University of Bergen (UiB).

Overview

Workflow

  • Workflow/Bargovuohki
  • Files for alignment 2019:
    • nob2sma/admin/sd/samediggi.no
    • nob2sma/admin/sd/samediggi.no PRM files

Alignment

  • Improving sentence alignment for pdf files
  • Setting the TCA2 parameters
  • How to use the graphical interface (plan)

Evaluations and notes

  • fin2sme 2017
  • nob2sma 2017
  • 2012 FAD project
    • Gold standard test results (empty file)
    • TMX gold standard test with min and max values (empty file)
    • Abbreviation test results
    • Parallel government papers

Original Bergen documents

  • TCA2 homepage
  • Knut Hofland and Stig Johansson: The Translation Corpus Aligner: A program for automatic alignment of parallel texts (see also here)
  • The October 2006 README file
  • The original README file
  • The background for TCA2
  • Demo document with nice pictures

Alternatives to tca2

  • We need to look at alternatives to tca2

Meetings

  • 2017: 21.6. // 4.7.
  • 2012: 12.1. // 19.1. // 25.1. // 1.2. // 7.2. // 13.2. // 17.2. // 29.2. // 12.3. // 22.3.
  • 2011: 7.4. // 25.11. // 28.11. // 8.12. // 14.12. // 20.12.