TCA2_readme
TRANSLATION CORPUS ALIGNER (TCA) 2
Home page: http://gandalf.aksis.uib.no/tca2/ (only in Norwegian at the moment)
(See also the file list --files.readme.txt below)
The program is written in Java by Øystein Reigem, AKSIS/UNIFOB, University of Bergen
The text files have to be marked up in XML and be well-formed, but they don't have to have a
Two small support programs are included in the directory. One program divides a text
The texts can be encoded in iso-8859-x or utf-8 (this is given in the encoding attribute
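Before opening a text in TCA2 it can be useful to confirm that it is well-formed and to see which encoding it declares. The following is a minimal sketch in Python, not part of the TCA2 distribution; the function names `declared_encoding` and `is_well_formed` are made up for this illustration, and the check uses only the Python standard library.

```python
import re
import xml.etree.ElementTree as ET

UTF8_BOM = b"\xef\xbb\xbf"

def declared_encoding(xml_bytes):
    """Return the encoding named in the XML declaration, or None if there is none."""
    if xml_bytes.startswith(UTF8_BOM):
        xml_bytes = xml_bytes[len(UTF8_BOM):]
    m = re.match(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']', xml_bytes)
    return m.group(1).decode("ascii") if m else None

def is_well_formed(xml_bytes):
    """True if the bytes parse as XML, False if the parser reports an error."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False

if __name__ == "__main__":
    import sys
    # Usage (hypothetical): python check_xml.py RS61E.xml RS61N.xml
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            data = f.read()
        print(path, declared_encoding(data), is_well_formed(data))
```

Reading the file as bytes lets the XML parser apply the declared encoding itself instead of relying on a guess made when the file is opened.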
Format of the anchor list:
- Each entry is a comma-separated list of words in language 1, then a slash (/), then a comma-separated list of words in language 2
- Words can be truncated on the left or the right with a star (*)
Sample:
begin*, began, begun, start* / begyn*, start*
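The format above can be illustrated with a short Python sketch that parses one anchor entry and tests words against it. This is not TCA2's own code: the function names are hypothetical, and two behaviours are assumptions made for the example, namely that a star stands for any run of characters (including none) and that matching ignores case.

```python
import re

def parse_anchor_line(line):
    """Split 'a, b* / c, d*' into (language 1 patterns, language 2 patterns)."""
    left, right = line.split("/", 1)
    lang1 = [w.strip() for w in left.split(",") if w.strip()]
    lang2 = [w.strip() for w in right.split(",") if w.strip()]
    return lang1, lang2

def pattern_to_regex(pattern):
    """Turn an anchor word into a regex; '*' becomes '.*' (assumed meaning)."""
    parts = [re.escape(p) for p in pattern.split("*")]
    # Case-insensitive matching is an assumption of this sketch.
    return re.compile(".*".join(parts), re.IGNORECASE)

def matches(word, patterns):
    """True if the whole word matches at least one of the patterns."""
    return any(pattern_to_regex(p).fullmatch(word) for p in patterns)

if __name__ == "__main__":
    lang1, lang2 = parse_anchor_line("begin*, began, begun, start* / begyn*, start*")
    print(lang1, lang2)
    print(matches("beginning", lang1), matches("begynte", lang2))
```

Under these assumptions, an English sentence containing "beginning" and a Norwegian sentence containing "begynte" would both hit the same anchor entry.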
When files are opened in TCA2, the language 1 file has to be opened on the left (as File 1).
The program will align <s> and <head> but this can be changed in the
The program is run by opening a command window and executing the alignment.bat file
Open File 1, File 2, Anchor Words (and click use this anchor list). Check also the
The program requires Java 1.5.
File | Description |
---|---|
Om_aligmentprogrammet.doc | Description of program (in Norwegian) |
TCA2_demo_docs_20050706.doc | Some screen dumps of the program (not 100% up to date), with comments |
alignment.bat | Batch file to run in a command window (so that you see any error messages) |
alignment.jar | The alignment program; requires Java 1.5 |
anchor-eng-nor.txt | an English - Norwegian anchor list (in UTF-8) |
RS61E-1.xml | An English text with paragraph mark-up |
RS61E-2.xml | An English text with paragraph/sentence mark-up |
RS61E.xml | An English text with sentence id ready for alignment |
RS61N-1.xml | A Norwegian text with paragraph mark-up |
RS61N-2.xml | A Norwegian text with paragraph/sentence mark-up |
RS61N.xml | A Norwegian text with sentence id ready for alignment |
-cor.xml, -new.txt | Output formats |
RS61-aligned.htm | Aligned sample files as an HTML table |
gen-id-linux | Program to generate ids (run from the command line) |
gen-id.exe | - |
job-pc.bat | Job to run sentence/gen-id on a PC |
job-linux.sh | Job to run on Linux |
new2htm.exe | Merges two -new files into an HTML table (run from the command line) |
sentence-linux | Program to divide text into sentences (command line, not UTF-8 ready) |
sentence.exe | - |