What is OmegaT?
OmegaT is one of many computer-assisted translation (CAT) tools.
The OmegaT documentation page links to the installation instructions and the user documentation, and can be found here:
Internal plans for OmegaT development
What follows are our thoughts for developing CAT for Saami.
The idea is to offer a set of ready-made folders, perhaps in two different formats:
- as a one-time download of a zipped archive
- as an svn checkout (via TortoiseSVN on Windows) for access to updates
For the time being, the folders are at https://gtsvn.uit.no/biggies/trunk/mt/omegat/.
The idea is to put the following resources into the following subdirectories:
- into dictionary: our StarDict dictionary smenob (OmegaT documentation) (todo)
- into glossary: term lists, partly fad-marked pairs, partly from satni.org, cf. documentation (done)
- into tm: our parallel texts, all files fused into one .tmx file (or one per theme), cf. documentation (done)
- into omegat: a file segmentation.conf, for sentence-level segmentation, cf. documentation (done for sme)
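The step of fusing several parallel-text files into one .tmx file could be sketched roughly as follows. This is only an illustration, not the script actually used for these folders: the function name `merge_tmx`, the header values and the file names in the usage note are all assumptions, and the sketch simply copies every translation unit (`<tu>`) from the input files into one new TMX body.

```python
import xml.etree.ElementTree as ET


def merge_tmx(inputs, output, srclang="sme"):
    """Copy every <tu> element from the input TMX files into one output file.

    Returns the number of translation units written.
    """
    root = ET.Element("tmx", version="1.4")
    # TMX expects these header attributes; the values are placeholders
    # chosen for this sketch, not taken from the real files.
    ET.SubElement(root, "header", {
        "creationtool": "merge_tmx",
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "tmx",
        "adminlang": "en",
        "srclang": srclang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(root, "body")
    count = 0
    for path in inputs:
        # iter("tu") finds translation units at any depth of the input file.
        for tu in ET.parse(str(path)).getroot().iter("tu"):
            body.append(tu)
            count += 1
    ET.ElementTree(root).write(str(output), encoding="utf-8", xml_declaration=True)
    return count
```

A call such as `merge_tmx(["law.tmx", "news.tmx"], "tm/all.tmx")` (hypothetical file names) would produce the single file; calling it once per theme would give one file per theme instead.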
The source and target folders are given svn ignore status. As we develop the folders, we should determine which other files to ignore and which to share.
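As a rough illustration of the ready-made folder set, the sketch below creates an empty skeleton containing the four resource subdirectories listed above plus the source and target folders. The function name `create_skeleton` and the example root path are assumptions made for this sketch; it does not fill the folders with any resources.

```python
from pathlib import Path

# Subdirectories named in the text: four for resources, two for documents.
SUBDIRS = ("dictionary", "glossary", "tm", "omegat", "source", "target")


def create_skeleton(root):
    """Create the empty folder skeleton under root and return the paths."""
    root = Path(root)
    created = []
    for name in SUBDIRS:
        # exist_ok makes the call safe to repeat on an existing skeleton.
        (root / name).mkdir(parents=True, exist_ok=True)
        created.append(root / name)
    return created
```

For example, `create_skeleton("nobsme")` (a hypothetical folder name) would create one such skeleton for a single language pair.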
The language pairs
The language pairs are of three types:
smesma, smesmj, smesmn: The main thing here is MT, glossaries and
nobsme, nobsmj, nobsma, finsme, finsmn, finsms: Here we have no MT
smasme, smjsme, smnsme, smenob: these we ignore in OmegaT for now.
- Add glossaries (done for nobsmX)
- Improve mt (done for nobsmX)
- Develop segmentation.conf (in progress for sme)
- Test and evaluate
Adding more resources:
- Improve segmentation.conf
- Investigate the settings pane for optimal choices
- Add proofing tools (hunspell)
- Lemmatisation for dictionary lookup (FST or StarDict)
The HFST tokenizer is available precompiled. You need to download:
Then put them into ~/Library/Preferences/OmegaT/plugins (create the directory if it does not exist). Tested and works with OmegaT 4.x.
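The installation step could be scripted along these lines. This is a minimal sketch: the function name `install_plugins` is an assumption, the downloaded file names are not given here and are therefore passed in by the caller, and the default directory is the path quoted above (a macOS-style location, so it would need adjusting on other systems).

```python
import shutil
from pathlib import Path

# Plugin directory as given in the text above.
PLUGIN_DIR = Path.home() / "Library" / "Preferences" / "OmegaT" / "plugins"


def install_plugins(files, plugin_dir=PLUGIN_DIR):
    """Copy the downloaded files into the OmegaT plugins directory.

    The directory is created first if it does not exist.
    Returns the paths of the copied files.
    """
    plugin_dir = Path(plugin_dir)
    plugin_dir.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.copy2(str(f), str(plugin_dir))) for f in files]
```

OmegaT would need to be restarted after copying for the new plugin files to be picked up.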
The HFST tokenizer source is on GitHub.