OmegaT Developer Info
Mac App Bundling
HfstTokenizer can be compiled together with OmegaT and bundled into the Mac app.
- Download OmegaT 3.x source code, not 4.x
- Get appbundler used by OmegaT from here.
- Install it into ~/.ant/lib/
- This appbundler needs JavaAppLauncher and jre-mac-root to be defined
- jre-mac-root is a soft link to the folder where the Java Runtime libraries are found
- Download thread safe version of hfst lookup library and put it to OMEGAT_SRC_FOLDER/lib where
- Copy HfstTokenizer.java and HfstStemFilter.java to
- Modify files package name if needed
- Remove throws IOException from getTokenStream method and correct
- diff HfstTokenizer.java against 4.x HfstTokenizer.java (see diffs below)
- Add hfst-ol.jar to manifest-template.mf (details below)
- Add a lib/hfst-ol.jar entry to the Class-Path variable in manifest.mf
- Run ant mac in the OmegaT source folder, the one where you installed OmegaT
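The notes above say that jre-mac-root must be a soft link to the folder holding the Java Runtime libraries. As a rough sketch only, a link like that could be sanity-checked from Java before running the build. The class name, the method names and the temporary paths below are all hypothetical; where appbundler actually expects jre-mac-root to live is not stated here and is not assumed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class JreLinkCheckSketch {

    /** True if 'link' is a symbolic link whose target is an existing directory. */
    static boolean isLinkToDirectory(Path link) {
        // Files.isDirectory follows symbolic links by default, so it tests the target.
        return Files.isSymbolicLink(link) && Files.isDirectory(link);
    }

    /** Builds a throwaway directory plus a link to it, then checks both paths. */
    static boolean demo() {
        try {
            Path dir = Files.createTempDirectory("jre-demo");
            // Stand-in for the real Java Runtime folder (hypothetical name).
            Path target = Files.createDirectory(dir.resolve("fake-jre"));
            Path link = Files.createSymbolicLink(dir.resolve("jre-mac-root"), target);
            // The link passes the check; the plain directory does not, since it is not a link.
            return isLinkToDirectory(link) && !isLinkToDirectory(target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A check like this catches a dangling link (target folder moved or removed) early, which would otherwise only show up as a failure during bundling.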
Diffs:
HfstTokenizer.java:
1c1
< package org.omegat.tokenizer;
---
> package no.divvun.tokenizer;
16a17
> import org.omegat.tokenizer.BaseTokenizer;
17a19
> import org.omegat.tokenizer.Tokenizer;
60,63c62,64
< final boolean stopWordsAllowed) {
< StandardTokenizer tokenizer = new StandardTokenizer(getBehavior(),
< new StringReader(strOrig));
< // tokenizer.setReader(new StringReader(strOrig));
---
> final boolean stopWordsAllowed) throws IOException {
> StandardTokenizer tokenizer = new StandardTokenizer();
> tokenizer.setReader(new StringReader(strOrig));
71,72c72
< return new HfstStemFilter(new StandardTokenizer(getBehavior(),
< new StringReader(strOrig)), transducer);
---
> return new HfstStemFilter(tokenizer, transducer);
HfstStemFilter.java:
1c1
< package org.omegat.tokenizer;
---
> package no.divvun.tokenizer;
11a12
> import org.apache.lucene.util.AttributeSource.State;
47,49c48,49
< for (String s : res) {
< // res.forEach(anal -> {
< String stem = s.substring(0, s.indexOf("+"));
---
> res.forEach(anal -> {
> String stem = anal.substring(0, anal.indexOf("+"));
53c53
< }
---
> });
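The stem filter change above only swaps a lambda-based forEach for a plain for loop; both take the part of each analysis string before the first "+". The sketch below shows the two forms side by side outside of Lucene. The class name, the method names and the sample analysis strings are hypothetical, and the guard for a string without "+" is an addition that is not in the original code (there, substring would throw).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class StemExtractionSketch {

    /** Returns the text before the first '+', or the whole string if there is no '+'. */
    static String stemOf(String analysis) {
        int plus = analysis.indexOf('+');
        // Guard added for this sketch: without it, substring(0, -1) would throw.
        return plus < 0 ? analysis : analysis.substring(0, plus);
    }

    /** Loop form, as on the "<" side of the diff. */
    static List<String> stemsWithLoop(Iterable<String> analyses) {
        List<String> stems = new ArrayList<>();
        for (String s : analyses) {
            stems.add(stemOf(s));
        }
        return stems;
    }

    /** Lambda form, as on the ">" side of the diff. */
    static List<String> stemsWithForEach(Iterable<String> analyses) {
        List<String> stems = new ArrayList<>();
        analyses.forEach(anal -> stems.add(stemOf(anal)));
        return stems;
    }

    public static void main(String[] args) {
        // Made-up analysis strings in a "stem+TAG+TAG" shape.
        List<String> res = Arrays.asList("guolli+N+Sg+Nom", "guolli");
        System.out.println(stemsWithLoop(res));
        System.out.println(stemsWithForEach(res));
    }
}
```

Both methods return the same list for the same input, so the choice between them only depends on which Java language level the build accepts.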
Add the following entry for hfst-ol.jar to manifest-template.mf:
Name: org.omegat.tokenizer.HfstTokenizer
OmegaT-Plugin: tokenizer
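To see how such a manifest entry is read back, here is a minimal sketch using the standard java.util.jar.Manifest class. The sample manifest text is a hypothetical, cut-down stand-in for the real OmegaT manifest (whose Class-Path lists many more jars), and the class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

final class ManifestEntrySketch {

    // Hypothetical minimal manifest: a main section, a blank line, then one named entry.
    static final String SAMPLE =
            "Manifest-Version: 1.0\n"
            + "Class-Path: lib/hfst-ol.jar\n"
            + "\n"
            + "Name: org.omegat.tokenizer.HfstTokenizer\n"
            + "OmegaT-Plugin: tokenizer\n"
            + "\n";

    /** Parses manifest text held in a string. */
    static Manifest parse(String text) {
        try {
            return new Manifest(
                    new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Value of OmegaT-Plugin in the named entry, or null if the entry is missing. */
    static String pluginTypeOf(Manifest mf, String entryName) {
        Attributes attrs = mf.getAttributes(entryName);
        return attrs == null ? null : attrs.getValue("OmegaT-Plugin");
    }

    /** Value of Class-Path in the main section. */
    static String classPathOf(Manifest mf) {
        return mf.getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
    }

    public static void main(String[] args) {
        Manifest mf = parse(SAMPLE);
        System.out.println(pluginTypeOf(mf, "org.omegat.tokenizer.HfstTokenizer"));
        System.out.println(classPathOf(mf));
    }
}
```

Note that the Name and OmegaT-Plugin lines form their own section, separated from the main section by a blank line; if the two lines are run together or the blank line is missing, the entry is not found under that name.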