FST Technology Overview
Status and future of Xerox and other FST tools
Presently the Giella infrastructure supports three FST technologies in parallel:
- Xerox
- Hfst
- Foma
Each of them has its strengths and weaknesses, summarised as follows:
Xerox
The standard in all FST work.
Strengths:
- fast in lookup
- fast in compiling source code
Weaknesses:
- no support for weights
- closed source
- abandoned by its main developer Lauri Karttunen (he is retired, and his
Source code access
Even though the source code is not released, it is possible to get a license to
Hfst
A source-code compatible clone of the Xerox tools, based on OpenFST, but with
Strengths:
- replicates 99% of the functionality of the Xerox tools
- is actively maintained
- is almost as fast in lookup as Xerox when using the optimised lookup format
- supports weighted FSTs
- open source
Weaknesses:
- is slow in compiling FSTs compared to Xerox
Foma
A source-code and command interface compatible clone of the Xerox xfst tool,
Strengths:
- fast in lookup
- fast in compiling source code
- actively maintained
Weaknesses:
- no support for twolc
- limited support for weighted FSTs
How to cope with this...
... i.e. the lack of a future for Xerox, the lack of twolc in Foma and the lack of
Today's dual strategy
- use the fast xfst for development and for some applications
- use the slower hfst for all applications that require open source and/or weighting
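The dual strategy above amounts to a simple decision rule, sketched here in Python for illustration. The function name `pick_toolkit` and its two parameters are hypothetical and not part of the Giella infrastructure; the sketch only restates the two bullet points as code.

```python
def pick_toolkit(needs_open_source: bool = False,
                 needs_weights: bool = False) -> str:
    """Return the toolkit suggested by the dual strategy.

    Hypothetical helper: it encodes only the two rules listed above
    and ignores Foma, which the dual strategy does not mention.
    """
    # Applications that demand open source and/or weighting go to hfst,
    # accepting its slower compilation.
    if needs_open_source or needs_weights:
        return "hfst"
    # Everything else (development work, some applications) uses xfst.
    return "xfst"


if __name__ == "__main__":
    # Illustrative cases; the labels are assumptions, not Giella targets.
    cases = {
        "day-to-day development": {},
        "open-source release": {"needs_open_source": True},
        "weighted application": {"needs_weights": True},
    }
    for label, requirements in cases.items():
        print(f"{label}: {pick_toolkit(**requirements)}")
```

Under this rule, xfst is chosen only when neither requirement applies, which is why it stays the development default.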
The future
... is dependent upon
- whether hfst will become faster
- whether foma will include support for twolc
- whether it will be possible to license some version of c-fst that will
In the short run, we can continue as we do now.