FST Technology Overview

Status and future of Xerox and other FST tools

At present the Giella infrastructure supports three FST technologies in parallel:

  • Xerox
  • Hfst
  • Foma

Each has its own strengths and weaknesses, summarised below:

Xerox

The de facto standard for FST work.

Strengths:

  • fast in lookup
  • fast in compiling source code

Weaknesses:

  • no support for weights
  • closed source
  • no longer developed: its main developer, Lauri Karttunen, has retired, and his effort to open-source the tools was blocked by Xerox lawyers

Source code access

Even though the source code has not been released, it is possible to obtain a license to the source code of the c-fsm library (documented here) by requesting one via the XLE page. Information and relevant links can be found at the bottom of the project page.

Hfst

A source-code compatible clone of the Xerox tools, based on OpenFST, but with multiple backends (Foma, SFST, OpenFST).

Strengths:

  • replicates 99% of the functionality of the Xerox tools
  • is actively maintained
  • is almost as fast in lookup as Xerox when using the optimised lookup format
  • supports weighted FSTs
  • open source

Weaknesses:

  • is slow at compiling FSTs compared to Xerox
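
To illustrate what weight support adds in practice, here is a minimal sketch in plain Python. It does not use the Hfst API; the toy transducer, its symbols, and its weights are all hypothetical and chosen only for illustration. It assumes the common convention that weights are summed along a path and that a lower total means a better analysis.

```python
from typing import Dict, List, Tuple

# Arc: (input symbol, output string, weight, target state).
# Hypothetical toy data, not taken from any Giella language model.
Arc = Tuple[str, str, float, int]

ARCS: Dict[int, List[Arc]] = {
    0: [("run", "run+V", 0.5, 1), ("run", "run+N", 1.0, 2)],
    1: [("s", "+3Sg", 0.0, 3)],
    2: [("s", "+Pl", 0.5, 3)],
}
FINAL = {3}


def lookup(symbols: List[str], state: int = 0, output: str = "",
           weight: float = 0.0) -> List[Tuple[str, float]]:
    """Return every analysis of the input together with its path weight."""
    if not symbols:
        # Only paths that end in a final state count as analyses.
        return [(output, weight)] if state in FINAL else []
    results: List[Tuple[str, float]] = []
    for inp, out, w, target in ARCS.get(state, []):
        if inp == symbols[0]:
            results.extend(
                lookup(symbols[1:], target, output + out, weight + w)
            )
    return results


def ranked(symbols: List[str]) -> List[Tuple[str, float]]:
    """Sort analyses so that the lowest total weight comes first."""
    return sorted(lookup(symbols), key=lambda r: r[1])


if __name__ == "__main__":
    for analysis, total in ranked(["run", "s"]):
        print(analysis, total)
```

An unweighted transducer would return the same two analyses but with no basis for preferring one over the other; the weights are what make ranking possible.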

Foma

A clone of the Xerox xfst tool, compatible in both source code and command interface, developed and maintained by Måns Huldén. It is open source.

Strengths:

  • fast in lookup
  • fast in compiling source code
  • actively maintained

Weaknesses:

  • no support for twolc
  • limited support for weighted FSTs

How to cope with this...

... i.e. the lack of a future for Xerox, the lack of twolc support in Foma, and the slow compilation in Hfst.

Today's dual strategy

  • use the fast xfst for development and for some applications
  • use the slower hfst for all applications that require open source and/or weighting
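
The dual strategy above amounts to a simple decision rule, which can be sketched as follows. The function name and its parameters are hypothetical and are not part of the Giella build system; the sketch only restates the two bullet points in code.

```python
def choose_toolchain(needs_open_source: bool, needs_weights: bool) -> str:
    """Pick an FST toolchain according to the dual strategy.

    Hypothetical helper for illustration only.
    """
    # Xerox xfst is closed source and has no weight support,
    # so either requirement rules it out.
    if needs_open_source or needs_weights:
        return "hfst"
    # Otherwise prefer the faster compiler for development work.
    return "xfst"


if __name__ == "__main__":
    print(choose_toolchain(needs_open_source=False, needs_weights=False))
    print(choose_toolchain(needs_open_source=True, needs_weights=False))
```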

The future

... is dependent upon

  • whether hfst will become faster
  • whether foma will include support for twolc
  • whether it will be possible to license some version of c-fsm from which we can build the command-line tools we need

In the short term, we can continue as we do now.