FSTs

Oahpa depends on linguistic resources, some of which are finite-state transducers (FSTs).

Oahpa versions served from testing.oahpa.no/.. run on gtlab, and Oahpa versions served from http://oahpa.no/.. run on gtoahpa. FSTs should be copied to the /opt/smi/LANG/bin directory on the respective machine, where LANG is the language code (for example sme).

For versions other than sme and sma, see "The other oahpa versions" below.

The sme transducers

The North Saami transducers are located on gtoahpa.uit.no, in /opt/smi/sme/bin/. Note that the filenames have not yet been changed to the newinfra style.

TODO: Change to new names.

ped-sme.fst, sme-num.fst, isme-norm.fst, isme-GG.restr.fst, isme-KJ.restr.fst

The sme Oahpa uses a particular setup for analysers and transducers, referred to in the file univ_oahpa/univ_drill/forms.py:

  • preprocess = /opt/sami/cg/bin/preprocess
  • lookup = /opt/sami/xerox/c-fsm/ix86-linux2.6-gcc3.4/bin/lookup
  • fst = /opt/smi/sme/bin/ped-sme.fst
  • lookup2cg = /usr/local/bin/lookup2cg
  • cg3 = /usr/local/bin/vislcg3
  • dis = /opt/smi/sme/bin/sme-ped.cg3
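
The tools listed above are normally chained into one shell pipeline. The sketch below only assembles such a command line from the paths given in forms.py. The order of the stages, the function name build_analysis_pipeline, and the use of the -g option to pass the grammar to vislcg3 are assumptions made for illustration; any additional flags that the real setup passes to these tools are not shown.

```python
import shlex

# Paths as listed in univ_oahpa/univ_drill/forms.py.
TOOLS = {
    "preprocess": "/opt/sami/cg/bin/preprocess",
    "lookup": "/opt/sami/xerox/c-fsm/ix86-linux2.6-gcc3.4/bin/lookup",
    "fst": "/opt/smi/sme/bin/ped-sme.fst",
    "lookup2cg": "/usr/local/bin/lookup2cg",
    "cg3": "/usr/local/bin/vislcg3",
    "dis": "/opt/smi/sme/bin/sme-ped.cg3",
}


def build_analysis_pipeline(tools=TOOLS):
    """Return a shell pipeline string (hypothetical helper, not part of Oahpa).

    Assumed stage order: tokenise, look up in the FST, convert the lookup
    output to CG format, then disambiguate with the CG grammar.
    """
    stages = [
        [tools["preprocess"]],
        [tools["lookup"], tools["fst"]],
        [tools["lookup2cg"]],
        [tools["cg3"], "-g", tools["dis"]],  # -g: grammar file (assumed usage)
    ]
    # Quote every argument so unusual paths cannot break the shell command.
    return " | ".join(
        " ".join(shlex.quote(arg) for arg in stage) for stage in stages
    )


print(build_analysis_pipeline())
```

Printing the command instead of running it makes it easy to compare the paths against what is actually installed on the machine.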

The sma transducers

The South Saami transducers are located on gtoahpa, in /home/smaoahpa/smaoahpa/gtsvn/gt/sma/bin/.

TODO: These transducers should be located in /opt/smi/sma/bin/, as for all the other languages. Note that they probably already are, since the information here is old.

isma-L.restr.fst, isma-SH.restr.fst, ped-sma.fst, sma-num.fst

Smaoahpa has two settings that all of the games and install scripts use to locate the transducers and the lookup command.

	LOOKUP_TOOL = '/usr/local/bin/lookup'
	FST_DIRECTORY = '/home/smaoahpa/smaoahpa/gtsvn/gt/sma/bin'

When in doubt, the smaoahpa install command will show which FSTs are in use each time they are called to generate words, so if something odd seems to be happening, check the paths that this command returns.
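
The paths can also be checked by hand. The following is a minimal sketch, assuming the FST_DIRECTORY value and the sma file names listed above; missing_fsts is a hypothetical helper and not part of smaoahpa.

```python
import os

# Values copied from the settings and the file list above.
FST_DIRECTORY = '/home/smaoahpa/smaoahpa/gtsvn/gt/sma/bin'
SMA_FSTS = ['isma-L.restr.fst', 'isma-SH.restr.fst', 'ped-sma.fst', 'sma-num.fst']


def missing_fsts(fst_directory, filenames):
    """Return the full paths of the FST files that do not exist."""
    paths = [os.path.join(fst_directory, name) for name in filenames]
    return [path for path in paths if not os.path.isfile(path)]


if __name__ == '__main__':
    for path in missing_fsts(FST_DIRECTORY, SMA_FSTS):
        print('missing:', path)
```

An empty result means that every listed FST was found in the configured directory.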

Numra and Klokka will also give errors when the FSTs cannot be found, but these errors are intentionally vague for security purposes.

smaoahpa dialects

Smaoahpa requires additional settings in settings.py for dialects and fst paths.

	DIALECTS = {
		'main': ('isma-norm.fst', 'Unrestricted'),
		'SH': ('isma-SH.restr.fst', 'Short forms'),
		'L': ('isma-L.restr.fst', 'Long forms'),
		'NG': (None, 'Non-Presented forms'),
	}

The data is stored in a Python dictionary. Each key is a short name for the dialect, and each value is a tuple containing the FST file name (without the path) and a long descriptive name for the admin interface. The short name must correspond to the dialect name as used in the lexicon files.

In order to generate forms for a dialect, an FST file must be defined for it here. A form is added to a dialect only if generation with that dialect's FST succeeds.
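
As an illustration of how these two settings fit together, the sketch below maps each dialect that has an FST to the full path of that file and skips dialects whose FST is None. The helper name generating_dialects is hypothetical, and combining DIALECTS with FST_DIRECTORY in this way is an assumption about how smaoahpa resolves the paths.

```python
import os

FST_DIRECTORY = '/home/smaoahpa/smaoahpa/gtsvn/gt/sma/bin'

DIALECTS = {
    'main': ('isma-norm.fst', 'Unrestricted'),
    'SH': ('isma-SH.restr.fst', 'Short forms'),
    'L': ('isma-L.restr.fst', 'Long forms'),
    'NG': (None, 'Non-Presented forms'),
}


def generating_dialects(dialects, fst_directory):
    """Map each dialect that defines an FST to the full path of that FST."""
    return {
        short_name: os.path.join(fst_directory, fst_file)
        for short_name, (fst_file, _long_name) in dialects.items()
        if fst_file is not None  # no FST defined: nothing can be generated
    }


for short_name, path in sorted(generating_dialects(DIALECTS, FST_DIRECTORY).items()):
    print(short_name, path)
```

With the settings shown, 'NG' is left out because it has no FST file.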

Pregeneration

Pregenerated forms are supplied in the paradigm using a specific XML structure. See the documentation on pregenerated forms for more information.

The other oahpa versions

Note that the other Oahpa versions use the new FST names.

TODO: document the names.