Transducer Names In The New Infra
The new infrastructure produces transducers with a different naming scheme than in the old infra. The names are longer and more descriptive.
Regular transducers
The information we want to encode
The strings to represent the value for each class is given first. The order
basictype
- analyser
- generator
- transformer (transducer to transform one kind of representation to another,
- automaton (single-level acceptor)
application
- morphological analysis/generation (default, not specified)
- spellers (one application tag name for each):
- hfstspell (just a transducer/automaton)
- plxspell (needs fullform conversion)
- hunspell (needs fullform conversion)
- hfstspell (just a transducer/automaton)
- hyph - hyphenation
- oahpa - oahpa/error (more tags, broader coverage than standard)
- dict - dictionaries (more visible tags, same content as standard)
- mt - machine translation, typically for Apertium
tagset
- gt (default)
- apert - apertium
- GTLANG - tags in the target language
- XLANG - tags in some other language, typically a majority language
normativity
- desc - descriptive
- norm - normative
- restr - restricted to a dialect subset of norm (only use with dialect tag)
- single - only one word form pr analysis, used for e.g. MT generators
dialect
- whatever set of dialects are relevant for the language
orthography
- whatever set of orthographies are relevant for the language, e.g.
fsttype
- hfst (HFST fst)
- hfstol (HFST optimised lookup fst)
- xfst (Xerox fst)
Naming scheme
Based on the above list, we get this:
basictype-application-tagset-normativity[-dialect][-orthography].fsttype
This should give transducers with names like:
-
analyser-oahpa-gt-desc.hfst
-
generator-mt-apertium-single.hfst
- analyser-gt-desc.xfst
Transcription transducers
These are not lexical transducers, instead they convert between one
Possible application names are:
- date
- clock
- number
Conversion to IPA is a variant of this type of transducers, and is named:
- text2ipa
Since no tags are involved in these types of transducers,
The transcriptors are sometimes filtered (i.e. some forms are excluded in
File name examples
-
transcriptor-date-text2digit.filtered.lookup.xfst
-
transcriptor-clock-digit2text.filtered.lookup.xfst
- transcriptor-text2ipa-desc.hfst