Different FSTs and how they interact with tags

Overview

This is an overview of the FSTs that are compiled, and of their properties.

  • Removing paths: Paths marked with +Dial/- (except for the chosen dialect), +Use/NG, and +v2 and higher are removed for MT (generator-mt-gt-norm) and Oahpa (generator-oahpa-gt-restr). These tags are invisible in other FSTs. Paths with +Err/Orth and +Err/Lex are removed from all norm analysers and norm generators, and these tags are visible in other FSTs. +Use/MT should be removed from all FSTs except the MT ones, but this is not implemented. The tag is invisible.
  • +MWE is added to entries that should be tokenised as multiword expressions. The tag is invisible.
  • Normative compound tags, such as +CmpN/SgN, +CmpNP/First, +CmpN/SgNomLeft and more, restrict compounding only in analyser-gt-norm.hfst and generator-gt-norm.hfst. They don't work for XFST. The tags are invisible. Read more about normative compound tags.
  • Other tags, such as +TV, +IV, +vN, +HomN, +NomAg, +G3, +G7, +Allegro and more, are visible or invisible in analysers and generators depending on the application. They are used to keep variants apart, to give information about semantics, grammatical features or pronunciation, or to give an ID to lemmas whose lexical forms are homonyms. For analyser-gt-norm and analyser-gt-desc there will be paths both with and without these tags.
  • NDS dictionary and VD dicts: Analysis is done with analyser-dict-gt-desc, and wordform generation with generator-dict-gt-norm. In the generator, all paths with +v2 and higher are removed. For the dictionary we need tags like +HomN, +G3, +G7 and +NomAg to be visible in order to give the correct translation for homonymous lexicon entries, and we keep tags like +Use/NGminip and +Allegro visible so that we can choose not to show them to the user in the paradigms. See dictionarywork.
  • +Sem/tags are visible in analyser-disamb-gt-desc.
  • For disambiguation, analysis is done with analyser-disamb-gt-desc, because we need semantic tags, and also tags like +HomN, +G3, +G7, +NomAg and +Allegro. +Err/Orth and +Err/Lex can also be useful for disambiguation.
  • For Oahpa, forms are generated with generator-oahpa-gt-restr for each dialect, and with generator-oahpa-gt-norm for the correct forms. These generators also have tags like +TV, +IV, +vN, +HomN, +NomAg, +G3, +G7 and +Allegro to distinguish between the lemmas. For sme-Oahpa, ped-sme.fst is compiled in an error-gt branch, and it is used for analysing user input in Sahka and Vasta. It is not documented here.
  • On the web we give morphological analyses without extra internal tags, but with +Err/Orth and +Err/Lex visible. When an analysis is duplicated, the analysis with the +Err/ tag is removed by a vislcg3 file.
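
The difference between removing a path and making a tag invisible can be illustrated with a small Python sketch. This is a toy model over analysis strings, not how the FSTs are actually built. The function names and the example analyses (with the placeholder lemma "lemma") are invented for illustration, and dialect filtering with +Dial/- is left out.

```python
import re

# Matches variant tags such as +v1, +v2, +v10 and captures the number.
VERSION_TAG = re.compile(r"\+v(\d+)$")


def tags_of(analysis):
    """Split 'lemma+Tag1+Tag2' into its tags, each with the leading '+'."""
    return ["+" + t for t in analysis.split("+")[1:]]


def is_removed_for_mt(analysis):
    """True if the path carries +Use/NG or a variant tag +v2 or higher."""
    for tag in tags_of(analysis):
        if tag == "+Use/NG":
            return True
        m = VERSION_TAG.match(tag)
        if m and int(m.group(1)) >= 2:
            return True
    return False


def hide_tags(analysis, invisible):
    """Drop the given tags from the string but keep the path itself."""
    lemma = analysis.split("+")[0]
    kept = [t for t in tags_of(analysis) if t not in invisible]
    return lemma + "".join(kept)


# Invented example analyses; only the tag names come from the list above.
analyses = [
    "lemma+N+Sg+Nom",
    "lemma+v1+N+Sg+Nom",
    "lemma+v2+N+Sg+Nom",
    "lemma+Use/NG+N+Sg+Nom",
]

# Removing paths: the whole analysis disappears from the FST.
mt_paths = [a for a in analyses if not is_removed_for_mt(a)]

# Invisible tag: the path stays, only the tag is no longer shown.
without_mwe = hide_tags("lemma+N+MWE+Sg+Nom", {"+MWE"})
```

In this model, a removed path can no longer be analysed or generated at all, while an invisible tag only changes how the remaining path is written.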

Analysers and generators

+MWE, +Dial/- and normative compound tags like +CmpN/SgN are not visible in any FST.

ANALYSERS
FST Visible Invisible Remarks
analyser-raw-gt-desc all tags
analyser-gt-desc +Err/Orth, +Err/Lex, +vN +Sem/tags Descriptive analyser
analyser-disamb-gt-desc +Err/Orth, +Err/Lex, +Sem/tags, +Allegro +vN Descriptive analyser for disambiguation
analyser-gt-norm +NomAg +G3 +G7 +Allegro, +HomN +vN +Use/NGminip +Coll +TV +IV +Sem/tags, +OLang/*, +Use/NGminip, +Allegro, +vN Normative analyser
analyser-dict-gt-desc +NomAg +G3 +G7 +Allegro, +HomN +vN +Use/NGminip +Coll +TV +IV +Sem/tags, +OLang/*, +vN Descriptive analyser for dicts
analyser-dict-gt-desc-mobile +NomAg +G3 +G7 +Allegro, +HomN +vN +Use/NGminip +Coll +TV +IV +Sem/tags, +OLang/*, +vN analyser-dict-gt-desc-mobile is compiled with orthography/spellrelax-mobile-keyboard.regex in addition to the ordinary spellrelax.regex
analyser-oahpa-gt-norm +NomAg +G3 +G7 +Allegro +HomN +Coll +TV +IV +Sem/tags, +OLang/*, +Use/NGminip, +Allegro, +vN FST made for testing, it is parallel with the generator.
analyser-mt-gt-desc +NomAg +G3 +G7 +Allegro +HomN +Coll +TV +IV +Sem/tags +vN Descriptive analyser for MT input. Also analysis without +Sem/tags and +Allegro
GENERATORS
FST Mandatory Optional Remarks
generator-gt-desc - +NomAg +G3 +G7 +Allegro +HomN +vN +Use/NGminip +Coll +TV +IV Descriptive generator
generator-gt-norm - +NomAg +G3 +G7 +Allegro +HomN +vN +Use/NGminip +Coll +TV +IV Normative generator
generator-dict-gt-norm +NomAg +G3 +G7 +Allegro +HomN +vN +Use/NGminip +Coll +TV +IV Normative generator for dict paradigms
generator-oahpa-gt-norm +NomAg +G3 +G7 +HomN +Coll +IV +TV +vN, +Allegro +Use/NGminip Normative generator for generating Oahpa key answers (tolerates all dialects and +Use/NG forms)
generator-oahpa-gt-restr_XX = generator-oahpa-gt-norm + Allegro Normative generator for generating tasks. Paths marked with +Dial/- (except for the chosen dialect, if there are several dialects), +Use/NG, and +v2 and higher are removed.
generator-mt-apertium-norm.att.gz +IV +TV +NomAg +G3 +G7 +Coll +HomN +Sem/tags +Allegro Normative generator for MT output. Paths with +vN and +Use/NG are removed.

How to compile in langs/LANG

./configure

Example parameters: ./configure --with-hfst --enable-dicts

How to get a list of parameters: ./configure -h

How to see the parameters which are set: head config.log

Some of the parameters:

  • --enable-spellers (build any/all spellers [default=no])
  • --enable-grammarchecker (enable grammar checker [default=no])
  • --enable-dicts (enable dictionary transducers [default=no])
  • --enable-oahpa (enable all transducers with the adjective-oahpa.lexc file instead of adjective.lexc [default=no])
  • --enable-apertium (enable apertium transducers [default=no])
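
For scripted builds, the configure call can be wrapped in a small helper. The following Python sketch only assembles the flags listed above; the function names are hypothetical, and it assumes that ./configure exists in the language directory and that --with-hfst is wanted by default, as in the example above.

```python
import subprocess

# Feature switches listed above; each maps to an --enable-<name> flag.
KNOWN_FEATURES = {"spellers", "grammarchecker", "dicts", "oahpa", "apertium"}


def configure_args(features, with_hfst=True):
    """Build the ./configure argument list for the given features."""
    unknown = set(features) - KNOWN_FEATURES
    if unknown:
        raise ValueError(f"unknown feature(s): {sorted(unknown)}")
    args = ["./configure"]
    if with_hfst:
        args.append("--with-hfst")
    args.extend(f"--enable-{name}" for name in features)
    return args


def run_configure(lang_dir, features):
    """Run ./configure inside lang_dir (for example langs/LANG)."""
    subprocess.run(configure_args(features), cwd=lang_dir, check=True)
```

Calling run_configure("langs/LANG", ["dicts"]) would then correspond to running ./configure --with-hfst --enable-dicts in that directory.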