Language Independent Tags In The Giella Infra
There are a number of classes of tags where the classes are language
- Error tags
- tags describing parts of the language outside the established norm
- Dialect tags
- tags describing variation (in the written language) based on
- Derivation tags
- tags describing derivational morphology
All such classes of tags are described below. New classes will probably be added
Each class is recognised by having a tag prefix, a short string starting
It is assumed — and required — that all tags described here (and all other tags,
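As a rough illustration of how a tag prefix identifies its class, the following Python sketch maps a tag to one of the classes described in this document. The prefixes are the ones listed in the sections of this document; the function name `classify_tag` and the sample tags are assumptions made for illustration only and are not part of the Giella infrastructure, where tags are processed with FST tools at build time.

```python
# Tag prefixes taken from the class descriptions in this document.
TAG_CLASSES = {
    "+Err/": "error",
    "+Dial/": "dialect",
    "+Area/": "area",
    "+Sem/": "semantic",
    "+Der/": "derivation",
    "+OLang/": "originating language",
}


def classify_tag(tag):
    """Return the class name for a tag, or None if no known prefix matches."""
    for prefix, tag_class in TAG_CLASSES.items():
        if tag.startswith(prefix):
            return tag_class
    return None


if __name__ == "__main__":
    # Hypothetical sample tags, used only to exercise the function.
    for tag in ["+Err/Orth", "+Sem/Hum", "+N"]:
        print(tag, "->", classify_tag(tag))
```

A tag without any of the listed prefixes, such as a plain part-of-speech tag, yields `None` in this sketch.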
Error tags
The error tag class is defined as follows:
- Tag prefix
- +Err/
- Definition
- tags describing parts of the language outside the established norm
- FST implication
- all strings containing one or more such tags are removed from
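The effect of this removal can be pictured with a small Python sketch. It works on a plain list of analysis strings, whereas the infrastructure presumably applies the restriction to the transducer itself; the function name `drop_error_analyses` and the sample analyses are hypothetical.

```python
ERR_PREFIX = "+Err/"


def drop_error_analyses(analyses):
    """Keep only the analysis strings that contain no +Err/ tag."""
    return [a for a in analyses if ERR_PREFIX not in a]


if __name__ == "__main__":
    # Hypothetical analysis strings, for illustration only.
    sample = [
        "word+N+Sg+Nom",
        "word+N+Err/Orth+Sg+Nom",
    ]
    for analysis in drop_error_analyses(sample):
        print(analysis)
```

A string is dropped as soon as it contains one such tag, which matches the "one or more" wording above.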
Dialect tags
The dialect tag class is defined as follows:
- Tag prefix
- +Dial/
- Definition
- tags describing (written) variation based on dialect
- FST implication
- when the DIALECTS variable is set in configure.ac, one
Other notes:
- The first character after the / must be one of + or –,
- The string following / and +/– must be one of the strings
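The shape of a dialect tag described in these notes can be checked with a short Python sketch. It only tests the prefix and the sign character; it does not check the dialect name against any list of permitted strings. The function name `parse_dialect_tag` and the sample tags are hypothetical, and the sketch assumes that actual tags use an ASCII hyphen, while also accepting the en dash as printed above.

```python
import re

# Assumption: real tags use an ASCII "+" or "-" after the slash; the en dash
# (U+2013) is accepted too because it appears in the text above.
DIAL_TAG = re.compile(r"^\+Dial/([+\-\u2013])(.+)$")


def parse_dialect_tag(tag):
    """Split a dialect tag into (sign, name), or return None if malformed."""
    match = DIAL_TAG.match(tag)
    if match is None:
        return None
    sign, name = match.groups()
    # Normalise the en dash to an ASCII hyphen.
    return ("+" if sign == "+" else "-", name)


if __name__ == "__main__":
    # Hypothetical tags, for illustration only.
    for tag in ["+Dial/-GG", "+Dial/+SW", "+Dial/GG"]:
        print(tag, "->", parse_dialect_tag(tag))
```

A tag that lacks the sign character after the slash is rejected in this sketch and returns `None`.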
Area tags
The area/country tag class is defined as follows:
- Tag prefix
- +Area/
- Definition
- tags describing (written) variation based on country or another
- FST implication
- not yet actively used, but will be used to build proofing
: smaller
Other notes:
- The tag prefix must be followed by an
Semantic tags
The semantic tag class is defined as follows:
- Tag prefix
- +Sem/
- Definition
- tags describing semantic properties of the lexeme
- FST implication
- all semantic tags are automatically identified, and a couple
Other notes:
- the raw fst: the semantic tags are moved relative to the POS tag, to
- all fst's except disambiguators and grammar checkers:
- disambiguators and grammar checkers: the tags are kept (i.e. they are
Derivation tags
The derivation tag class is defined as follows:
- Tag prefix
- +Der/
- Definition
- tags describing derivational morphology
- FST implication
- there is no language-independent processing of these tags at the moment
Originating language tags
The originating language tag class is defined as follows:
- Tag prefix
- +OLang/
- Definition
- tags describing originating language for loan words in cases where
- FST implication
- there is no language-independent processing of these tags at the moment,
Other notes:
So far the only speech synthesis system we have built is for North Sámi. It was