numerals-affixes
-
LEXICON ARABICCASES adds +Arab
-
LEXICON ARABICCASE adds +Arab
-
LEXICON ARABICCASE0 adds +Arab
-
LEXICON DIGITCASES to distinguish between 0 and oblique
of ordinals. Strings like 10. are inherently ambiguous: they can either
be a regular cardinal followed by an end-of-sentence full stop, or they can
be an ordinal, potentially in the middle of a sentence. Regular FSTs know
nothing about this double nature, so they give only the default ordinal
analysis. For pmatch-based lookup and tokenisation, however, we try to find
all the alternatives. The lexicon contains just the following two lines:
The idea is that input like 10. can then be analysed both as the ordinal
10. and as the sequence of the cardinal 10 plus the sentence-final full
stop (.). The lexc entry above gives only the ordinal analysis of 10.,
but it then tells the fst runtime to go back and look for alternatives for
the same input, in which case it finds that 10 + . matches the same
input. Both tokenisations are then printed by hfst-tokenise --giella,
so that further processing can choose the correct one in a given
context. The location of the symbol @P.Pmatch.Backtrack@ determines where
the split is made, and thus which parts can potentially get other
analyses.
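
The intended outcome of the backtracking can be illustrated with a small Python sketch. It is not how hfst-tokenise or pmatch is implemented, and it does not reproduce the lexc entries. It only shows, under simplifying assumptions, the two candidate tokenisations that later processing has to choose between. The function name `candidate_tokenisations` and the labels "ordinal", "cardinal" and "full stop" are hypothetical and are not tags from the actual analyser.

```python
import re


def candidate_tokenisations(text):
    """Return all candidate tokenisations for a string such as '10.'.

    Illustrative only: the labels are hypothetical placeholders,
    not tags produced by the real analyser.
    """
    match = re.fullmatch(r"(\d+)\.", text)
    if match is None:
        # Not a digit string followed by a full stop: nothing to split.
        return [[(text, "unanalysed")]]
    digits = match.group(1)
    # Default reading: the whole string is one ordinal token.
    ordinal_reading = [(text, "ordinal")]
    # Alternative reading found by going back over the same input:
    # a cardinal followed by a sentence-final full stop.
    cardinal_reading = [(digits, "cardinal"), (".", "full stop")]
    return [ordinal_reading, cardinal_reading]


if __name__ == "__main__":
    for reading in candidate_tokenisations("10."):
        print(reading)
```

Both readings cover exactly the same input, which is why a later, context-aware step is needed to pick one of them.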