root-morphology
Inari Sámi morphological analyser
Multichar_Symbols definitions
Parts of speech
- +N +A +Adv +V
- +Pron +CS +CC
- +Adp +Po +Pr
- +Interj +Pcle
- +Num +ABBR
- +Symbol = independent symbols in the text stream, like £, €, ©
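Declaring a tag as a multichar symbol means it is handled as one unit rather than as a sequence of letters. The following is a minimal Python sketch of that idea, not the lexc compiler itself: it assumes longest-match, left-to-right tokenization, uses only a small illustrative subset of the tags, and the analysis string `kyeli+N+Sg+Nom` is a hypothetical example.

```python
# Illustrative subset of the multichar symbols declared in this file.
MULTICHAR = {"+N", "+A", "+Adv", "+V", "+Sg", "+Pl", "+Nom", "+Gen", "+Acc"}

def tokenize(s, symbols=MULTICHAR):
    """Split s into symbols, preferring the longest declared multichar symbol."""
    by_length = sorted(symbols, key=len, reverse=True)
    out, i = [], 0
    while i < len(s):
        for sym in by_length:
            if s.startswith(sym, i):
                out.append(sym)
                i += len(sym)
                break
        else:
            out.append(s[i])  # no multichar symbol here: take one character
            i += 1
    return out

# Hypothetical analysis string, used only for illustration.
print(tokenize("kyeli+N+Sg+Nom"))
# ['k', 'y', 'e', 'l', 'i', '+N', '+Sg', '+Nom']
```

Trying the longer symbols first is what keeps `+Nom` from being read as `+N` followed by the letters `o` and `m`.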
Tags for sub-POS
- +Prop - Proper noun
- +Pers - Personal Pronoun
- +Dem - Demonstrative Pronoun
- +Interr - Interrogative Pronoun
- +Refl - Reflexive Pronoun
- +Recipr - Reciprocal Pronoun
- +Rel - Relative Pronoun
- +Indef - Indefinite Pronoun
- +Coll - Collective numerals, subtag for +N
- +Arab - Arabic numeral, subtag for +Num
- +Rom - Roman numeral, subtag for +Num
- +ACR - Acronym
- +Dyn - Dynamic Acronym
Grammatical properties
- +IV +TV
Person - number
- +Sg +Pl +Du
- +Sg1 +Sg2 +Sg3
- +Du1 +Du2 +Du3
- +Pl1 +Pl2 +Pl3
- +PxSg1 +PxSg2 +PxSg3
- +PxDu1 +PxDu2 +PxDu3
- +PxPl1 +PxPl2 +PxPl3
Case
- +Nom +Gen +Acc
- +Ill +Ine +Ela
- +Com +Ess +Par +Abe
- +Loc
- +Known mon, until we find a better tag
Adjectival forms
- +Comp +Superl
- +Attr
Adverb types
- +Spat Spatial adverbs
- +Temp Temporal adverbs
Tense - mood
- +Ind +Pot +Cond +Imprt +ImprtII
- +Prs +Prt
Non-finite verb forms
- +Pass +Sup
- +Inf +Ger +GerII
- +ConNeg +Neg
- +PrsPrc +PrfPrc
- +VGen +VAbess
- +Actio
All non-positional derivations should be preceded by this tag, to make it possible
- +Der
Derivations
Other/unclassified derivations, can appear in all positions:
- +Der/ag neeljičievâg neeljijienâg kuulmâloonjâg
- +Der/ahasas 85-ahasâš škovlâahasâš
- +Der/ivvaas
- +Der/vualasas tutkâmvuálásâš
Clitics
- +Foc
- +Foc/ba
- +Foc/baa
- +Foc/baan
- +Foc/ban
- +Foc/gan
- +Foc/gas
- +Foc/ge
- +Foc/ges
- +Foc/gin
- +Foc/gis
- +Foc/go
- +Foc/han
- +Foc/kas
- +Foc/kin
- +Foc/kis
- +Foc/nii
- +Foc/pa
- +Foc/sun
- +Foc/uv
Usage tags
- +Err/Orth substandard, not in normative fst
- +Err/Lex substandard, not in normative fst, no normative lemma
- +Err/Hyph substandard, not in normative fst
- +Err/SpaceCmp substandard, not in normative fst
- +MWE - MultiWord Expression, used for abbreviation extraction for preprocess.sh
- +Use/-PLX - do not include in Polderland spellers (most likely irrelevant for smn)
- +Use/-Spell - do not include in speller (even though the entry is formally correct)
- +Use/SpellNoSugg - Recognized, but not suggested in speller
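The usage tags above describe which analyses belong in which derived tool. As a rough Python sketch of that filtering (the real build does this on the transducer, not on strings), the helper names and the `lemma+...` analysis strings below are hypothetical, and the tag order inside the strings is an assumption.

```python
def tags_of(analysis):
    """Return the set of tags in an analysis string such as 'lemma+N+Sg+Nom'."""
    _lemma, *tags = analysis.split("+")
    return {"+" + t for t in tags}

def is_normative(analysis):
    """Normative analyses carry no +Err/... tag."""
    return not any(t.startswith("+Err/") for t in tags_of(analysis))

def allowed_in_speller(analysis):
    """The speller drops substandard forms and entries tagged +Use/-Spell."""
    return is_normative(analysis) and "+Use/-Spell" not in tags_of(analysis)

def may_be_suggested(analysis):
    """+Use/SpellNoSugg forms are recognized but never offered as suggestions."""
    return allowed_in_speller(analysis) and "+Use/SpellNoSugg" not in tags_of(analysis)

# Hypothetical analyses, for illustration only.
for a in ["lemma+N+Sg+Nom",
          "lemma+N+Err/Orth+Sg+Nom",
          "lemma+N+Use/-Spell+Sg+Nom",
          "lemma+N+Use/SpellNoSugg+Sg+Nom"]:
    print(a, is_normative(a), allowed_in_speller(a), may_be_suggested(a))
```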
Semantic tags
- +Sem/Domain_Hum
- +Sem/Edu_Hum
- +Sem/Atr
- +Sem/Mal
- +Sem/Fem
- +Sem/Sur
- +Sem/Plc
- +Sem/Org
- +Sem/Obj
- +Sem/Obj-el
- +Sem/Measr
- +Sem/Money
- +Sem/Veh
- +Sem/Year
- +Sem/Act
- +Sem/Act_Fruit
- +Sem/Act_Plc
- +Sem/Act_Route
- +Sem/Act_Tool-it
- +Sem/Amount
- +Sem/Amount_Semcon
- +Sem/Ani
- +Sem/Ani-bird Bird names
- +Sem/Ani-fish Fish names
- +Sem/Ani-insect Insect names
- +Sem/Ani_Body-abstr_Hum
- +Sem/Ani_Buildpart
- +Sem/Ani_Group
- +Sem/Ani_Group_Hum
- +Sem/Ani_Group_Prod-vis
- +Sem/Ani_Hum
- +Sem/Ani_Veh
- +Sem/Aniprod
- +Sem/Aniprod_Hum
- +Sem/Aniprod_Obj-clo
- +Sem/Aniprod_Perc-phys
- +Sem/Aniprod_Plc_Route
- +Sem/Body-abstr
- +Sem/Body-abstr_Feat-psych
- +Sem/Body-abstr_Prod-audio_Semcon
- +Sem/Body_Food
- +Sem/Body_Hum
- +Sem/Body_Mat
- +Sem/Body_Measr
- +Sem/Body_Plc
- +Sem/Body_Plc-elevate
- +Sem/Build
- +Sem/Build-room
- +Sem/Buildpart
- +Sem/Buildpart_Cat_Ctain_Mat
- +Sem/Buildpart_Ctain_Obj
- +Sem/Build_Edu_Org
- +Sem/Build_Org
- +Sem/Cat
- +Sem/Clth clothes
- +Sem/Clth-jewl
- +Sem/Clth-jewl_Curr
- +Sem/Clthpart
- +Sem/Ctain
- +Sem/Ctain-abstr
- +Sem/Ctain-clth
- +Sem/Ctain-clth_Veh
- +Sem/Ctain_Furn
- +Sem/Ctain_Tool
- +Sem/Curr
- +Sem/Dance
- +Sem/Date
- +Sem/Dir
- +Sem/Domain
- +Sem/Domain_Prod-audio
- +Sem/Drink
- +Sem/Drink_Plant
- +Sem/Dummytag
- +Sem/Edu
- +Sem/Edu_Event
- +Sem/Edu_Geom
- +Sem/Edu_Mat
- +Sem/Edu_Org
- +Sem/Event
- +Sem/Event_Food
- +Sem/Event_Plc
- +Sem/Event_Time
- +Sem/Feat
- +Sem/Feat-measr
- +Sem/Feat-measr_Plc
- +Sem/Feat-phys
- +Sem/Feat-phys_Tool-write
- +Sem/Feat-phys_Veh
- +Sem/Feat-phys_Wthr
- +Sem/Feat-psych
- +Sem/Feat-psych_Plc
- +Sem/Feat_Plant
- +Sem/Food
- +Sem/Food-med
- +Sem/Food_Plant
- +Sem/Fruit
- +Sem/Fruit_Hum
- +Sem/Plant-fungus Fungi names
- +Sem/Furn
- +Sem/Game_Obj-play
- +Sem/Geom
- +Sem/Geom_Obj
- +Sem/Group_Hum
- +Sem/Group_Hum_Org
- +Sem/Group_Hum_Plc
- +Sem/Group_Txt
- +Sem/Hum
- +Sem/Hum_Lang
- +Sem/Hum_Lang_Plc
- +Sem/Hum_Mat_Tool
- +Sem/Hum_Obj
- +Sem/Hum_Org
- +Sem/Hum_Plc
- +Sem/Hum_Veh
- +Sem/ID
- +Sem/Ideol
- +Sem/Lang Languages
- +Sem/Lang_Tool
- +Sem/Mat
- +Sem/Mat_Plant
- +Sem/Mat_Txt
- +Sem/Measr_Sign
- +Sem/Measr_Time
- +Sem/Money_Obj
- +Sem/Money_Txt
- +Sem/Obj-clo
- +Sem/Obj-ling
- +Sem/Obj-play
- +Sem/Obj-rope
- +Sem/Obj-surfc
- +Sem/Obj_Semcon
- +Sem/Obj_State
- +Sem/Obj_Veh
- +Sem/Org_Prod-cogn
- +Sem/Org_Rule
- +Sem/Org_Txt
- +Sem/Org_Prod-audio
- +Sem/Org_Prod-vis
- +Sem/Part
- +Sem/Perc-cogn
- +Sem/Perc-emo
- +Sem/Perc-phys
- +Sem/Plant Plant names
- +Sem/Plantpart
- +Sem/Plant_Plantpart
- +Sem/Plc-abstr
- +Sem/Plc-abstr_Rel_State
- +Sem/Plc-abstr_Route
- +Sem/Plc-elevate
- +Sem/Plc-line
- +Sem/Plc-water
- +Sem/Plc_Route
- +Sem/Plc_Substnc
- +Sem/Plc_Substnc_Wthr
- +Sem/Plc_Time
- +Sem/Plc_Tool-catch
- +Sem/Plc_Txt
- +Sem/Plc_Wthr
- +Sem/Pos
- +Sem/Process
- +Sem/Prod
- +Sem/Prod-audio
- +Sem/Prod-audio_Txt
- +Sem/Prod-cogn
- +Sem/Prod-cogn_Txt
- +Sem/Prod-ling
- +Sem/Prod-vis
- +Sem/Rel
- +Sem/Route
- +Sem/Rule
- +Sem/Semcon
- +Sem/Semcon_Txt
- +Sem/Sign
- +Sem/State
- +Sem/State-sick
- +Sem/Substnc
- +Sem/Substnc_Wthr
- +Sem/Time
- +Sem/Time-clock
- +Sem/Time_Wthr
- +Sem/Tool
- +Sem/Tool-catch
- +Sem/Tool-clean
- +Sem/Tool-it
- +Sem/Tool-measr
- +Sem/Tool-music
- +Sem/Tool-write
- +Sem/Txt
- +Sem/Wpn
- +Sem/Wthr weather
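Many of the semantic tags above look like several class names joined by underscores, for example `+Sem/Ani_Group_Hum`. The Python sketch below splits such a tag into its parts. Reading the underscore as a separator between classes is an assumption drawn from the shape of the tags, and `sem_classes` is a hypothetical helper name.

```python
SEM_PREFIX = "+Sem/"

def sem_classes(tag):
    """Split a +Sem/ tag into its underscore-separated class names."""
    if not tag.startswith(SEM_PREFIX):
        raise ValueError("not a semantic tag: " + tag)
    return tag[len(SEM_PREFIX):].split("_")

print(sem_classes("+Sem/Ani_Group_Hum"))          # ['Ani', 'Group', 'Hum']
print(sem_classes("+Sem/Body-abstr_Feat-psych"))  # ['Body-abstr', 'Feat-psych']
print(sem_classes("+Sem/Plc"))                    # ['Plc']
```

Hyphenated parts such as `Body-abstr` are kept whole, since the hyphen appears to mark a subclass rather than a second class.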
Punctuation
- +CLB +PUNCT +HYPH
- +PAR +LEFT +RIGHT
- +CLBfinal Sentence final abbreviated expression ending in full stop, so that the full stop is ambiguous
- +URL
Morphophonemes
- k4 l4 t4 p4 c4 t4 č4 = these are consonants that change in consonant gradation (CG)
- b6 d6 g6 = these are consonants that change in clitics: jiemge, epke
- i4 i6 = this is the postvocalic i consonant, realised as i
- i6 j6 = a fake vowel and a fake consonant, used to make the rules work for exceptions
- i5 = comitative suffix-begin in loanwords
- a5 ä5 á5 e5 u5 o5 these vowels do not change
- h5 j5 m5 ŋ5 t5 c5 d5 l5 t5 r5 č5 k5 z5 these consonants do not change in WG
- y5 this vowel does not change, e.g. pyerá
- i2 u2 i3 â2 stemvowel changing to e, e.g. kyeli: kyeˊle
- ∑ used for dynamic compounds, Capital Greek Sigma, Alt-Shift-S
Archiphonemes
- ^RC Root consonant dummy
- ^RV Root vowel dummy
- ^SC Suffix consonant dummy
- ^SV Suffix vowel dummy
- ^VO = vowel copy
Triggers
- ^CLEN Consonant lengthening in qual WG
- ^CSH Consonant shortening (not WG)
- ^FCD Final consonant deletion
- ^FVD Final vowel deletion
- ^EA is á and root vowel change in Ill Sg of i-stems
- ^EX = Stem vowel: i to â where it should have been á, this is Err/Orth only
- ^RLEN Root vowel lengthening (impl. WG)
- ^RVSH Root vowel shortening
- ^SLEN Suffix vowel lengthening
- ^SVLOW Suffix vowel lowering â > á and u > o
- ^SVSH Second syllable vowel shortening
- ^VLOW is Vowel lowering in 3rd sg of contract verbs tuhhid: tohhe
- ^WG Weak grade trigger
- ^ÁE á->e
- ^ÁI á->i
- ^VHIGH = raising of vowels in verbs, o to uu, a to oo
- ^VBACK = back vowels for verbs, ä to a (when needed; normally 2syll a|â is enough)
- ^IUML = â to e in front of high suffixes
- ^BLOCK = This symbol is used only to block contexts that would otherwise trigger a rule
Symbols that need to be escaped on the lower side (towards twolc):
- +Use/NG not-generate, for ped generation isme-ped.fst
- +Use/MT generate only for MT
- +Use/Circ
- +Use/-PMatch
- +Use/PMatch
- @P.Pmatch.Backtrack@
Variants
- +v1
- +v2
- +v3
- +v4
- +Hom1
- +Hom2
- +Allegro
Semantic tags
- +Sem/Body denotes bodyparts
- +Sem/Plc denotes places
Compound tags
These tags describe the parts of the compound.
The prefix (before "/") is Cmp.
- +Cmp compounds
- +Cmp/Hyph compounds
- +Cmp/SgNom compounds
- +Cmp/PlNom compounds
- +Cmp/Attr compounds
- +Cmp/SgGen compounds
- +Cmp/PlGen compounds
- +Cmp/SplitR compounds
- +Cmp/Sh compounds
These tags govern the parts of the compound
The prefix (before "/") is CmpNP:
- +CmpNP/All - ... in all positions, default, this tag does not have to be written
- +CmpNP/First - ... only be first part in a compound or alone
- +CmpNP/Pref - ... only first part in a compound, NEVER alone
- +CmpNP/Last - ... only be last part in a compound or alone
- +CmpNP/Suff - ... only last part in a compound, NEVER alone
- +CmpNP/None - ... does not take part in compounds
- +CmpNP/Only - ... only be part of a compound, i.e. can never
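The CmpNP tags can be read as a table of positions each word is allowed to fill. The Python sketch below encodes that reading. Two points are assumptions rather than statements from the source: that +CmpNP/First and +CmpNP/Last exclude the middle of a three-part compound, and that +CmpNP/Only means "never alone" (the description above is cut off). The position names and `may_occupy` are hypothetical.

```python
# Positions a word may fill, keyed by the part of the tag after "+CmpNP/".
ALLOWED = {
    "All":   {"alone", "first", "middle", "last"},
    "First": {"alone", "first"},
    "Pref":  {"first"},
    "Last":  {"alone", "last"},
    "Suff":  {"last"},
    "None":  {"alone"},
    "Only":  {"first", "middle", "last"},  # assumption: never alone
}

PREFIX = "+CmpNP/"

def may_occupy(tag, position):
    """Return True if a word carrying `tag` may stand in `position`.

    `tag` may be None, which is treated as the default +CmpNP/All.
    """
    name = "All" if tag is None else tag[len(PREFIX):]
    return position in ALLOWED[name]

print(may_occupy(None, "middle"))          # True
print(may_occupy("+CmpNP/Pref", "alone"))  # False
print(may_occupy("+CmpNP/Last", "last"))   # True
```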
The prefix (before "/") is CmpN:
- +CmpN/SgN Singular Nominative
- +CmpN/SgG Singular Genitive
- +CmpN/PlG Plural Genitive
Unmarked = Default, i.e. +CmpN/SgN for SMN.
The second part of the compound may require that the previous (left part) is:
- +CmpN/SgNomLeft Singular Nominative
- +CmpN/SgGenLeft Singular Genitive
- +CmpN/PlGenLeft Plural Genitive
Language tagged names
- +OLang/ENG
- +OLang/FIN
- +OLang/NNO
- +OLang/NOB
- +OLang/SME
- +OLang/SMA
- +OLang/SWE
- +OLang/UND
- +OLang/RUS
Flag diacritics
@P.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised |
@D.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised |
@C.NeedNoun@ | (Dis)allow compounds with verbs unless nominalised |
@R.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised |
@D.ErrOrth.ON@ |
@C.ErrOrth@ |
@P.ErrOrth.ON@ |
For languages that allow compounding, the following flag diacritics are needed
@P.CmpFrst.FALSE@ | Require that words tagged as such only appear first |
@D.CmpPref.TRUE@ | Block such words from entering ENDLEX |
@P.CmpPref.FALSE@ | Block these words from making further compounds |
@D.CmpLast.TRUE@ | Block such words from entering R |
@D.CmpNone.TRUE@ | Combines with the next tag to prohibit compounding |
@U.CmpNone.FALSE@ | Combines with the prev tag to prohibit compounding |
@U.CmpNone.TRUE@ | Combines with the two previous ones to block compounding |
@P.CmpOnly.TRUE@ | Sets a flag to indicate that the word has passed R |
@D.CmpOnly.FALSE@ | Disallow words coming directly from root. |
@D.CmpHyph.TRUE@ | Flag to control hyphenated compounds like proper nouns |
@U.CmpHyph.FALSE@ | Flag to control hyphenated compounds like proper nouns |
@U.CmpHyph.TRUE@ | Flag to control hyphenated compounds like proper nouns |
@C.CmpHyph@ | Flag to control hyphenated compounds like proper nouns |
@P.CmpHyph.TRUE@ | Flag to control hyphenated compounds like proper nouns |
@N.CmpHyph.TRUE@ | Flag to control hyphenated compounds like proper nouns |
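The flag diacritics in these tables use the standard operators: P (set), N (set negatively), R (require), D (disallow), C (clear) and U (unify). The following Python sketch is a small stand-alone interpreter for a string of flags, meant only to illustrate how one path is accepted or blocked. It is not the HFST or Xerox implementation, and which lexicons actually emit each flag is not stated here, so the example paths are hypothetical.

```python
import re

FLAG = re.compile(r"@([PNRDCU])\.([^.@]+)(?:\.([^.@]+))?@")

def path_is_valid(path):
    """Return True if the flag diacritics found in `path` never conflict."""
    state = {}  # feature -> (value, is_positive)
    for op, feat, val in FLAG.findall(path):
        cur = state.get(feat)
        if op == "P":
            state[feat] = (val, True)
        elif op == "N":
            state[feat] = (val, False)
        elif op == "C":
            state.pop(feat, None)
        elif op == "R":
            # @R.F@ needs F to be set; @R.F.V@ needs F to equal V.
            ok = cur is not None if val == "" else cur == (val, True)
            if not ok:
                return False
        elif op == "D":
            # @D.F@ fails if F is set; @D.F.V@ fails if F equals V.
            blocked = cur is not None if val == "" else cur == (val, True)
            if blocked:
                return False
        elif op == "U":
            if cur is None or cur == (val, True) or (not cur[1] and cur[0] != val):
                state[feat] = (val, True)
            else:
                return False
    return True

# The three flags checked on entry to ENDLEX, as listed in this file.
ENDLEX = "@D.CmpOnly.FALSE@@D.CmpPref.TRUE@@D.NeedNoun.ON@"

print(path_is_valid(ENDLEX))                                  # True
print(path_is_valid("@P.NeedNoun.ON@" + ENDLEX))              # False
print(path_is_valid("@P.NeedNoun.ON@@C.NeedNoun@" + ENDLEX))  # True
print(path_is_valid("@U.CmpNone.FALSE@@U.CmpNone.TRUE@"))     # False
```

The second and third calls show the pattern behind the NeedNoun flags: a path that has set NeedNoun to ON is blocked at the end unless something clears the feature first.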
Use the following flag diacritics to control downcasing of derived proper names:
@U.Cap.Obl@ | Allowing downcasing of derived names: deatnulasj. |
@U.Cap.Opt@ | Allowing downcasing of derived names: deatnulasj. |
- @U.NeedsVowRed.OFF@ is used to force hyphenation/non-reduction: samediggi-
- @U.NeedsVowRed.ON@ is used to force reduction w/o hyphen: samedigge#xxx
- @C.NeedsVowRed@ Clearing this feature, so that it doesn't interfere with further compounding
- @C.Px@
- @C.Nom3Px@
- @P.Px.add@
- @R.Px.add@
- @P.Px.block@
- @D.Px.block@
- @P.Nom12Px.add@
- @R.Nom12Px.add@
- @P.Nom3Px.add@
- @R.Nom3Px.add@
- @P.Vgen.add@
- @R.Vgen.add@
- @R.SpellRlx.ON@ Flag used to tag spell-relax-analysed strings (and only those).
- @D.SpellRlx.ON@ Flag used to tag spell-relax-analysed strings (and only those).
- @C.SpellRlx@ Flag used to tag spell-relax-analysed strings (and only those).
- @R.SpaceCmp.ON@ Flag to tag compounds written with a space
- @D.SpaceCmp.ON@ Flag to tag compounds written with a space
- @C.SpaceCmp@ Flag to tag compounds written with a space
Basic lexica, pointing to the other lexicon files
LEXICON Root where everything starts
- LEXICON Acronym splitting in common and smn
- LEXICON ProperNoun
Lexicon ENDLEX
@D.CmpOnly.FALSE@@D.CmpPref.TRUE@@D.NeedNoun.ON@ # ;
The @D.CmpOnly.FALSE@ flag diacritic is used to disallow words tagged