root-morphology

Inari Sámi morphological analyser

Multichar_Symbols definitions

Parts of speech

  • +N +A +Adv +V
  • +Pron +CS +CC
  • +Adp +Po +Pr
  • +Interj +Pcle
  • +Num +ABBR
  • +Symbol = independent symbols in the text stream, like £, €, ©

Tags for sub-POS

  • +Prop - Proper noun
  • +Pers - Personal Pronoun
  • +Dem - Demonstrative Pronoun
  • +Interr - Interrogative Pronoun
  • +Refl - Reflexive Pronoun
  • +Recipr - Reciprocal Pronoun
  • +Rel - Relative Pronoun
  • +Indef - Indefinite Pronoun
  • +Coll - Collective numerals, subtag for +N
  • +Arab - Arabic numeral, subtag for +Num
  • +Rom - Roman numeral, subtag for +Num
  • +ACR - Acronym
  • +Dyn - Dynamic Acronym

Grammatical properties

  • +IV +TV

Person - number

  • +Sg +Pl +Du
  • +Sg1 +Sg2 +Sg3
  • +Du1 +Du2 +Du3
  • +Pl1 +Pl2 +Pl3
  • +PxSg1 +PxSg2 +PxSg3
  • +PxDu1 +PxDu2 +PxDu3
  • +PxPl1 +PxPl2 +PxPl3

Case

  • +Nom +Gen +Acc
  • +Ill +Ine +Ela
  • +Com +Ess +Par +Abe
  • +Loc
  • +Known mon, until we find a better tag

Adjectival forms

  • +Comp +Superl
  • +Attr

Adverb types

  • +Spat Spatial adverbs
  • +Temp Temporal adverbs

Tense - mood

  • +Ind +Pot +Cond +Imprt +ImprtII
  • +Prs +Prt

Indefinite verb forms

  • +Pass +Sup
  • +Inf +Ger +GerII
  • +ConNeg +Neg
  • +PrsPrc +PrfPrc
  • +VGen +VAbess
  • +Actio


All non-positional derivations should be preceded by this tag, to make it possible to target regular expressions at all derivations in a language-independent way: just specify +Der|+Der1 .. +Der5 and you are set.

  • +Der
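
The pattern mentioned above can be sketched in Python as follows. This is only an illustration: the helper name `derivation_tags` and the sample analysis string are invented for the example, and the only parts taken from this file are the tag shapes +Der, +Der1 .. +Der5 and +Der/xxx.

```python
import re

# Matches +Der or +Der1 .. +Der5, optionally followed by a "/suffix" part
# such as +Der/ag. The lookahead stops the pattern from matching inside a
# longer, unrelated tag name (e.g. a hypothetical +Derx).
DER_TAG = re.compile(r"\+Der[1-5]?(?:/[^+@#]+)?(?=\+|@|#|$)")


def derivation_tags(analysis: str) -> list[str]:
    """Return every derivation tag found in an analysis string."""
    return DER_TAG.findall(analysis)


# Hypothetical analysis string, for illustration only.
example = "lemma+N+Der/ag+A+Sg+Nom"
print(derivation_tags(example))
```

Because the pattern depends only on the shape of the tag, the same expression should work unchanged for any language that follows this tagging convention.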

Derivations

Other/unclassified derivations, can appear in all positions:

  • +Der/ag neeljičievâg neeljijienâg kuulmâloonjâg
  • +Der/ahasas 85-ahasâš škovlâahasâš
  • +Der/ivvaas
  • +Der/vualasas tutkâmvuálásâš

Clitics

  • +Foc
  • +Foc/ba
  • +Foc/baa
  • +Foc/baan
  • +Foc/ban
  • +Foc/gan
  • +Foc/gas
  • +Foc/ge
  • +Foc/ges
  • +Foc/gin
  • +Foc/gis
  • +Foc/go
  • +Foc/han
  • +Foc/kas
  • +Foc/kin
  • +Foc/kis
  • +Foc/nii
  • +Foc/pa
  • +Foc/sun
  • +Foc/uv

Usage tags

  • +Err/Orth substandard, not in normative fst
  • +Err/Lex substandard, not in normative fst, no normative lemma
  • +Err/Hyph substandard, not in normative fst
  • +Err/SpaceCmp substandard, not in normative fst
  • +MWE - MultiWord Expression, used for abbreviation extraction for preprocess.sh
  • +Use/-PLX - do not include in Polderland spellers (most likely irrelevant for smn)
  • +Use/-Spell - do not include in speller (even though the entry is formally correct)
  • +Use/SpellNoSugg - Recognized, but not suggested in speller

Semantic tags

  • +Sem/Domain_Hum
  • +Sem/Edu_Hum
  • +Sem/Atr
  • +Sem/Mal
  • +Sem/Fem
  • +Sem/Sur
  • +Sem/Plc
  • +Sem/Org
  • +Sem/Obj
  • +Sem/Obj-el
  • +Sem/Measr
  • +Sem/Money
  • +Sem/Veh
  • +Sem/Year
  • +Sem/Act
  • +Sem/Act_Fruit
  • +Sem/Act_Plc
  • +Sem/Act_Route
  • +Sem/Act_Tool-it
  • +Sem/Amount
  • +Sem/Amount_Semcon
  • +Sem/Ani
  • +Sem/Ani-bird Bird names
  • +Sem/Ani-fish Fish names
  • +Sem/Ani_Body-abstr_Hum
  • +Sem/Ani_Buildpart
  • +Sem/Ani_Group
  • +Sem/Ani_Group_Hum
  • +Sem/Ani_Group_Prod-vis
  • +Sem/Ani_Hum
  • +Sem/Ani_Veh
  • +Sem/Aniprod
  • +Sem/Aniprod_Hum
  • +Sem/Aniprod_Obj-clo
  • +Sem/Aniprod_Perc-phys
  • +Sem/Aniprod_Plc_Route
  • +Sem/Body-abstr
  • +Sem/Body-abstr_Feat-psych
  • +Sem/Body-abstr_Prod-audio_Semcon
  • +Sem/Body_Food
  • +Sem/Body_Hum
  • +Sem/Body_Mat
  • +Sem/Body_Measr
  • +Sem/Body_Plc
  • +Sem/Body_Plc-elevate
  • +Sem/Build
  • +Sem/Build-room
  • +Sem/Buildpart
  • +Sem/Buildpart_Cat_Ctain_Mat
  • +Sem/Buildpart_Ctain_Obj
  • +Sem/Build_Edu_Org
  • +Sem/Build_Org
  • +Sem/Cat
  • +Sem/Clth
  • +Sem/Clth-jewl
  • +Sem/Clth-jewl_Curr
  • +Sem/Clthpart
  • +Sem/Ctain
  • +Sem/Ctain-abstr
  • +Sem/Ctain-clth
  • +Sem/Ctain-clth_Veh
  • +Sem/Ctain_Furn
  • +Sem/Ctain_Tool
  • +Sem/Curr
  • +Sem/Dance
  • +Sem/Date
  • +Sem/Dir
  • +Sem/Domain
  • +Sem/Domain_Prod-audio
  • +Sem/Drink
  • +Sem/Drink_Plant
  • +Sem/Dummytag
  • +Sem/Edu
  • +Sem/Edu_Event
  • +Sem/Edu_Geom
  • +Sem/Edu_Mat
  • +Sem/Edu_Org
  • +Sem/Event
  • +Sem/Event_Food
  • +Sem/Event_Plc
  • +Sem/Event_Time
  • +Sem/Feat
  • +Sem/Feat-measr
  • +Sem/Feat-measr_Plc
  • +Sem/Feat-phys
  • +Sem/Feat-phys_Tool-write
  • +Sem/Feat-phys_Veh
  • +Sem/Feat-phys_Wthr
  • +Sem/Feat-psych
  • +Sem/Feat-psych_Plc
  • +Sem/Feat_Plant
  • +Sem/Food
  • +Sem/Food-med
  • +Sem/Food_Plant
  • +Sem/Fruit
  • +Sem/Fruit_Hum
  • +Sem/Plant-fungus Fungi names
  • +Sem/Furn
  • +Sem/Game_Obj-play
  • +Sem/Geom
  • +Sem/Geom_Obj
  • +Sem/Group_Hum
  • +Sem/Group_Hum_Org
  • +Sem/Group_Hum_Plc
  • +Sem/Group_Txt
  • +Sem/Hum
  • +Sem/Hum_Lang
  • +Sem/Hum_Lang_Plc
  • +Sem/Hum_Mat_Tool
  • +Sem/Hum_Obj
  • +Sem/Hum_Org
  • +Sem/Hum_Plc
  • +Sem/Hum_Veh
  • +Sem/ID
  • +Sem/Ideol
  • +Sem/Lang Languages
  • +Sem/Lang_Tool
  • +Sem/Mat
  • +Sem/Mat_Plant
  • +Sem/Mat_Txt
  • +Sem/Measr_Sign
  • +Sem/Measr_Time
  • +Sem/Money_Obj
  • +Sem/Money_Txt
  • +Sem/Obj-clo
  • +Sem/Obj-ling
  • +Sem/Obj-play
  • +Sem/Obj-rope
  • +Sem/Obj-surfc
  • +Sem/Obj_Semcon
  • +Sem/Obj_State
  • +Sem/Obj_Veh
  • +Sem/Org_Prod-cogn
  • +Sem/Org_Rule
  • +Sem/Org_Txt
  • +Sem/Org_Prod-audio
  • +Sem/Org_Prod-vis
  • +Sem/Part
  • +Sem/Perc-cogn
  • +Sem/Perc-emo
  • +Sem/Perc-phys
  • +Sem/Plant Plant names
  • +Sem/Plantpart
  • +Sem/Plant_Plantpart
  • +Sem/Plc-abstr
  • +Sem/Plc-abstr_Rel_State
  • +Sem/Plc-abstr_Route
  • +Sem/Plc-elevate
  • +Sem/Plc-line
  • +Sem/Plc-water
  • +Sem/Plc_Route
  • +Sem/Plc_Substnc
  • +Sem/Plc_Substnc_Wthr
  • +Sem/Plc_Time
  • +Sem/Plc_Tool-catch
  • +Sem/Plc_Txt
  • +Sem/Plc_Wthr
  • +Sem/Pos
  • +Sem/Process
  • +Sem/Prod
  • +Sem/Prod-audio
  • +Sem/Prod-audio_Txt
  • +Sem/Prod-cogn
  • +Sem/Prod-cogn_Txt
  • +Sem/Prod-ling
  • +Sem/Prod-vis
  • +Sem/Rel
  • +Sem/Route
  • +Sem/Rule
  • +Sem/Semcon
  • +Sem/Semcon_Txt
  • +Sem/Sign
  • +Sem/State
  • +Sem/State-sick
  • +Sem/Substnc
  • +Sem/Substnc_Wthr
  • +Sem/Time
  • +Sem/Time-clock
  • +Sem/Time_Wthr
  • +Sem/Tool
  • +Sem/Tool-catch
  • +Sem/Tool-clean
  • +Sem/Tool-it
  • +Sem/Tool-measr
  • +Sem/Tool-music
  • +Sem/Tool-write
  • +Sem/Txt
  • +Sem/Wpn
  • +Sem/Wthr

Punctuation

  • +CLB +PUNCT +HYPH
  • +PAR +LEFT +RIGHT
  • +URL

Morphophonemes

  • k4 l4 t4 p4 c4 t4 č4 = these are consonants that change in cg
  • b6 d6 g6 = these are consonants that change in clitics: jiemge, epke
  • '7
  • i4 i6 = this is the postvocalic i consonant, realised as i
  • i6 j6 = these are fake vowel and consonant, to get rules to function for exceptions
  • i5 = comitative suffix-begin in loanwords
  • a5 ä5 á5 e5 u5 o5 these vowels do not change
  • h5 j5 m5 ŋ5 t5 c5 d5 l5 t5 r5 č5 k5 z5 these consonants do not change in WG
  • y5 these vowels do not change, e.g. pyerá
  • i2 u2 i3 â2 stemvowel changing to e, e.g. kyeli: kyeˊle
  • ∑ used for dynamic compounds, Capital Greek Sigma, Alt-Shift-S

Archiphonemes

  • ^RC Root consonant dummy
  • ^RV Root vowel dummy
  • ^SC Suffix consonant dummy
  • ^SV Suffix vowel dummy
  • ^VO = vowel copy

Triggers

  • ^CLEN Consonant lengthening in qual WG
  • ^CSH Consonant shortening (not WG)
  • ^FCD Final consonant deletion
  • ^FVD Final vowel deletion
  • ^EA is á and root vowel change in Ill Sg of i-stems
  • ^EX = Stem vowel: i to â where it should have been á, this is Err/Orth only
  • ^RLEN Root vowel lengthening (impl. WG)
  • ^RVSH Root vowel shortening
  • ^SLEN Suffix vowel lengthening
  • ^SVLOW Suffix vowel lowering â > á and u > o
  • ^SVSH Second syllable vowel shortening
  • ^VLOW is Vowel lowering in 3rd sg of contract verbs tuhhid: tohhe
  • ^WG Weak grade trigger
  • ^ÁE á->e
  • ^ÁI á->i
  • ^VHIGH = heightening of vowels for verbs, o to uu, a to oo
  • ^VBACK = back vowels for verbs, ä to a (when needed; normally 2syll a|â is enough)
  • ^IUML = â to e in front of high suffixes
  • ^BLOCK = This symbol is used only to block otherwise triggering contexts

Symbols that need to be escaped on the lower side (towards twolc):

  • +Use/NG not-generate, for ped generation isme-ped.fst
  • +Use/MT generate only for MT
  • +Use/Circ
  • +Use/-PMatch
  • +Use/PMatch
  • @P.Pmatch.Backtrack@

Variants

  • +v1
  • +v2
  • +v3
  • +v4
  • +Hom1
  • +Hom2
  • +Allegro

Semantic tags

  • +Sem/Body denotes bodyparts
  • +Sem/Plc denotes places

Compound tags

These tags describe the parts of the compound.

The prefix (before "/") is Cmp.

  • +Cmp compounds
  • +Cmp/Hyph compounds
  • +Cmp/SgNom compounds
  • +Cmp/PlNom compounds
  • +Cmp/Attr compounds
  • +Cmp/SgGen compounds
  • +Cmp/PlGen compounds
  • +Cmp/SplitR compounds
  • +Cmp/Sh compounds

These tags govern the parts of the compound

The prefix (before "/") is CmpNP: (meaning: this is the normative position of this tag)

  • +CmpNP/All - ... in all positions, default, this tag does not have to be written
  • +CmpNP/First - ... only be first part in a compound or alone
  • +CmpNP/Pref - ... only first part in a compound, NEVER alone
  • +CmpNP/Last - ... only be last part in a compound or alone
  • +CmpNP/Suff - ... only last part in a compound, NEVER alone
  • +CmpNP/None - ... does not take part in compounds
  • +CmpNP/Only - ... only be part of a compound, i.e. can never be used alone, but can appear in any position
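
The position tags above can be read as a small lookup table. The Python sketch below is one possible encoding, not the way the lexicon itself implements them (that is done with flag diacritics). The names `Position`, `CMPNP` and `allowed` are invented for the example, middle positions in longer compounds are not modelled, and the assumption that a +CmpNP/None word may still occur alone is an inference from the description.

```python
from typing import NamedTuple, Optional


class Position(NamedTuple):
    first: bool  # may be the first part of a compound
    last: bool   # may be the last part of a compound
    alone: bool  # may occur as a standalone word


# One row per tag, following the descriptions in the list above.
CMPNP = {
    "+CmpNP/All":   Position(first=True,  last=True,  alone=True),
    "+CmpNP/First": Position(first=True,  last=False, alone=True),
    "+CmpNP/Pref":  Position(first=True,  last=False, alone=False),
    "+CmpNP/Last":  Position(first=False, last=True,  alone=True),
    "+CmpNP/Suff":  Position(first=False, last=True,  alone=False),
    # Assumption: a word excluded from compounds can still stand alone.
    "+CmpNP/None":  Position(first=False, last=False, alone=True),
    "+CmpNP/Only":  Position(first=True,  last=True,  alone=False),
}


def allowed(tag: Optional[str], slot: str) -> bool:
    """Check a slot ('first', 'last' or 'alone'); no tag means +CmpNP/All."""
    return getattr(CMPNP[tag or "+CmpNP/All"], slot)


print(allowed("+CmpNP/Pref", "alone"))
print(allowed(None, "last"))
```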

The prefix (before "/") is CmpN: (meaning: this is the normative position of this tag). The tagged part of the compound should make a compound using:

  • +CmpN/SgN Singular Nominative
  • +CmpN/SgG Singular Genitive
  • +CmpN/PlG Plural Genitive

Unmarked = Default, i.e. +CmpN/SgN for SMN.

The second part of the compound may require that the previous (left part) is:

  • +CmpN/SgNomLeft Singular Nominative
  • +CmpN/SgGenLeft Singular Genitive
  • +CmpN/PlGenLeft Plural Genitive

Language tagged names

  • +OLang/ENG
  • +OLang/FIN
  • +OLang/NNO
  • +OLang/NOB
  • +OLang/SME
  • +OLang/SMA
  • +OLang/SWE
  • +OLang/UND
  • +OLang/RUS

Flag diacritics

We have manually optimised the structure of our lexicon using the following flag diacritics to restrict morphological combinatorics - only allow compounds with verbs if the verb is further derived into a noun again:

@P.NeedNoun.ON@ (Dis)allow compounds with verbs unless nominalised
@D.NeedNoun.ON@ (Dis)allow compounds with verbs unless nominalised
@C.NeedNoun@ (Dis)allow compounds with verbs unless nominalised
@R.NeedNoun.ON@ (Dis)allow compounds with verbs unless nominalised
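
To make the interplay of these four flags concrete, here is a minimal Python sketch of how P (set), D (disallow), R (require) and C (clear) flag diacritics are evaluated along a single path. It covers only those four operators and is a simplified model, not the finite-state implementation. The function name `path_is_allowed` and the sample paths, including where each flag sits relative to the placeholder morphemes, are assumptions made for the example.

```python
import re

# @OP.FEATURE@ or @OP.FEATURE.VALUE@, restricted to the operators P, D, R, C.
FLAG = re.compile(r"@([PDRC])\.([^.@]+)(?:\.([^.@]+))?@")


def path_is_allowed(path: str) -> bool:
    """Evaluate the flag diacritics found in `path` from left to right."""
    state = {}  # feature name -> current value
    for op, feature, value in FLAG.findall(path):
        current = state.get(feature)
        if op == "P":
            # Positive set: record the value.
            state[feature] = value
        elif op == "C":
            # Clear: forget the feature entirely.
            state.pop(feature, None)
        elif op == "R":
            # Require: the feature must hold `value` (or merely be set).
            ok = current == value if value else current is not None
            if not ok:
                return False
        elif op == "D":
            # Disallow: fail if the feature holds `value` (or is set at all).
            blocked = current == value if value else current is not None
            if blocked:
                return False
    return True


# Hypothetical paths; "verb", "noun" and "nominaliser" are placeholders.
print(path_is_allowed("verb@P.NeedNoun.ON@#noun@D.NeedNoun.ON@"))
print(path_is_allowed("verb@P.NeedNoun.ON@nominaliser@C.NeedNoun@#noun@D.NeedNoun.ON@"))
```

In the first path the flag set after the verb is still active when the disallow flag is reached, so the path is rejected. In the second path the clear flag after the placeholder nominaliser removes it, so the path goes through.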

For languages that allow compounding, the following flag diacritics are needed to control position-based compounding restrictions for nominals. Their use is handled automatically if combined with +CmpN/xxx tags. If not used, they will do no harm.

@P.CmpFrst.FALSE@ Require that words tagged as such only appear first
@D.CmpPref.TRUE@ Block such words from entering ENDLEX
@P.CmpPref.FALSE@ Block these words from making further compounds
@D.CmpLast.TRUE@ Block such words from entering R
@D.CmpNone.TRUE@ Combines with the next tag to prohibit compounding
@U.CmpNone.FALSE@ Combines with the prev tag to prohibit compounding
@U.CmpNone.TRUE@ Combines with the two previous ones to block compounding
@P.CmpOnly.TRUE@ Sets a flag to indicate that the word has passed R
@D.CmpOnly.FALSE@ Disallow words coming directly from root.
@D.CmpHyph.TRUE@ Flag to control hyphenated compounds like proper nouns
@U.CmpHyph.FALSE@ Flag to control hyphenated compounds like proper nouns
@U.CmpHyph.TRUE@ Flag to control hyphenated compounds like proper nouns
@C.CmpHyph@ Flag to control hyphenated compounds like proper nouns
@P.CmpHyph.TRUE@ Flag to control hyphenated compounds like proper nouns
@N.CmpHyph.TRUE@ Flag to control hyphenated compounds like proper nouns

Use the following flag diacritics to control downcasing of derived proper nouns (e.g. Finnish Pariisi -> pariisilainen). See e.g. North Sámi for how to use these flags. There exists a ready-made regex that will do the actual down-casing given the proper use of these flags.

@U.Cap.Obl@ Allowing downcasing of derived names: deatnulasj.
@U.Cap.Opt@ Allowing downcasing of derived names: deatnulasj.
  • @U.NeedsVowRed.OFF@ is used to force hyphenation/non-reduction: samediggi-
  • @U.NeedsVowRed.ON@ is used to force reduction w/o hyphen: samedigge#xxx
  • @C.NeedsVowRed@ Clearing this feature, so that it doesn't interfere with further compounding
  • @P.Px.add@
  • @R.Px.add@
  • @P.Px.block@
  • @D.Px.block@
  • @P.Nom3Px.add@
  • @R.Nom3Px.add@
  • @P.Vgen.add@
  • @R.Vgen.add@
  • @R.SpellRlx.ON@ Flag used to tag spell-relax-analysed strings (and only those).
  • @D.SpellRlx.ON@ Flag used to tag spell-relax-analysed strings (and only those).
  • @C.SpellRlx@ Flag used to tag spell-relax-analysed strings (and only those).
  • @R.SpaceCmp.ON@ Flag to tag compounds written with a space
  • @D.SpaceCmp.ON@ Flag to tag compounds written with a space
  • @C.SpaceCmp@ Flag to tag compounds written with a space

Basic lexica, pointing to the other lexicon files

LEXICON Root where everything starts

  • LEXICON Acronym splitting in common and smn
  • LEXICON ProperNoun

Lexicon ENDLEX

And this is the ENDLEX of everything:

 @D.CmpOnly.FALSE@@D.CmpPref.TRUE@@D.NeedNoun.ON@ # ;

The @D.CmpOnly.FALSE@ flag diacritic is used to prevent words tagged with +CmpNP/Only from ending here. The @D.NeedNoun.ON@ flag diacritic is used to block illegal compounds.
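
The effect of the three disallow flags in ENDLEX can be sketched in Python as a check against the flag state a path has accumulated when it arrives there. The names `ENDLEX_DISALLOW` and `blocking_flags` are invented for the example. The sample states rest on an assumption drawn from the flag descriptions in this file: a +CmpNP/Only word carries CmpOnly=FALSE until it has passed R, where @P.CmpOnly.TRUE@ overwrites it.

```python
# The (feature, value) pairs disallowed by the ENDLEX entry
# @D.CmpOnly.FALSE@@D.CmpPref.TRUE@@D.NeedNoun.ON@
ENDLEX_DISALLOW = [("CmpOnly", "FALSE"), ("CmpPref", "TRUE"), ("NeedNoun", "ON")]


def blocking_flags(state: dict[str, str]) -> list[str]:
    """Return the ENDLEX flags that reject a path with this flag state."""
    return [
        f"@D.{feature}.{value}@"
        for feature, value in ENDLEX_DISALLOW
        if state.get(feature) == value
    ]


# Hypothetical flag states on arrival at ENDLEX.
print(blocking_flags({"CmpOnly": "FALSE"}))  # compound-only word used alone
print(blocking_flags({"CmpOnly": "TRUE"}))   # same word after passing R
print(blocking_flags({}))                    # ordinary word, no flags set
```

An empty result means the path may terminate. A non-empty result names the flag or flags that stop it.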