root-morphology
Morphology
- Multichar Symbols declaration
- The Root lexicon
- A set of lexica for minor parts of speech
- A set of unfinished lexica, to be either deleted or expanded.
Declaration of Multichar_Symbols
Analysis symbols
The morphological analyses of the wordforms of Eastern Mari language are
- %^VoTrigger for use with acronyms after hyphen
- %^VeTrigger for use with acronyms after hyphen
- %^VOTrigger for use with acronyms after hyphen
- %^Sonorant for use with acronyms after hyphen Л|М|Н|Р|Ҥ
- %^Obstruent for use with acronyms after hyphen С|Ф|Ъ|Ь
- %^FrontObstr for use with acronyms after hyphen С|Ф|Ъ|Ь
The parts-of-speech are:
- +N = nouns
- +A = adjectives
- +Adp = adpositions
- +Adv = adverbs
- +V = verbs
- +Pron = pronouns
- +CS = subjunctions
- +CC = conjunctions
- +Interj = interjections
- +Pcle = particles
- +Num = numerals
- +Descr = descriptive ideophones
POS subtags
- +Po = postpositions
- +Pr = prepositions
- +Prop = Proper noun
- +Pers = Personal pronoun
- +Dem = Demonstrative pronoun
- +Interr = Interrogative pronoun
- +Refl = Reflexive pronoun
- +Recipr = Reciprocal pronoun
- +Rel = Relative pronoun
- +Indef = Indefinite pronoun
- +Coll = Collective numerals -ын-
- +AssocColl = Collective associative numerals with obligatory possessive suffixes -нь-
- +Patr = patronym, look at this in other cyr fsts.
- +Aux = Auxiliary verb
- +Depend = pair verbs that do not occur independently get this marker
Have a look at these:
- +Foc/Poss =
- +Prf = perfective
- +Arab = arabic numerals
- +Qnt = quantifiers
- +Rom = roman numerals
- +Weak = weak (?) form
The nominals are inflected in the following numbers
- +Sg =
- +Pl =
- +AssocPl =
- +LocPl = location, better with LocusPl to avoid Loc case?
The nominals are inflected in the following Case and Number
- +Nom = nominative
- +Gen = genitive
- +Acc = accusative
- +Com = comitative
- +Ill = illative
- +Ine = inessive
- +Lat = lative
- +Dat = dative
- +Cmpr = comparative case
- +Abe = abessive
- +Voc = vocative
- +Attr = attributive form
- +Instr =
The possession is marked as such:
- +PxSg1 =
- +PxSg2 =
- +PxSg3 =
- +PxPl1 =
- +PxPl2 =
- +PxPl3 =
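The number, case and possession tags above combine with a lemma into a single analysis string. The following is a minimal sketch, assuming the usual `lemma+Tag+Tag` layout of analyser output; the placeholder lemma, the tag order in the example and the function names are illustrative assumptions, not taken from the grammar itself.

```python
# Tag inventories copied from the lists above.
NUMBER = {"+Sg", "+Pl", "+AssocPl", "+LocPl"}
CASE = {"+Nom", "+Gen", "+Acc", "+Com", "+Ill", "+Ine", "+Lat", "+Dat",
        "+Cmpr", "+Abe", "+Voc", "+Attr", "+Instr"}
POSSESSION = {"+PxSg1", "+PxSg2", "+PxSg3", "+PxPl1", "+PxPl2", "+PxPl3"}


def split_analysis(analysis):
    """Split 'lemma+Tag1+Tag2' into (lemma, ['+Tag1', '+Tag2'])."""
    lemma, *rest = analysis.split("+")
    return lemma, ["+" + tag for tag in rest if tag]


def nominal_features(analysis):
    """Pick out the number, case and possession tags of an analysis, if any."""
    _, tags = split_analysis(analysis)
    features = {"number": None, "case": None, "possession": None}
    for tag in tags:
        if tag in NUMBER:
            features["number"] = tag
        elif tag in CASE:
            features["case"] = tag
        elif tag in POSSESSION:
            features["possession"] = tag
    return features


# Hypothetical analysis string: placeholder lemma, illustrative tag order.
print(nominal_features("lemma+N+Sg+Ine+PxSg1"))
```

Looking tags up by set membership rather than by position keeps the sketch independent of the suffix order, which the grammar treats as variable.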
Suffix ordering tags:
- +So/CP =
- +So/PC =
- +So/NCP =
- +So/NPC =
- +So/NP =
- +So/PN =
- +So/PNC =
The comparative forms are:
- +Comp = comparative (note: not Cmp)
- +Superl = superlative
Numerals are classified under:
- +Card = (hmm, skip+Card?)
- +Ord =
Note the attributive tag, in different contexts
- +Attr =
Verb moods are:
- +Ind = indicative
- +Cond = conditional
- +Imprt = imperative
- +Des = desiderative
Verb tenses are:
- +Prs = present
- +Prt1 = 1st preterite, direct observation
- +Prt2 = 2nd preterite, indirect narrative, conclusion
Verb personal forms are: (also used with personal pronouns)
- +Sg1 =
- +Sg2 =
- +Sg3 =
- +Pl1 =
- +Pl2 =
- +Pl3 =
- +Ext = form уло
- +Indep = forms огым, огыт, ите
Other verb forms are
- +Inf = Infinitive
- +Ger = Gerund
- +Neg = Negation verb
- +ConNeg = Invariant main verb in negation expression
- +Prc = Participle
- +Nec = Necessive infinitive
- +Fut = Future participle
- +Neg = Negative participle
- +Imprf = Imperfective (?) -- XXX check this
- +Act = Active
- +Pass = Passive
Question and Focus particles:
- +Qst =
- +Foc =
- +Foc/at = -at focus particle
- +Foc/ak = -ak focus particle
- +Foc/ys = -ys focus particle
- +Foc/jan = -jan focus particle
- +Foc/ja = -ja focus particle
Derivations
- +Ex/N = for derivation from N to another POS
- +Ex/V = for derivation from V to another POS
- +Ex/A = for derivation from A to another POS
- +Ex/TV = change to other transitivity
- +EX/IV = change to other transitivity
- +Der/Nom = Derivation V > N: Nominalization
- +Der/NomNeg = Derivation V > N: Negative nominalization
- +Der/Priv = Derivation N > A: Privative adjective
- +Der/Poss = Derivation N > A: Possessive adjective, orig. genitive form without a head
- +Der/Pur = Derivation N > A:
- +Der/Rel = Derivation N > A: Relational adjective
- +Der/Caus = Derivation V > V: Causative
- +Der/Refl = Derivation V > V: Reflexive
- +Der/MWN = Modifier without noun (better: +A+Sg+Nom etc.)
Abbreviated words are classified with:
- +ABBR = for abbreviations that (may) contain period
- +Symbol = independent symbols in the text stream, like £, €, ©
- +ACR = acronyms
Special symbols are classified with:
- +CLB = clause and sentence boundary symbols
- +PUNCT = other punctuation marks
- +LEFT = paired symbols
- +RIGHT = paired symbols
The verbs are syntactically split according to transitivity:
- +TV =
- +IV =
Special multiword units are analysed with:
- +Multi =
Non-dictionary words can be recognised with:
- +Guess =
Homonymy tags
These are especially for verbs. Note that this is not
- +Hom1 = First pattern (let us say -ам)
- +Hom2 = Second pattern (let us say -ем)
- +Hom3 = Third pattern (if it should exist + even more?)
- +Hom4 =
- +Hom5 =
- +Hom6 =
Usage tags
The Usage extents are marked using following tags:
- +Use/Marg marginal
- +Use/-PLX Excluded in PLX-speller
- +Use/SpellNoSugg recognized but not suggested in speller
- +Use/Circ circular paths (old ^C^)
- +Use/CircN circular paths for the numerals (old ^N^)
- +Use/NG not-generate, for ped generation isme-ped.fst
- +Use/MT Generate for MT only, for restricting analyses needed
- +Use/NGminip Not for miniparadigm in VD dicts
- +Use/Disamb means that the following is only used in the analyser feeding the disambiguator
- +Use/GC only retained in the HFST Grammar Checker disambiguation analyser
- +Use/-PMatch Do not include in fsts made for hfst-pmatch
- +MWESplit Split point for MWE
- +Err/Orth = orthographical error (analysed, not accepted in speller)
- +Use/-Spell = accepted in normative FST but not in speller
- +Use/Test = Dealing with lative form 2012-10-27 аваеш, пашаш
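The usage tags above describe which analyses a given tool should keep. The following is a minimal sketch of that idea for the speller case, based only on the two descriptions "not accepted in speller" and "not in speller". In the real build this restriction is applied to the transducer itself, so the function, the placeholder lemma and the tag positions here are assumptions for illustration.

```python
# Tags whose descriptions above say the form is kept out of the speller.
SPELLER_EXCLUDED = {"+Use/-Spell", "+Err/Orth"}


def tags_of(analysis):
    """Return the '+Tag' symbols of a 'lemma+Tag1+Tag2' analysis string."""
    _, *rest = analysis.split("+")
    return ["+" + tag for tag in rest if tag]


def keep_for_speller(analyses):
    """Drop every analysis that carries a speller-excluding tag."""
    return [a for a in analyses
            if not SPELLER_EXCLUDED.intersection(tags_of(a))]


# Hypothetical analyses; the lemma and the tag positions are placeholders.
candidates = [
    "lemma+N+Sg+Nom",
    "lemma+N+Err/Orth+Sg+Nom",
    "lemma+N+Use/-Spell+Sg+Nom",
    "lemma+N+Use/NGminip+Sg+Nom",
]
print(keep_for_speller(candidates))
```

Comparing whole tags rather than substrings matters here: a substring test for `+Use/NG` would also match `+Use/NGminip`.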
Semantic tags
- +Sem/Act = Activity
- +Sem/Amount = Amount
- +Sem/Ani = Animate
- +Sem/Aniprod = Animal Product
- +Sem/Body = Bodypart
- +Sem/Body-abstr = siellu, vuoigŋa, jierbmi
- +Sem/Build = Building
- +Sem/Build-part = Part of Building, like the closet
- +Sem/Cat = Category
- +Sem/Clth = Clothes
- +Sem/Clth-jewl = Jewellery
- +Sem/Clth-part = part of clothes, boallu, sávdnji...
- +Sem/Ctain = Container
- +Sem/Ctain-abstr = Abstract container like bank account
- +Sem/Ctain-clth
- +Sem/Curr = Currency like dollár, Not Money
- +Sem/Dance = Dance
- +Sem/Dir = Direction like GPS-kursa
- +Sem/Domain = Domain like politics, reindeerherding (a system of actions)
- +Sem/Drink = Drink
- +Sem/Dummytag = Dummytag
- +Sem/Edu = Educational event
- +Sem/Event = Event
- +Sem/Feat = Feature, like Árvu
- +Sem/Feat-phys = Physiological feature, ivdni, fárda
- +Sem/Feat-psych = Psychological feature
- +Sem/Feat-measr = Psychological feature
- +Sem/Fem = Female name
- +Sem/Food = Food
- +Sem/Food-med = Medicine
- +Sem/Furn = Furniture
- +Sem/Game = Game
- +Sem/Geom = Geometrical object
- +Sem/Group = Animal or Human Group
- +Sem/Hum = Human
- +Sem/Hum-abstr = Human abstract
- +Sem/Ideol = Ideology
- +Sem/Lang = Language
- +Sem/Mal = Male name
- +Sem/Mat = Material for producing things
- +Sem/Measr = Measure
- +Sem/Money = Has to do with money, like wages, not Curr(ency)
- +Sem/Obj = Object
- +Sem/Obj-clo = Cloth
- +Sem/Obj-cogn = Cloth
- +Sem/Obj-el = (Electrical) machine or apparatus
- +Sem/Obj-ling = Object with something written on it
- +Sem/Obj-rope = flexible ropelike object
- +Sem/Obj-surfc = Surface object
- +Sem/Org = Organisation
- +Sem/Part = Feature, oassi, bealli
- +Sem/Perc-cogn = Cognitive perception
- +Sem/Perc-emo = Emotional perception
- +Sem/Perc-phys = Physical perception
- +Sem/Perc-psych = Physical perception
- +Sem/Plant = Plant
- +Sem/Plant-part = Plant part
- +Sem/Plc = Place
- +Sem/Plc-abstr = Abstract place
- +Sem/Plc-elevate = Place
- +Sem/Plc-line = Place
- +Sem/Plc-water = Place
- +Sem/Pos = Position (as in social position job)
- +Sem/Process = Process
- +Sem/Prod = Product
- +Sem/Prod-audio = Audio product
- +Sem/Prod-cogn = Cognition product
- +Sem/Prod-ling = Linguistic product
- +Sem/Prod-vis = Visual product
- +Sem/Rel = Relation
- +Sem/Route = Name of a Route
- +Sem/Rule = Rule or convention
- +Sem/Semcon = Semantic concept
- +Sem/Sign = Sign (e.g. numbers, punctuation)
- +Sem/Sport = Sport
- +Sem/State =
- +Sem/State-sick = Illness
- +Sem/Substnc = Substance, like Air and Water
- +Sem/Sur = Surname
- +Sem/Symbol = Symbol
- +Sem/Time = Time
- +Sem/Tool = Prototypical tool for repairing things
- +Sem/Tool-catch = Tool used for catching (e.g. fish)
- +Sem/Tool-clean = Tool used for cleaning
- +Sem/Tool-it = Tool used in IT
- +Sem/Tool-measr = Tool used for measuring
- +Sem/Tool-music = Music instrument
- +Sem/Tool-write = Writing tool
- +Sem/Txt = Text (girji, lávlla...)
- +Sem/Veh = Vehicle
- +Sem/Wpn = Weapon
- +Sem/Wthr = The Weather or the state of ground
Multiple Semantic tags:
- +Sem/Act_Group =
- +Sem/Act_Plc =
- +Sem/Act_Route =
- +Sem/Amount_Build =
- +Sem/Amount_Semcon =
- +Sem/Ani_Body-abstr_Hum =
- +Sem/Ani_Build =
- +Sem/Ani_Build-part =
- +Sem/Ani_Build_Hum_Txt =
- +Sem/Ani_Group =
- +Sem/Ani_Group_Hum =
- +Sem/Ani_Hum =
- +Sem/Ani_Hum_Plc =
- +Sem/Ani_Hum_Time =
- +Sem/Ani_Plc =
- +Sem/Ani_Plc_Txt =
- +Sem/Ani_Time =
- +Sem/Ani_Veh =
- +Sem/Aniprod_Hum =
- +Sem/Aniprod_Obj-clo =
- +Sem/Aniprod_Perc-phys =
- +Sem/Aniprod_Plc =
- +Sem/Body-abstr_Prod-audio_Semcon =
- +Sem/Body_Body-abstr =
- +Sem/Body_Clth =
- +Sem/Body_Food =
- +Sem/Body_Group_Hum =
- +Sem/Body_Hum =
- +Sem/Body_Mat =
- +Sem/Body_Measr =
- +Sem/Body_Obj_Tool-catch =
- +Sem/Body_Plc =
- +Sem/Body_Time =
- +Sem/Build-part_Plc =
- +Sem/Build_Build-part =
- +Sem/Build_Clth-part =
- +Sem/Build_Edu_Org =
- +Sem/Build_Event_Org =
- +Sem/Build_Org =
- +Sem/Build_Route =
- +Sem/Clth-jewl_Curr =
- +Sem/Clth-jewl_Money =
- +Sem/Clth-jewl_Plant =
- +Sem/Clth_Hum =
- +Sem/Ctain-abstr_Org =
- +Sem/Ctain-clth_Plant =
- +Sem/Ctain-clth_Veh =
- +Sem/Ctain_Feat-phys =
- +Sem/Ctain_Furn =
- +Sem/Ctain_Tool =
- +Sem/Ctain_Tool-measr =
- +Sem/Curr_Org =
- +Sem/Dance_Org =
- +Sem/Dance_Prod-audio =
- +Sem/Domain_Food-med =
- +Sem/Domain_Prod-audio =
- +Sem/Edu_Event =
- +Sem/Edu_Group_Hum =
- +Sem/Edu_Mat =
- +Sem/Edu_Org =
- +Sem/Event_Food =
- +Sem/Event_Hum =
- +Sem/Event_Plc =
- +Sem/Event_Time =
- +Sem/Feat-phys_Tool-write =
- +Sem/Feat-phys_Veh =
- +Sem/Feat-phys_Wthr =
- +Sem/Feat-psych_Hum =
- +Sem/Feat_Plant =
- +Sem/Food_Perc-phys =
- +Sem/Food_Plant =
- +Sem/Game_Obj-play =
- +Sem/Geom_Obj =
- +Sem/Group_Hum =
- +Sem/Group_Hum_Org =
- +Sem/Group_Hum_Plc =
- +Sem/Group_Hum_Prod-vis =
- +Sem/Group_Org =
- +Sem/Group_Sign =
- +Sem/Group_Txt =
- +Sem/Hum_Lang =
- +Sem/Hum_Lang_Plc =
- +Sem/Hum_Lang_Time =
- +Sem/Hum_Obj =
- +Sem/Hum_Org =
- +Sem/Hum_Plant =
- +Sem/Hum_Plc =
- +Sem/Hum_Tool =
- +Sem/Hum_Veh =
- +Sem/Hum_Wthr =
- +Sem/Lang_Tool =
- +Sem/Mat_Plant =
- +Sem/Mat_Txt =
- +Sem/Measr_Time =
- +Sem/Money_Obj =
- +Sem/Money_Txt =
- +Sem/Obj-play =
- +Sem/Obj-play_Sport =
- +Sem/Obj_Semcon =
- +Sem/Clth-jewl_Org =
- +Sem/Org_Rule =
- +Sem/Org_Txt =
- +Sem/Org_Veh =
- +Sem/Part_Prod-cogn =
- +Sem/Perc-emo_Wthr =
- +Sem/Plant_Plant-part =
- +Sem/Plant_Tool =
- +Sem/Plant_Tool-measr =
- +Sem/Plc-abstr_Rel_State =
- +Sem/Plc-abstr_Route =
- +Sem/Plc_Pos =
- +Sem/Plc_Route =
- +Sem/Plc_Substnc =
- +Sem/Plc_Substnc_Wthr =
- +Sem/Plc_Time =
- +Sem/Plc_Tool-catch =
- +Sem/Plc_Wthr =
- +Sem/Prod-audio_Txt =
- +Sem/Prod-cogn_Txt =
- +Sem/Semcon_Txt =
- +Sem/Obj_State =
- +Sem/Substnc_Wthr =
- +Sem/Time_Wthr =
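The multiple semantic tags above appear to be built by joining single category names with an underscore, while a hyphen belongs to the category name itself. The following is a minimal sketch under that assumption; the function name is hypothetical.

```python
SEM_PREFIX = "+Sem/"


def sem_components(tag):
    """Split a (possibly multiple) semantic tag into its category names."""
    if not tag.startswith(SEM_PREFIX):
        raise ValueError("not a semantic tag: " + tag)
    # Underscores separate categories; hyphens stay inside a category name.
    return tag[len(SEM_PREFIX):].split("_")


print(sem_components("+Sem/Ani_Build_Hum_Txt"))
print(sem_components("+Sem/Body-abstr_Prod-audio_Semcon"))
```

A single tag such as `+Sem/Plc` comes back as a one-element list, so callers can treat single and multiple tags the same way.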
Semantics are classified with
Derivations are classified under the morphophonetic form of the suffix, the
- +V→N =
- +V→V =
- +V→A =
- +N→A =
- +Der/xxx =
- +Der/mO =
Morphophonology
- %{аы%} Stem-final vowel variation when stress falls on non-final vowel word-final е and presuffix ы
- %{еы%} Stem-final vowel variation when stress falls on non-final vowel word-final е and presuffix ы
- %{оы%} Stem-final vowel variation when stress falls on non-final vowel
- %{ӧы%} Stem-final vowel variation when stress falls on non-final vowel
- %{яы%} Stem-final vowel variation when stress falls on non-final vowel preceded by ь
- %{еоыӧØ%} PxSg3 final
- %{ыØ%} PxSg3 onset
- %{ьØ%} for -ам verbs Prt1 Sg1, Sg2, Sg3, Pl3 л н
- {aä} for vowel harmony
- {oö} for vowel harmony
- {uü} for vowel harmony
- е1 =
- а1 =
- и1 =
- у1 =
- ӱ1 =
- я1 =
- Е1 = lative
- Е2 =
- А2 =
- Ы1 = stem-onset archi-vowel
- Ы2 =
- з2 = for возаш : воч
- к2 кочк- коч# "eat/есть" мушк- муш "wash/мыть"
- н2 шинч- шич# "sit down/сесть"
- т2 лект- лек# "leave/ уходить"
- %> =
- +TEST =
And following triggers to control variation
- %^V2IMPRT for -ем verbs in й
- %^END for -ам verb final, i.e. Imprf
- {front}
- {back}
- X1 =
- X2 =
- X3 =
- X4 =
- X5 =
- X6 =
- X7 =
- X8 =
- X9 =
- Z1 =
- Z2 =
- %-
- %^VoTrigger for use with acronyms after hyphen о у ё ю О У Ё Ю
- %^VeTrigger for use with acronyms after hyphen а е и э я А Е И Э Я
- %^VOTrigger for use with acronyms after hyphen ӧ ӱ Ӧ Ӱ
- %^Sonorant for use with acronyms after hyphen Л|М|Н|Р|Ҥ
- %^Obstruent for use with acronyms after hyphen С|Ф|Ъ|Ь
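The letter sets listed with the acronym triggers above can be read as a lookup table. The following is a minimal sketch, assuming the trigger is chosen from the last letter of the acronym before the hyphen; that selection rule and the function name are assumptions, and only the letter sets come from the list.

```python
# Letter sets copied from the trigger list above.
TRIGGER_LETTERS = {
    "%^VoTrigger": "оуёюОУЁЮ",
    "%^VeTrigger": "аеиэяАЕИЭЯ",
    "%^VOTrigger": "ӧӱӦӰ",
    "%^Sonorant": "ЛМНРҤ",
    "%^Obstruent": "СФЪЬ",
}


def trigger_for(acronym):
    """Return the trigger whose letter set contains the final letter, or None."""
    if not acronym:
        return None
    last = acronym[-1]
    for trigger, letters in TRIGGER_LETTERS.items():
        if last in letters:
            return trigger
    return None  # final letter is not covered by any listed set


for word in ("МГУ", "США", "СССР"):
    print(word, trigger_for(word))
```

Letters outside the listed sets yield `None`, which makes the gaps in the inventory visible instead of guessing a trigger for them.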
Symbols that need to be escaped on the lower side (towards twolc):
- »7 - Literal »
- «7 - Literal «
- %[%>%] - Literal >
- %[%<%] - Literal <
Flag diacritics
| @P.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised |
| @D.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised |
| @C.NeedNoun@ | (Dis)allow compounds with verbs unless nominalised |
For languages that allow compounding, the following flag diacritics are needed
| @P.CmpFrst.FALSE@ | Require that words tagged as such only appear first |
| @D.CmpPref.TRUE@ | Block such words from entering ENDLEX |
| @P.CmpPref.FALSE@ | Block these words from making further compounds |
| @D.CmpLast.TRUE@ | Block such words from entering R |
| @D.CmpNone.TRUE@ | Combines with the next tag to prohibit compounding |
| @U.CmpNone.FALSE@ | Combines with the prev tag to prohibit compounding |
| @P.CmpOnly.TRUE@ | Sets a flag to indicate that the word has passed R |
| @D.CmpOnly.FALSE@ | Disallow words coming directly from root. |
Use the following flag diacritics to control downcasing of derived proper
| @U.Cap.Obl@ | Allowing downcasing of derived names: deatnulasj. |
| @U.Cap.Opt@ | Allowing downcasing of derived names: deatnulasj. |
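The flag diacritics above use the P (set), D (disallow), U (unify) and C (clear) operators. The following is a minimal sketch of how a sequence of such flags accepts or rejects a path, following the general Xerox/HFST flag semantics as commonly documented. The example sequences are hypothetical and do not reproduce the order in which this grammar emits its flags; the R operator is included for completeness and the N operator is left out.

```python
import re

FLAG = re.compile(r"@([PDUCR])\.([^.@]+)(?:\.([^.@]+))?@")


def path_is_valid(flags):
    """Return True if the flag sequence never hits a failing check."""
    state = {}  # feature name -> current value
    for flag in flags:
        match = FLAG.fullmatch(flag)
        if match is None:
            raise ValueError("not a supported flag diacritic: " + flag)
        op, feature, value = match.groups()
        if op in "PU" and value is None:
            raise ValueError("operator needs a value: " + flag)
        current = state.get(feature)
        if op == "P":    # set the feature unconditionally
            state[feature] = value
        elif op == "C":  # clear the feature
            state.pop(feature, None)
        elif op == "D":  # fail if the feature is set (to this value)
            if current is not None and (value is None or current == value):
                return False
        elif op == "R":  # fail unless the feature is set (to this value)
            if current is None or (value is not None and current != value):
                return False
        elif op == "U":  # set if unset, otherwise the values must agree
            if current is None:
                state[feature] = value
            elif current != value:
                return False
    return True


# Hypothetical sequences built from flags listed above.
print(path_is_valid(["@P.NeedNoun.ON@", "@D.NeedNoun.ON@"]))
print(path_is_valid(["@P.NeedNoun.ON@", "@C.NeedNoun@", "@D.NeedNoun.ON@"]))
```

In the first sequence the D check meets the value that P has just set, so the path is rejected; clearing the feature in between, as in the second sequence, lets the path through.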
The Root lexicon
@U.Cap.Opt@ Here it all starts
The word forms of the Meadow Mari language start from the lexeme roots of
the following basic word classes:
- adjectives ;
- Exceptions ;
urj-Cyrl-ProperNouns ;
ProperNoun-mhr ; specifically Mari names
Continuation lexica
Here comes a set of ragbag continuation lexica.
- LEXICON ADP_ TODO: why +WORK?
- LEXICON CONJ_ TODO: why +WORK? All CONJ_ should be identified as either CC or CS or both, work in progress
- LEXICON CC_ conjunctions
- LEXICON CS_ subjunctions
- LEXICON DESCR-AUD_ these are audible, others may be visible or otherwise sensed, but for now just calling them Interj+Descr should suffice
- LEXICON AD-A also adverbs
- LEXICON INTERJ_ interjections
- LEXICON Puh-a/e XXX do not know
- LEXICON Puh XXX do not know
- LEXICON PCLE_ particles, check these
- LEXICON X for N attributes