root-morphology
Tags and root lexicon for Komi
Morphology
Analysis symbols
The parts-of-speech tags are:
- +A
- adjective кывберд прилагательное
- +Adp
- adposition (preposition, postposition)
- +Adv
- adverb урчитан наречие
- +CS
- subordinating conjunction XX подчинительный союз
- +CC
- coordinating conjunction XX сочинительный союз
- +CONJ
- conjunction word XX союзное слово (it remains to be determined which of the two above this is)
- +Det
- determiner XX XX
- +Interj
- interjection междометтьӧ междометие
- +N
- noun эмакыв - существительное
- +Num
- numeral лыдакыв числительное
- +Pcle
- particle кывтор частица
- +Po
- postposition кывбӧр послелог
- +Pr
- preposition XX предлог
- +Pron
- pronoun нимвежтас местоимение
- +Qnt
- Quantifier ХХ XX
- +V
- verb кадакыв глагол
The parts of speech are further split up into:
Adverbs
- +Adv-Ideoph These are ideophonic descriptors used to modify the verb
- +AdA Degree
- +Manner with reference to type of adverb
- +Spat spatial
- +Temp temporal
- +Parenthetic parenthetical phrase
Interjections
Nouns
- +Prop proper
- +CollN collective noun, used with paired nouns
- +Relat relational noun: выв, ув
Postpositions
Pronouns
- +Dem
- demonstrative
- +Indef
- indefinite
- +Interr
- interrogative
- +Pers
- personal
- +Recipr
- reciprocal
- +Refl
- reflexive
- +Rel
- relative
- +Poss
- possessive
Quantifiers (numerals)
- +Num
- numeral лыдакыв
- +Appr
- Approximative numeral кавто-колмо, колмошка two or three
- +AssocColl
- -ne- ; avide-
- +Assoc
- +мезть
- +Card
- cardinal + NCard
- +Coll
- collective
- +Distr
- Distributive
- +Iter
- Iterative form expressing number of consecutive times; kpv: кыкысь
- +Mult
- Multiplicative adverbs, number of times; kpv: кык пӧв
- +Ord
- ordinal + NOrd
- +Coord
- Coordinates, i.e. 65˚36′8,30″ in numerals.lexc
Nominals are inflected for Number and Case
Number
- +Sg singular
- +Pl plural
Case
The following case categories are identified in Komi:
- +Acc accusative ZERO керан
- +Acc1 accusative -ӧс керан
- +Acc3 accusative -сӧ керан
- +Abl ablative case -лысь босьтан
- +Apr approximative -лань матыстчан
- +AprEgr approximative egressive -ланьсянь матысь ылыстчан
- +AprEla approximative elative -ланьысь матысь петан
- +AprIll approximative illative -ланьӧ матӧ матыстчан
- +AprIne approximative inessive -ланьын матыс ина
- +AprPrl approximative prolative -ланьӧд маті вуджан
- +AprTer approximative terminative -ланьӧдз матіӧдз воан
- +AprTra approximative translative -ланьті маті вуджан
- +Car caritive -тӧг торйӧдан
- +Cns consecutive -ла могман
- +Com Comitative -кӧд ӧтвывтан
- +Cmpr Comparative case form -ся ӧткодялан
- +Cmpl Postposition complement
- +Dat dative case -лы сетан
- +Egr egressive -сянь ылыстчан
- +Ela elative -ысь петан
- +Gen genitive case -лӧн асалан
- +Ill illative -ӧ пыран
- +Ine inessive -ын ина
- +Ins instrumental -ӧн керанторъя
- +Nom nominative case нимтан
- +Prl prolative -ӧд вуджан
- +Tra translative -ті вуджан
- +Ter Terminative -ӧдз матыстчан
- +Voc Vocative ??
- +Abs Absolute = +Sg+Nom
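One regularity in the listing is that every approximative local case suffix is spelled as -лань followed by the plain local case suffix. The sketch below checks this against the suffixes given in the list. It is only an orthographic observation on the listed forms, not a model of Komi morphophonology, and the names `LOCAL_SUFFIX`, `LISTED_APR` and `approximative_suffix` are illustrative.

```python
# Plain local case suffixes as listed above (without the leading hyphen).
LOCAL_SUFFIX = {
    "Egr": "сянь",
    "Ela": "ысь",
    "Ill": "ӧ",
    "Ine": "ын",
    "Prl": "ӧд",
    "Ter": "ӧдз",
    "Tra": "ті",
}

# Approximative suffix (+Apr) as listed above.
APR_SUFFIX = "лань"

# Approximative local case suffixes as listed above.
LISTED_APR = {
    "AprEgr": "ланьсянь",
    "AprEla": "ланьысь",
    "AprIll": "ланьӧ",
    "AprIne": "ланьын",
    "AprPrl": "ланьӧд",
    "AprTer": "ланьӧдз",
    "AprTra": "ланьті",
}


def approximative_suffix(local_case: str) -> str:
    """Concatenate the approximative suffix with a plain local case suffix."""
    return APR_SUFFIX + LOCAL_SUFFIX[local_case]


def listing_is_compositional() -> bool:
    """Check that every listed Apr* suffix equals -лань plus the local suffix."""
    return all(
        approximative_suffix(tag[len("Apr"):]) == suffix
        for tag, suffix in LISTED_APR.items()
    )


if __name__ == "__main__":
    print(listing_is_compositional())
```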
Possession is marked with the following tags:
- +PxSg1 +PxSg2 +PxSg3 +PxPl1 +PxPl2 +PxPl3
- +Px1 +Px2 +Px3
The comparative forms are:
- +Comp +Superl
- +Attr +Card
- +Ord
- +Coll Collective
- +Distr Distributive
- +Iter Iterative form expressing number of times
Verb moods are:
Other verb forms are:
- +VAbess тӧм Participle
- +VCar тӧг Gerund
- +VTer тӧдз Gerund
- +Symbol = independent symbols in the text stream, like £, €, ©
- +TV
- +IV
Special multiword units are analysed with:
- +Multi
- +Guess
Question and Focus particles:
- +Qst
- +Foc
- +Clt/И This comes at the end of a word -и or after vowels (some authors use -й)
- +Clt
- +Clt/а
- +Clt/ӧ
- +Clt/тӧ
- +Clt/сӧ
Tags distinguishing different versions of the same lemma (before POS)
- +v1
- +v2
- +v3
- +v4
- +v5
- +v6
- +v7
- +v8
- +v9
- +v10
- +v11
- +v12
- +v13
- +v14
- +v15
- +v16
- +v17
- +v18
- +v19
- +v20
- +v21
- +v22
- +v23
- +v24
Usage extents are marked using the following tags:
- +Err/Orth
- +Err/Orth-no-paragogic-j
- +Err/Orth-no-paragogic-k
- +Err/Orth-no-paragogic-m
- +Err/Orth-no-paragogic-t
- +Err/Dial e.g. тэг instead of тӧг
- +Err/Lex substandard, not in normative fst, no normative lemma помсьыны
- +Use/-Spell
- +Use/SpellNoSugg
- +Use/PMatch means that the following is only used in the analyser feeding the disambiguator
- +Use/-PMatch Do not include in fst's made for hfst-pmatch
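One way a downstream tool might use the +Err/ tags is to prefer normative analyses whenever any exist and fall back to the substandard ones otherwise. The sketch below illustrates that policy. It is an assumption about how a consumer could treat these tags, not a description of the Komi pipeline; the example analyses and the position of the +Err/ tag within them are hypothetical.

```python
def is_normative(analysis: str) -> bool:
    """An analysis counts as normative if it carries no +Err/ tag."""
    return "+Err/" not in analysis


def prefer_normative(analyses: list[str]) -> list[str]:
    """Keep only normative analyses, unless that would leave nothing."""
    normative = [a for a in analyses if is_normative(a)]
    return normative or list(analyses)


if __name__ == "__main__":
    mixed = ["керка+N+Sg+Car", "керка+N+Sg+Car+Err/Dial"]
    only_errors = ["керка+N+Sg+Car+Err/Dial"]
    print(prefer_normative(mixed))
    print(prefer_normative(only_errors))
```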
Dialect features
Where do these come from source
- +Src/F foreign source apparently 2015-09-08
- +Dim diminutive
- +NonHum look at this and place somewhere
- +Sem/Act Activity
- +Sem/Amount Amount
- +Sem/Ani Animate
- +Sem/Aniprod Animal Product
- +Sem/Body Bodypart
- +Sem/Body-abstr siellu, vuoigŋa, jierbmi
- +Sem/Build Building
- +Sem/Build-part Part of Building, like the closet
- +Sem/Cat Category
- +Sem/Clth Clothes
- +Sem/Clth-jewl Jewellery
- +Sem/Clth-part part of clothes, boallu, sávdnji...
- +Sem/Ctain Container
- +Sem/Ctain-abstr Abstract container like bank account
- +Sem/Ctain-clth
- +Sem/Curr Currency like dollár, Not Money
- +Sem/Dance Dance
- +Sem/Dir Direction like GPS-kursa
- +Sem/Domain Domain like politics, reindeerherding (a system of actions)
- +Sem/Drink Drink
- +Sem/Dummytag Dummytag
- +Sem/Edu Educational event
- +Sem/Event Event
- +Sem/Feat Feature, like Árvu
- +Sem/Feat-phys Physiological feature, ivdni, fárda
- +Sem/Feat-psych Psychological feature
- +Sem/Feat-measr Psychological feature
- +Sem/Fem Female name
- +Sem/Food Food
- +Sem/Food-med Medicine
- +Sem/Furn Furniture
- +Sem/Game Game
- +Sem/Geom Geometrical object
- +Sem/Group Animal or Human Group
- +Sem/Hum Human
- +Sem/Hum-abstr Human abstract
- +Sem/Ideol Ideology
- +Sem/Lang Language
- +Sem/Mal Male name
- +Sem/Mat Material for producing things
- +Sem/Measr Measure
- +Sem/Money Has to do with money, like wages, not Curr(ency)
- +Sem/Obj Object
- +Sem/Obj-clo Cloth
- +Sem/Obj-cogn Cloth
- +Sem/Obj-el (Electrical) machine or apparatus
- +Sem/Obj-ling Object with something written on it
- +Sem/Obj-rope flexible ropelike object
- +Sem/Obj-surfc Surface object
- +Sem/Org Organisation
- +Sem/Part Feature, oassi, bealli
- +Sem/Perc-cogn Cognitive perception
- +Sem/Perc-emo Emotional perception
- +Sem/Perc-phys Physical perception
- +Sem/Perc-psych Physical perception
- +Sem/Plant Plant
- +Sem/Plant-part Plant part
- +Sem/Plc Place
- +Sem/Plc-abstr Abstract place
- +Sem/Plc-elevate Place
- +Sem/Plc-line Place
- +Sem/Plc-water Place
- +Sem/Pos Position (as in social position job)
- +Sem/Process Process
- +Sem/Prod Product
- +Sem/Prod-audio Audio product
- +Sem/Prod-cogn Cognition product
- +Sem/Prod-ling Linguistic product
- +Sem/Prod-vis Visual product
- +Sem/Rel Relation
- +Sem/Route Name of a Route
- +Sem/Rule Rule or convention
- +Sem/Semcon Semantic concept
- +Sem/Sign Sign (e.g. numbers, punctuation)
- +Sem/Sport Sport
- +Sem/State
- +Sem/State-sick Illness
- +Sem/Substnc Substance, like Air and Water
- +Sem/Sur Surname
- +Sem/Symbol Symbol
- +Sem/Time Time
- +Sem/Tool Prototypical tool for repairing things
- +Sem/Tool-catch Tool used for catching (e.g. fish)
- +Sem/Tool-clean Tool used for cleaning
- +Sem/Tool-it Tool used in IT
- +Sem/Tool-measr Tool used for measuring
- +Sem/Tool-music Music instrument
- +Sem/Tool-write Writing tool
- +Sem/Txt Text (girji, lávlla...)
- +Sem/Veh Vehicle
- +Sem/Wpn Weapon
- +Sem/Wthr The Weather or the state of ground
- +Sem/Year
- +Sem/Sur_Fem Surname female
- +Sem/Sur_Mal Surname male
- +Sem/Ant Anthroponym
- +Sem/Ant_Fem Anthroponym female
- +Sem/Ant_Mal Anthroponym male
- +Sem/Patr Patronym
- +Sem/Patr_Fem Patronym female
- +Sem/Patr_Mal Patronym male
- +Sem/Event_Plc сёянін
- +Sem/Hum_Prof profession, capacity doctor, tractor driver
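The semantic tags share the +Sem/ prefix, so they can be pulled out of an analysis with a simple pattern. The sketch below assumes tag names made of ASCII letters joined by `-` or `_`, as in the list above; the example analyses and the position of the +Sem/ tags within them are hypothetical, and `semantic_tags` is an illustrative helper name.

```python
import re

# +Sem/ followed by letter groups joined by '-' or '_',
# e.g. +Sem/Plc, +Sem/Plc-water, +Sem/Event_Plc.
SEM_TAG = re.compile(r"\+Sem/[A-Za-z]+(?:[-_][A-Za-z]+)*")


def semantic_tags(analysis: str) -> list[str]:
    """Return all +Sem/... tags in the order they occur in the analysis."""
    return SEM_TAG.findall(analysis)


if __name__ == "__main__":
    print(semantic_tags("сёянін+N+Sem/Event_Plc+Sg+Nom"))
    print(semantic_tags("керка+N+Sg+Nom"))
```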
Semantics are classified with
Derivations are classified under the morphophonetic form of the suffix, the
- +Der In front of every derivation to make it
- +Der/Ан Process Participle +AN
- +Der/Ана Process Participle +ANA
- +Der/Анаа adverb derived from participle (+ANA) +ANAA
- +Der/чӧж +CHOZH
- +Instr
- +NomAct
- +Der/NomAct +Event
- +Der/NomAg
- +Duration
- +Der/иг
- +Der/ысь
- +ActPrsPtc
- +Der/Ан Participle
- +Der/Ана Gerund or participle according to context (with...)
- +PrsPrc
- +PrsPtc
- +Der/ӧм
- +PastPtc
- +Der/кості +KOSTI
- +Der/коста +KOSTA
- +Der/кежлӧ +KEZHLO
- +Der/мысь +MYS
- +Der/мысьт +MYST
- +MAbe abessive modifier -тӧм
- +MLoc locative modifier са -
- +MHab habeo modifier а -
- +MTmp temporal modifier ся -
- +LocMod IneMod Быд во шедӧдӧны бур успеваемость Воркута да Инта каръясса, Прилузскӧй да Княжпогостскӧй районъясса школаяс.
- +CompMod
- +Der/тӧм used with nouns and followed by +AbeMod
- +PrivMod AbeMod джуджыд анализъястӧм да обобщениеястӧм статьяяс.
- +ProprietiveMod HabObjMod Весиг киясыс тӧдсаӧсь, найӧ мугов рӧмаӧсь, кузь чорыд чуньясаӧсь.
- +Der/TempMod TempMod Der/ся но и Ф. В. Плесовскийлысь квайтымынӧд вояссяяссӧ * позьӧ аддзыны сӧмын библиотекаясысь.
2012-09-11 Perhaps this is only syntactic
- +Der/N Noun derived with conversion from noun, conversion but not ZERO
- +Der/A Adjective derived from Noun or Verb
- +Der/Adv Adverb derived from Adjective
Tags for Etymological Origin marking. This has initially been used with proper nouns
Sentence markers
- +Cop Copula кадакыв, коді шуӧ: вӧлі либӧ ӧнія кадся Связка
Morphophonology
To represent phonologic variations in word forms we use the following
- {aä}
- Vowel alternating symbol
- {oö}
- Vowel alternating symbol
- {uü}
- Vowel alternating symbol
- к2 л2 м2 т2 ь2 К2 Л2 М2 Т2 Ь2 И2
- %> suffix border
- %{иі%}
- for soft and hard
- %{ая%}
- for soft and hard
And following triggers to control variation
- {front}
- Vowel change triggers
- {back}
- Vowel change triggers
- %^Close Close syllable, this triggers final consonant drop, seen in
Valency tags, i.e. tags assigned to verbs for denoting their arguments
- +%<acc%> accusative
- +%<ela%> elative -ысь
- +%<ins%> instrumental -ӧн
- +%<inf_ны%> infinitive in -ны
- +%<po_вылӧ%> postposition вылӧ
- +%<sub_мый%> subordinate clause in мый/that
Symbols that need to be escaped on the lower side (towards twolc):
- »
- «
- > (written with square brackets, see the root.lexc file)
- < (written with square brackets, see the root.lexc file)
Flag diacritics
We have manually optimised the structure of our lexicon using the following
@P.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised |
@D.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised |
@C.NeedNoun@ | (Dis)allow compounds with verbs unless nominalised |
Two flags copied from sme
@P.Pmatch.Loc@ | Used on multi-token analyses; tell hfst-tokenise/pmatch where in the form/analysis the token should be split. |
@P.Pmatch.Backtrack@ | Used on single-token analyses; tell hfst-tokenise/pmatch to backtrack by reanalysing the substrings before and after this point in the form (to find combinations of shorter analyses that would otherwise be missed) |
For languages that allow compounding, the following flag diacritics are needed
handled automatically if combined with +CmpN/xxx tags. If not used, they will
@P.CmpFrst.FALSE@ | Require that words tagged as such only appear first |
@D.CmpPref.TRUE@ | Block such words from entering ENDLEX |
@P.CmpPref.FALSE@ | Block these words from making further compounds |
@D.CmpLast.TRUE@ | Block such words from entering R |
@D.CmpNone.TRUE@ | Combines with the next tag to prohibit compounding |
@U.CmpNone.FALSE@ | Combines with the prev tag to prohibit compounding |
@P.CmpOnly.TRUE@ | Sets a flag to indicate that the word has passed R |
@D.CmpOnly.FALSE@ | Disallow words coming directly from root. |
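To make the interplay of these flags more concrete, the sketch below evaluates a sequence of flag diacritics along one path, using the usual semantics of the P (set), D (disallow), R (require), C (clear) and U (unify) operators. It is a simplified toy evaluator written for illustration, not the HFST implementation, and it leaves out the N operator. The function name `path_passes` is hypothetical; the flag string assigned to `ENDLEX` is the one given under Lexicon ENDLEX in this file.

```python
import re

# @OP.Feature@ or @OP.Feature.Value@
FLAG = re.compile(r"@([PDRCU])\.([^.@]+)(?:\.([^.@]+))?@")


def path_passes(path: str) -> bool:
    """Return True if the flag diacritics found in `path` are consistent."""
    state: dict[str, str] = {}
    for op, feature, value in FLAG.findall(path):
        current = state.get(feature)
        if op == "P":    # positive set: overwrite the feature value
            state[feature] = value
        elif op == "C":  # clear: forget the feature
            state.pop(feature, None)
        elif op == "D":  # disallow: fail if set to this value (or set at all)
            if current is not None and (value == "" or current == value):
                return False
        elif op == "R":  # require: fail unless set to this value (or set at all)
            if current is None or (value != "" and current != value):
                return False
        elif op == "U":  # unify: set if unset, otherwise the values must agree
            if current is None:
                state[feature] = value
            elif current != value:
                return False
    return True


ENDLEX = "@D.CmpOnly.FALSE@@D.CmpPref.TRUE@@D.NeedNoun.ON@"

if __name__ == "__main__":
    print(path_passes(ENDLEX))
    print(path_passes("@P.NeedNoun.ON@" + ENDLEX))
    print(path_passes("@P.NeedNoun.ON@@C.NeedNoun@" + ENDLEX))
```

In this toy model, a path that has set `NeedNoun` to `ON` is blocked at ENDLEX by `@D.NeedNoun.ON@` until `@C.NeedNoun@` clears the feature, which matches the description of the three NeedNoun flags in the table.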
Use the following flag diacritics to control downcasing of derived proper
@U.Cap.Obl@ | Allowing downcasing of derived names: deatnulasj. |
@U.Cap.Opt@ | Allowing downcasing of derived names: deatnulasj. |
- +Cmp
- +Cmp/Serial used with serial verbs
FLAGS USED WITH COLLECTIVE NOUNS
Removal
Lexicon Root
The word forms in the Komi (Zyrian) language start from the lexeme roots of basic
Testing 2015-09-06
пу керка
Lexicon ENDLEX
@D.CmpOnly.FALSE@@D.CmpPref.TRUE@@D.NeedNoun.ON@ # ;
The @D.CmpOnly.FALSE@ flag diacritic is used to disallow words tagged