root-morphology

Buryaad morphological analyser !

INTRODUCTION TO MORPHOLOGICAL ANALYSER OF BURYAAD.

Definitions for Multichar_Symbols

  • +N Noun
  • +V Verb
  • +A Adjective
  • +Adv Adverb
  • +Pcle Particles (Probably adverbs, look into this)
  • +CC Conjunction
  • +CS Subjunction
  • +Interj Interjection
  • +Pron Pronoun
  • +Prop Propernoun
  • +Num Numaral
  • +Det Determiner (Demonstrative?)
  • +Po Postposition
  • +Symbol = independent symbols in the text stream, like £, €, ©
  • +Prs Present
  • +Fut Future
  • +Prt Preterite
  • +Prf Perfect
  • +Ind Indicative
  • +Imp Imperative
  • +Cond Conditional
  • +Opt Optative
  • +Vol Voluntative
  • +Dur Durative
  • +Term Terminative
  • +Conf Conf
  • +Sg1 first person singular
  • +Sg2 second person singular
  • +Sg3 third person singular
  • +Pl1 first person plural
  • +Pl2 second person plural
  • +Pl3 third person plural
  • +Inf Infinitive
  • +Pos Positive
  • +Neg Negative
  • +TV Transitive
  • +IV Intransitive
  • +Sg Singular
  • +Pl Plural
  • +Nom Nominative
  • +Acc Accusative
  • +Gen Genitive
  • +Abl Ablative
  • +Dat Dative
  • +Ins Instrumental
  • +Com Comitative
  • +Ord Ordinal
  • +Presc Prescriptive mood
  • +AgPrc
  • +AgConstPrc
  • +DualPrc
  • +FutPrc
  • +HabPrc
  • +ImpfPrc
  • +PassPrc
  • +PrfPrc
  • +PotPrc
  • +PrsPrc
  • +ResPrc
  • +ConMod
  • +ConImpf
  • +ConPrf
  • +ConCond
  • +ConConc
  • +ConTerm
  • +ConCntmp
  • +ConAbtmp
  • +ConFin
  • +ConIntnt
  • +ConSucc
  • +ConCmp
  • +PxSg1 first person singular possessive
  • +PxSg2 second person singular possessive
  • +PxSg3 third person singular possessive
  • +PxPl1 first person plural possessive
  • +PxPl2 second person plural possessive
  • +PxPl3 third person plural possessive
  • +Px3 third person plural possessive
  • %{A%} letter class
  • %{D%} letter class
  • %{G%} letter class
  • %{I%} letter class
  • %{J%} letter class
  • %{U%} letter class
  • %{V%} letter class
  • %{Ө%} letter class
  • %{Y%} kept after Cns, deleted after Vow
  • а2 я2 м2 these are а and я in Russian loanwords that do not weaken to ых
  • %^END we do the mhr trick to harmonise twolc and lexc

Usage tags

  • +Use/NG Do not generate

Symbols that need to be escaped on the lower side (towards twolc):

»7
Literal »
«7
Literal «
  %[%>%]  - Literal >
  %[%<%]  - Literal <

Flag diacritics

We have manually optimised the structure of our lexicon using following flag diacritics to restrict morhpological combinatorics - only allow compounds with verbs if the verb is further derived into a noun again:

@P.NeedNoun.ON@ (Dis)allow compounds with verbs unless nominalised
@D.NeedNoun.ON@ (Dis)allow compounds with verbs unless nominalised
@C.NeedNoun@ (Dis)allow compounds with verbs unless nominalised

For languages that allow compounding, the following flag diacritics are needed to control position-based compounding restrictions for nominals. Their use is handled automatically if combined with +CmpN/xxx tags. If not used, they will do no harm.

@P.CmpFrst.FALSE@ Require that words tagged as such only appear first
@D.CmpPref.TRUE@ Block such words from entering ENDLEX
@P.CmpPref.FALSE@ Block these words from making further compounds
@D.CmpLast.TRUE@ Block such words from entering R
@D.CmpNone.TRUE@ Combines with the next tag to prohibit compounding
@U.CmpNone.FALSE@ Combines with the prev tag to prohibit compounding
@P.CmpOnly.TRUE@ Sets a flag to indicate that the word has passed R
@D.CmpOnly.FALSE@ Disallow words coming directly from root.

Use the following flag diacritics to control downcasing of derived proper nouns (e.g. Finnish Pariisi -> pariisilainen). See e.g. North Sámi for how to use these flags. There exists a ready-made regex that will do the actual down-casing given the proper use of these flags.

@U.Cap.Obl@ Allowing downcasing of derived names: deatnulasj.
@U.Cap.Opt@ Allowing downcasing of derived names: deatnulasj.

Key lexicon

LEXICON Root is where it all starts, with these lexica:

  • Noun ;
  • urj-Cyrl-ProperNouns ;
  • bxr-Propernouns ;
  • Verb ;
  • Adjective ;
  • Adverb ;
  • Subjunction ;
  • Interjection ;
  • Pronoun ;
  • Propernoun ;
  • Postposition ;
  • Particles ; , these should rather be adverbs
  • Punctuation ;
  • Symbols ;
  • Conjunction ;
  • Numeral ;
  • Abbreviation ;