root-morphology
Kven morphological transducer
Beware of remnants from the Finnish file.
Tags for POS
-
+A = Adjective
-
+Adv = Adverb
-
+CC = Conjunction
-
+CS = Subjunction
-
+Interj = Interjection
-
+N = Noun
-
+Num = Numerals
-
+Pcle = Particle
-
+Po = Postposition
-
+Pr = Preposition
-
+Pron = Pronoun
-
+V = Verb
-
+Neg = Negation verb ei
- +ConNeg = Negation form of verb
-
+Prop = Proper noun
-
+Ord = Ordinal
-
+ABBR = Abbreviation
- +Symbol = independent symbols in the text stream, like £, €, ©
-
+ACR = Acronym
-
+Arab = Arabic
- +Coll = Collective numeral
Tags for grammar
Pronoun types
-
+Pers = Personal
-
+Dem = Demonstrative
-
+Interr = Interrogative
-
+Refl = Reflexive
-
+Recipr = Reciprocal
-
+Rel = Relative
-
+Indef = Indefinite
- +Qu = Quantity
Number
-
+Sg = Singular
- +Pl = Plural
Number-person
-
+Sg1 = Singular 1
-
+Sg2 = Singular 2
-
+Sg3 = Singular 3
-
+Pl1 = Plural 1
-
+Pl2 = Plural 2
-
+Pl3 = Plural 3
-
+PxSg1 = Poss suff: the owner is Singular 1
-
+PxSg2 = Poss suff: the owner is Singular 2
-
+PxSg3 = Poss suff: the owner is Singular 3
-
+PxPl1 = Poss suff: the owner is Plural 1
-
+PxPl2 = Poss suff: the owner is Plural 2
- +PxPl3 = Poss suff: the owner is Plural 3
Case
-
+Nom = Nominative
-
+Gen = Genitive
-
+Acc = Accusative, for pronouns, but is it correct?
-
+Ine = Inessive
-
+Ill = Illative
-
+Ela = Elative
-
+Ade = Adessive
-
+Abe = Abessive
-
+All = Allative
-
+Abl = Ablative
-
+Ess = Essive
-
+Tra = Translative
-
+Ins = Instructive
-
+Com = Comitative
- +Par = Partitive
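The tag inventory above can be read as a small lookup table. The following is a minimal sketch, assuming analyses have the shape `lemma+Tag+Tag...`; the function names and the example string `talo+N+Sg+Ine` are illustrative assumptions, not verified output of this transducer.

```python
# Glosses copied from the tag lists above (a small subset only).
TAG_GLOSSES = {
    "+N": "Noun",
    "+V": "Verb",
    "+Sg": "Singular",
    "+Pl": "Plural",
    "+Nom": "Nominative",
    "+Gen": "Genitive",
    "+Ine": "Inessive",
}


def split_analysis(analysis):
    """Split 'lemma+Tag1+Tag2' into (lemma, ['+Tag1', '+Tag2']).

    Assumes the lemma itself contains no '+' character.
    """
    lemma, *tags = analysis.split("+")
    return lemma, ["+" + tag for tag in tags]


def gloss(analysis):
    """Return the lemma and a human-readable gloss for each tag."""
    lemma, tags = split_analysis(analysis)
    return lemma, [TAG_GLOSSES.get(tag, "unknown tag") for tag in tags]
```

Calling `gloss("talo+N+Sg+Ine")` on this hypothetical analysis yields the lemma together with the glosses for noun, singular and inessive; tags missing from the subset are reported as unknown rather than guessed.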
Comparatives
-
+Compar = Comparative
- +Superl = Superlative
Finite verbs
-
+Pass = Passive
-
+Ind = Indicative
-
+Act = Active
-
+Prs = Present tense
-
+Prt = Preterite (past tense)
-
+Imprt = Imperative
-
+Cond = Conditional
- +Pot = Potential
Infinite verbs
-
+Inf = Infinitive
-
+Lat = lative (the infinitive, used in Apertium)
-
+PrsPrc = Present participle
-
+PrfPrc = Past (perfect) participle
- +Inf3 = Third infinitive
Punctuation
-
+CLB = Clause boundary
-
+PUNCT = Punctuation mark
-
+HYPH = Hyphenation mark
-
+Attr = Attributive form, hmm, check, for names?
-
+LEFT = left parenthesis
- +RIGHT = right parenthesis
Speller tags
-
+Use/-Spell = Excluded in speller
-
+Use/SpellNoSugg = recognized but not suggested in speller
Compounds
-
+Cmp =
- +Cmp/Hyph - on dynamic compounds that have a hyphen (in use?)
@P.Pmatch.Backtrack@ | Used on single-token analyses; tells hfst-tokenise/pmatch to backtrack by reanalysing the substrings before and after this point in the form (to find combinations of shorter analyses that would otherwise be missed) |
Derivation
- +Der =
-
+Der/minen =
-
+Der/s = deriving numerals
Clitic tags
-
+Clt =
-
+Qst = Question clitic -ko
-
+Foc/han = Focus clitic -han
-
+Foc/kaan = Focus clitic -kaan
-
+Foc/kin = Focus clitic -kin
-
+Foc/pa = Focus clitic -pa
-
+Foc/s = Focus clitic -s
- +Foc/pas = Focus clitic -pas
Tokeniser tags
- +MWE = multiword expression, for tokenisation
-
+v1 =
- +v2 =
Semantic tags
-
+Sem/Ani = Animal names
-
+Sem/Fem = Female names
-
+Sem/Mal = Male names
-
+Sem/Obj = Names of objects
-
+Sem/Org = Names of organisations
-
+Sem/Plc = Place names
-
+Sem/Sur = Surnames
- +Sem/ID = ID
Dialect tags
-
+Dial/-Var = Not Varanger
-
+Dial/-Por = Not Porsanger
-
+Dial/-Jok = Not Jokivarret
-
+Dial/Var = Varanger, short for +Dial/-Jok+Dial/-Por
-
+Dial/Por = Porsanger, short for +Dial/-Jok+Dial/-Var
- +Dial/Jok = Jokivarret, short for +Dial/-Por+Dial/-V
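The relation between the positive and negative dialect tags can be sketched as a simple expansion. This is an illustration only, not how the transducer itself implements the shorthand; the function name is hypothetical, and the sketch assumes that the Jokivarret shorthand excludes Porsanger and Varanger in the same way the other two entries exclude their counterparts.

```python
# The three dialects named in the list above.
DIALECTS = ("Var", "Por", "Jok")


def expand_dialect_tag(tag):
    """Rewrite a positive tag such as '+Dial/Var' as the negative tags
    for the two other dialects. Any other tag is returned unchanged."""
    prefix = "+Dial/"
    name = tag[len(prefix):] if tag.startswith(prefix) else None
    if name not in DIALECTS:
        # Negative tags ('+Dial/-Var') and non-dialect tags pass through.
        return tag
    others = sorted(d for d in DIALECTS if d != name)
    return "".join(f"+Dial/-{d}" for d in others)
```

With this sketch, `expand_dialect_tag("+Dial/Var")` gives `+Dial/-Jok+Dial/-Por`, matching the shorthand stated in the list.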
Stem variant tags
- +v1
- variant 1
- +v2
- variant 2
- +v3
- variant 3
- +v4
- variant 4
- +v5
- variant 5
- +v6
- variant 6
- +v7
- variant 7
Phonological symbols
-
i2 = plural i of nouns
-
i3 = past tense i of verbs
-
i4 = i in conditional isi of most verbs (without gemination)
-
i5 = superlative i of adjectives
-
i6 = i: j in poika: pojan
- i7 = i in conditional of contract verbs (with gemination)
-
p2 = always p
-
t2 = always t, cf. katt2oma always tt, underlying -ts-
-
t3 = t participating in gradation, but not in t: s
-
k2 = always k
-
k3 = k that never alternates k: v, unlike plain k
-
^A = Vowel harmony a/ä
-
^O = Vowel harmony o/ö
-
^U = Vowel harmony u/y
-
^V = Vowel copying
-
^N = tul^Nut, kävel^N^Ut
-
^E2I = for e to i change
-
^A2I = for a to i change
-
^I0 = i to 0 in vanha_a_21 -Por with i endings: tooline
-
^E0 = e to 0 in vanha_a_32 and vanha_n_32, because it is added before the dialect trigger (for the twolc structure)
-
^HMETA = for h metathesis syksy - sykshyyn
-
^AO = a: 0
-
^A0 = a: o rannoissa
-
^WG = Weak grade matto - maton
-
^TJ = vuote vuoje
-
^T0 = tytär tyär tytärtä tyärtä in Var
-
^UU = vuote vuue
-
^TES = in use?
-
^VDEL = Deleting long vowel in rakkaa- > rakas
-
^EDEL = Deleting e in front of consonant
-
^AE = for a to e change
-
^M2N = for m to n in lumi lunta
-
^¤ = protecting against e: i word-finally (nalle, liike)
- ^Por -- Porsanger dialect
- ^Var -- Varanger dialect
- ^Jok -- Jokivarret dialect
- ^End -- End of word, since the # tags don't work properly
- »
- «
- > (written with square brackets, see the root.lexc file)
- < (written with square brackets, see the root.lexc file)
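The vowel harmony symbols ^A, ^O and ^U listed above are archiphonemes that surface as a back or a front vowel. In the transducer this is handled by the phonological rules, not by Python; the following is only a rough sketch of the idea, assuming the simplified rule that a stem containing a, o or u takes back vowels and any other stem takes front vowels. The function name and the example forms are illustrative assumptions.

```python
# Back vowels that trigger back harmony in this simplified model.
BACK_VOWELS = set("aou")

# Archiphoneme -> (back realisation, front realisation), as in the list above.
HARMONY = {"^A": ("a", "ä"), "^O": ("o", "ö"), "^U": ("u", "y")}


def resolve_harmony(stem, suffix):
    """Replace ^A, ^O and ^U in the suffix according to the stem's vowels."""
    back = any(ch in BACK_VOWELS for ch in stem.lower())
    for symbol, (back_vowel, front_vowel) in HARMONY.items():
        suffix = suffix.replace(symbol, back_vowel if back else front_vowel)
    return stem + suffix
```

For example, `resolve_harmony("talo", "ss^A")` picks the back variant of the suffix vowel, while `resolve_harmony("kylä", "ss^A")` picks the front variant. The other symbols in the list (gradation, deletion and dialect triggers) need context-sensitive rules and are not covered by this sketch.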
Flag diacritics
@P.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised |
@D.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised |
@C.NeedNoun@ | (Dis)allow compounds with verbs unless nominalised |
For languages that allow compounding, the following flag diacritics are needed
@P.CmpFrst.FALSE@ | Require that words tagged as such only appear first |
@D.CmpPref.TRUE@ | Block such words from entering ENDLEX |
@P.CmpPref.FALSE@ | Block these words from making further compounds |
@D.CmpLast.TRUE@ | Block such words from entering R |
@D.CmpSuff.TRUE@ | Block such words from entering R |
@P.CmpSuff.TRUE@ | Mark that we have passed R |
@D.CmpNone.TRUE@ | Combines with the next tag to prohibit compounding |
@U.CmpNone.FALSE@ | Combines with the prev tag to prohibit compounding |
@P.CmpOnly.TRUE@ | Sets a flag to indicate that the word has passed R |
@D.CmpOnly.FALSE@ | Disallow words coming directly from root. |
Use the following flag diacritics to control downcasing of derived proper names:
@U.Cap.Obl@ | Allowing downcasing of derived names: deatnulasj. |
@U.Cap.Opt@ | Allowing downcasing of derived names: deatnulasj. |
@C.ErrOrth@ | |
@D.ErrOrth.ON@ | |
@P.ErrOrth.ON@ | |
@U.pron.nom@ | tbw |
@U.pron.gen@ | tbw |
@U.pron.gen2@ | tbw |
@U.pron.ill@ | tbw |
@U.pron.par@ | tbw |
@U.pron.par2@ | tbw |
@U.pron.par3@ | tbw |
@U.pron.ess@ | tbw |
@U.pron.tra@ | tbw |
@U.pron.ine@ | tbw |
@U.pron.ela@ | tbw |
@U.pron.all@ | tbw |
@U.pron.ade@ | tbw |
@U.pron.abl@ | tbw |
@P.compound.block@ | tbw |
@D.compound.block@ | tbw |
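The flag diacritics in the tables above use four operators: P (set a feature to a value), D (disallow the path if the feature has that value, or any value when none is given), U (unify: set the feature if unset, otherwise require the same value) and C (clear the feature). The following is a minimal sketch of how a sequence of such flags along one path is evaluated. It is not the HFST implementation, the function name is hypothetical, and the R and N operators are left out because they do not occur in this file.

```python
import re

# @OP.Feature@ or @OP.Feature.Value@, restricted to the operators used above.
FLAG = re.compile(r"@([PDUC])\.([^.@]+)(?:\.([^.@]+))?@")


def path_is_allowed(flags):
    """Return True if the flag diacritics met along a path are compatible."""
    state = {}  # feature name -> current value
    for flag in flags:
        match = FLAG.fullmatch(flag)
        if match is None:
            raise ValueError(f"not a supported flag diacritic: {flag}")
        op, feature, value = match.groups()
        current = state.get(feature)
        if op == "P":
            # Positive set: overwrite whatever was there.
            state[feature] = value
        elif op == "C":
            # Clear: the feature becomes unset again.
            state.pop(feature, None)
        elif op == "D":
            # Disallow: fail if the feature is set to this value
            # (or to any value when no value is given).
            if current is not None and (value is None or current == value):
                return False
        elif op == "U":
            # Unify: set if unset, otherwise the values must agree.
            if current is None:
                state[feature] = value
            elif current != value:
                return False
    return True
```

Under this reading, a path that passes `@P.NeedNoun.ON@` and later meets `@D.NeedNoun.ON@` is blocked, unless `@C.NeedNoun@` clears the feature in between.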
Basic lexica, pointing to the other lexicon files
Here is the Root lexicon, pointing to all the parts of speech:
LEXICON Root
- AdjectiveRoot ;
- Adverb ;
- Conjunction ;
- Interjection ;
- NUM ;
- NounRoot ;
- Particle ;
- Postposition ;
- Preposition ;
- Pronoun ;
- ProperNoun ;
- Punctuation ;
- Symbols ;
- VerbRoot ;
- Subjunction ;
- Abbreviation ;
- Acronym ;