root-morphology

Kven morphological transducer

Beware of remnants from the Finnish file. Take nothing at face value!

+Prop = Propernoun
+Ord = Ordinal
+ABBR = Abbreviation
+Symbol = independent symbols in the text stream, like £, €, ©
+ACR = Acronym
+Arab = Arabic
+Coll = Collective numeral

Tags for grammar

Pronoun types

+Pers = Personal
+Dem = Demonstrative
+Interr = Interrogative
+Refl = Reflexive
+Recipr = Reciprocal
+Rel = Relative
+Indef = Indefinitue
+Qu = Quantity

Number

+Sg = Singular
+Pl = Plural

Number-person

+Sg1 = Singular 1
+Sg2 = Singular 2
+Sg3 = Singular 3
+Pl1 = Plural 1
+Pl2 = Plural 2
+Pl3 = Plural 3
+PxSg1 = Poss suff: the owner is Singular 1
+PxSg2 = Poss suff: the owner is Singular 2
+PxSg3 = Poss suff: the owner is Singular 3
+PxPl1 = Poss suff: the owner is Plural 1
+PxPl2 = Poss suff: the owner is Plural 2
+PxPl3 = Poss suff: the owner is Plural 3

Case

+Nom = Nominative
+Gen = Genitive
+Acc = Accusative, for pronouns, but is it correct?
+Ine = Inessive
+Ill = Illative
+Ela = Elative
+Ade = Adessive
+Abe = Abessive
+All = Allative
+Abl = Ablative
+Ess = Essive
+Tra = Translaive
+Ins = Instructive
+Com = Comitative
+Par = Partitive

Comparatives

+Compar = Comparative
+Superl = Superlative

Finite verbs

+Pass = Passive
+Ind = Indicative
+Act = Active
+Prs = Presens
+Prt = Preteritum
+Imprt = Imperative
+Cond = Conditional
+Pot = Potential

Infinite verbs

+Inf = Infinitive
+Lat = lative (the infinitive, used in Apertium)
+PrsPrc = Presence Particip
+PrfPrc = Preteritum Particip
+Inf3 = Infinite 3

Punctuation

+CLB = Clause boundary
+PUNCT = Punctuation mark
+HYPH = Hyphenation mark
+Attr = Attributive form, hmm, check, for names?
+LEFT = left parenth
+RIGHT = right parenth

Speller tags

+Err/Orth misspelling or nor normative form. It will be included only in desc, not in norm.

+Use/-Spell = Excluded in speller
+Use/SpellNoSugg = recognized but not suggested in speller +Use/NG only accept, not generate (for MT and Oahpa use) +Use/Circ for numerals, copied from sme

Compounds

+Cmp =
+Cmp/Hyph - on dynamic compounds that have a hyphen (in use?)

@P.Pmatch.Backtrack@

Used on single-token analyses; tell hfst-tokenise/pmatch to backtrack by reanalysing the substrings before and after this poin in the form (to find combinations of shorter analyses that would otherwise be missed)

Derivation

+Der =

+Der/minen =
+Der/s = deriving numerals

Tokeniser tags

+MWE = multiword expression, for tokenisation

+v1 =
+v2 =

Dialect tags

+Dial/-Var = Not Varanger
+Dial/-Por = Not Porsanger
+Dial/-Jok = Not Jokivarret
+Dial/Var Varanger, short for +Dial/-Jok+Dial/-Por
+Dial/Por Porsanger, short for +Dial/-Jok+Dial/-Var
+Dial/Jok Jokivarret, short for +Dial/-Por+Dial/-V

Stem variant tags

+v1: variant 1
+v2: variant 2
+v3: variant 3
+v4: variant 4
+v5: variant 5
+v6: variant 6
+v7: variant 7

Phonological symbols

i2 = plural i of nouns
i3 = past tense i of verbs
i4 = i in conditional isi of most verbs (without gemination)
i5 = superlative i of adjectives
i6 = i: j in poika: pojan
i7 = i in conditional of contract verbs (with gemination)

p2 = always p
t2 = always t, cf. katt2oma always tt, underlying -ts-
t3 = t participating in gradation, but not in t: s
k2 = always k
k3 = k3 never k: v, contrary to k
^A = Vowel harmony a/ä
^O = Vowel harmony o/ö
^U = Vowel harmony u/y
^V = Vowel copying
^N = tulˆNut, kävel^N^Ut
^E2I = for e to i change
^A2I = for a to i change
^I0 = i to 0 in vanha_a_21 -Por with i endings: tooline
^E0 = e to 0 in vanha_a_32 and vanha_n_32 bc we add b4 dial trigger, for twolc struc.
^HMETA = for h metathesis syksy - sykshyyn
^AO = a: 0
^A0 = a: o rannoissa
^WG = Weak grade matto - maton
^TJ = vuote vuoje
^T0 = tytär tyär tytärtä tyärtä in Var
^UU = vuote vuue
^TES = in use?
^VDEL = Deleting long vowel in rakkaa- > rakas
^EDEL = Deleting e in front of consonant
^AE = for a to e change
^M2N = for m to n in lumi lunta
^¤ = potecting against e: i word-finally (nalle, liike)
^Por -- Porsanger dialect
^Var -- Varanger dialect
^Jok -- Jokivarret dialect
^End -- End of word, since the # tags don't work properly

»
«
> (written with square brackets, see the root.lexc file)
< (written with square brackets, see the root.lexc file)

Flag diacritics

We have manually optimised the structure of our lexicon using following flag diacritics to restrict morhpological combinatorics - only allow compounds with verbs if the verb is further derived into a noun again:

@P.NeedNoun.ON@	(Dis)allow compounds with verbs unless nominalised
@D.NeedNoun.ON@	(Dis)allow compounds with verbs unless nominalised
@C.NeedNoun@	(Dis)allow compounds with verbs unless nominalised

For languages that allow compounding, the following flag diacritics are needed to control position-based compounding restrictions for nominals. Their use is handled automatically if combined with +CmpN/xxx tags. If not used, they will do no harm.

@P.CmpFrst.FALSE@	Require that words tagged as such only appear first
@D.CmpPref.TRUE@	Block such words from entering ENDLEX
@P.CmpPref.FALSE@	Block these words from making further compounds
@D.CmpLast.TRUE@	Block such words from entering R
@D.CmpSuff.TRUE@	Block such words from entering R
@P.CmpSuff.TRUE@	Mark that we have passed R
@D.CmpNone.TRUE@	Combines with the next tag to prohibit compounding
@U.CmpNone.FALSE@	Combines with the prev tag to prohibit compounding
@P.CmpOnly.TRUE@	Sets a flag to indicate that the word has passed R
@D.CmpOnly.FALSE@	Disallow words coming directly from root.

Use the following flag diacritics to control downcasing of derived proper nouns (e.g. Finnish Pariisi -> pariisilainen). See e.g. North Sámi for how to use these flags. There exists a ready-made regex that will do the actual down-casing given the proper use of these flags.

@U.Cap.Obl@	Allowing downcasing of derived names: deatnulasj.
@U.Cap.Opt@	Allowing downcasing of derived names: deatnulasj.
@C.ErrOrth@
@D.ErrOrth.ON@
@P.ErrOrth.ON@

@U.pron.nom@	tbw
@U.pron.gen@	tbw
@U.pron.gen2@	tbw
@U.pron.ill@	tbw
@U.pron.par@	tbw
@U.pron.par2@	tbw
@U.pron.par3@	tbw
@U.pron.ess@	tbw
@U.pron.tra@	tbw
@U.pron.ine@	tbw
@U.pron.ela@	tbw
@U.pron.all@	tbw
@U.pron.ade@	tbw
@U.pron.abl@	tbw
@P.compound.block@	tbw
@D.compound.block@	tbw

Basic lexica, pointing to the other lexicon files

Here is the Root lexicon, pointing to all the parts of speech:

LEXICON Root

AdjectiveRoot ;
Adverb ;
Conjunction ;
Interjection ;
NUM ;
NounRoot ;
Particle ;
Postposition ;
Preposition ;
Pronoun ;
ProperNoun ;
Punctuation ;
Symbols ;
VerbRoot ;
Subjunction ;
Abbreviation ;
Acronym ;