root-morphology

Lule Sámi morphological analyser

Definitions for Multichar_Symbols

Tags for POS

  • +N Noun
  • +A Adjective
  • +Adv Adverb
  • +V Verb
  • +Pron Pronoun
  • +CS Subjunction
  • +CC Conjunction
  • +Adp Adposition
  • +Po Postposition
  • +Pr Preposition
  • +Interj Interjection
  • +Pcle Particle
  • +Num Numeral
  • +TODO = Code for items that have not been modeled yet
  • +Prop Proper noun
  • +ACR Acronym
  • +Pers Personal pronoun
  • +Dem Demonstrative pronoun
  • +Interr Interrogative pronoun
  • +Refl reflexive pronoun
  • +Recipr reciprocal pronoun
  • +Rel relative pronoun
  • +Indef indefinite pronoun
  • +Coll collective numerals
  • +Arab arabic numerals
  • +Rom Roman numerals
  • +Err/Orth Substandard. An ungrammatical, non-normative form of normative lemma.
  • +Err/Lex No normative lemma, often ungrammatical compounds like "bajásbuollda" and "songdebutierit".
  • +Err/Hyph No normative lemma
  • +Err/SpaceCmp No normative lemma
  • +Err/Der Lemmas that break with regular derivation rules, both morphologically and semantically
  • +Use/Marg Marginal, but normative lemmas. Not in speller.
  • +Use/-Spell Excluded from speller
  • +Use/-PLX Excluded from PLX speller
  • +Use/-PMatch Do not include in FSTs made for hfst-pmatch
  • +Use/SpellNoSugg Recognized, but not suggested in speller
  • +Use/Circ Circular path
  • +Use/CircN Circular number path
  • +Use/Ped Remove from pedagogical speller
  • +Use/NG Do not generate, only for Oahpa and MT. In speller.
  • +Use/MT Generate for MT only, for restricting analyses needed
  • +Use/NGminip Not for miniparadigm in VD dicts
  • +Use/NotDNorm For words without formal normalization. Divvun suggests that this shouldn't be normative.
  • +Use/DNorm For words without formal normalization. Divvun suggests that this should be normative. Included in speller.
  • +Area/SE In Sweden
  • +Area/NO In Norway
  • +Dial/N Used in the northern areas. Some might say that these are North Sámi (sme) words, but they are used by Lule Sámi speakers in the northern part of the dialect area. Words like "válmas"
  • +Dial/S Used in the southern areas
  • +Dial/SH Short forms

Compounding tags

The tags are of the following form:

  • +CmpNP/xxx - Normative (N), Position (P), ie. the tag describes what position the tagged word can be in in a compound
  • +CmpN/xxx - Normative (N) form ie. the tag describes what form the tagged word should use when making compounds
  • +Cmp/xxx - Descriptive compounding tags, ie. tags that describe what form a word is actually using in a compound
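
The three tag families above can be told apart by their prefix alone. A minimal Python sketch of that split, assuming tags are handled as plain strings; the function name `classify_cmp_tag` and the family labels are hypothetical and not part of the analyser:

```python
def classify_cmp_tag(tag: str) -> tuple[str, str]:
    """Split a compounding tag into (family, value).

    The family labels are illustrative names for the three tag forms
    described above, plus the bare +Cmp tag for dynamic compounds.
    """
    if tag == "+Cmp":
        return "dynamic compound", ""
    families = (
        ("+CmpNP/", "normative position"),
        ("+CmpN/", "normative form"),
        ("+Cmp/", "descriptive form"),
    )
    for prefix, family in families:
        if tag.startswith(prefix):
            # Everything after the slash is the value, e.g. "First" or "SgGen".
            return family, tag[len(prefix):]
    raise ValueError(f"not a compounding tag: {tag}")
```

For example, `classify_cmp_tag("+CmpNP/First")` gives the position family with value `First`, while `classify_cmp_tag("+Cmp/SgGen")` gives the descriptive family with value `SgGen`.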

Normative/prescriptive compounding tags: (to govern compound behaviour for the speller, ie. what a compound SHOULD BE)

The first part of the compound may be ...

  • +CmpN/Sg = Singular
  • +CmpN/SgN = Singular Nominative
  • +CmpN/SgG = Singular Genitive
  • +CmpN/PlG = Plural Genitive
  • +CmpNP/All - ... be in all positions, default, this tag does not have to be written
  • +CmpNP/First - ... only be first part in a compound or alone
  • +CmpNP/Pref - ... only first part in a compound, NEVER alone
  • +CmpNP/Last - ... only be last part in a compound or alone
  • +CmpNP/Suff - ... only last part in a compound, NEVER alone
  • +CmpNP/None - ... not take part in compounds
  • +CmpNP/Only - ... only be part of a compound, i.e. can never be used alone, but can appear in any position
  • +CmpN/SgLeft Singular to the left
  • +CmpN/SgNomLeft Singular nominative to the left
  • +CmpN/SgGenLeft Singular genitive to the left
  • +CmpN/PlGenLeft Plural genitive to the left
  • +CmpN/Def Left override
  • +CmpN/DefSgGen Overrides left tag, requires SgGen form
  • +CmpN/DefPlGen Overrides left tag, requires PlGen form
  • +Cmp/Sg Singular
  • +Cmp/SgNom Singular Nominative
  • +Cmp/SgGen Singular Genitive
  • +Cmp/PlGen Plural Genitive
  • +Cmp/PlNom Plural Nominative
  • +Cmp/Attr Attribute
  • +Cmp Dynamic compound - this tag should always be part of a dynamic compound. It is important for Apertium, and useful in other cases as well.
  • +Cmp/SplitR This is a split compound with the other part to the right: "Arbeids- og inkluderingsdepartementet" => Arbeids- = +Cmp/SplitR
  • +Cmp/SplitL This is a split compound with the other part to the left
  • +Cmp/Sh testing ShCmp
  • +Sg Singular number
  • +Du Dual number
  • +Pl Plural number
  • +Ess Essive case
  • +Nom Nominative case
  • +Gen Genitive case
  • +Acc Accusative case
  • +Ill Illative case
  • +Loc Locative case
  • +Com Comitative case
  • +Ine Inessive case
  • +Ela Elative case
  • +Par Partitive case
  • +Abe Abessive case
  • +PxSg1 possessive suffix singular first person
  • +PxSg2 possessive suffix singular second person
  • +PxSg3 possessive suffix singular third person
  • +PxDu1 possessive suffix dual first person
  • +PxDu2 possessive suffix dual second person
  • +PxDu3 possessive suffix dual third person
  • +PxPl1 possessive suffix plural first person
  • +PxPl2 possessive suffix plural second person
  • +PxPl3 possessive suffix plural third person
  • +Comp Comparative comparison
  • +Superl Superlative comparison
  • +Attr Attribute
  • +Card
  • +Ord CHECK THIS! In closed-sme there are +Ord entries without circ. tag
  • +Ind Indicative mood
  • +Prs Present tense
  • +Prt Past tense
  • +Pot Potential mood
  • +Cond conditional mood
  • +Imprt Imperative mood
  • +Sg1 singular first person
  • +Sg2 singular second person
  • +Sg3 singular third person
  • +Du1 dual first person
  • +Du2 dual second person
  • +Du3 dual third person
  • +Pl1 plural first person
  • +Pl2 plural second person
  • +Pl3 plural third person
  • +Inf infinitive
  • +Ger gerundium
  • +ConNeg the main verb form used with negation verb. Like "bårå" in "Iv bårå guolev"
  • +Neg negation verb
  • +ImprtII second imperative mood
  • +PrsPrc present participle
  • +PrfPrc past participle
  • +Sup supinum
  • +VGen verb genitive
  • +VAbess verb abessive
  • +Actio Actio
  • +ABBR
  • +Symbol = independent symbols in the text stream, like £, €, ©
  • +ACR
  • +CLB
  • +PUNCT
  • +LEFT
  • +RIGHT
  • ^GUESSNOUNROOT
  • +TV
  • +IV Transitivity tags
  • +Multi Multiword phrase tag
  • +Guess for the name guesser
  • +NomAg Actor Noun From Verb - Nomen Agentis

Lexeme disambiguation tags

  • +Hom1 Homonymy
  • +Hom2 Homonymy

Stem variant tags

  • +v1 - variant 1
  • +v2 - variant 2
  • +v3 - variant 3
  • +v4 - variant 4
  • +v5 - variant 5

Question and Focus particles:

  • +Qst
  • +Clt
  • +Foc These two are only found in SMJ - do we need them?

Focus particles:

  • +Foc/ge
  • +Foc/gen
  • +Foc/ga
  • +Foc/Neg-k
  • +Foc/Pos-k

Other tags

  • +MWE multi word expressions, goes to abbr
  • +Sh Short form

Semantic tags to help disambiguation & syntactic analysis

These tags should always be located just before the POS tag.

  • +Sem/Act = Activity
  • +Sem/Adr = Webadr
  • +Sem/Amount = Amount
  • +Sem/Ani = Animate
  • +Sem/Aniprod = Animal Product
  • +Sem/Body = Bodypart
  • +Sem/Body-abstr = siellu, vuoigŋa, jierbmi
  • +Sem/Build = Building
  • +Sem/Build-room = Room in a building, typically place to be
  • +Sem/Buildpart = Part of Building, like the closet
  • +Sem/Cat = Category
  • +Sem/Clth = Clothes
  • +Sem/Clth-jewl = Jewellery
  • +Sem/Clthpart = part of clothes, boallu, sávdnji...
  • +Sem/Ctain = Container
  • +Sem/Ctain-abstr = Abstract container like bank account
  • +Sem/Ctain-clth =
  • +Sem/Curr = Currency like dollár, Not Money
  • +Sem/Dance = Dance
  • +Sem/Date = Date
  • +Sem/Dir = Direction like GPS-kursa
  • +Sem/Domain = Domain like politics, reindeerherding (a system of actions)
  • +Sem/Drink = Drink
  • +Sem/Dummytag = Dummytag
  • +Sem/Edu = Educational event
  • +Sem/Event = Event
  • +Sem/Feat = Feature, like Árvu
  • +Sem/Feat-measr = Psychological feature
  • +Sem/Feat-phys = Physiological feature, ivdni, fárda
  • +Sem/Feat-psych = Psychological feature
  • +Sem/Fem = Female name
  • +Sem/Food = Food
  • +Sem/Food-med = Medicine
  • +Sem/Fruit = Fruit and fruit-like edibles
  • +Sem/Furn = Furniture
  • +Sem/Game = Game
  • +Sem/Geom = Geometrical object
  • +Sem/Group = Animal or Human Group
  • +Sem/Hum = Human
  • +Sem/Hum-abstr = Human abstract
  • +Sem/Ideol = Ideology
  • +Sem/Lang = Language
  • +Sem/Mal = Male name
  • +Sem/Mat = Material for producing things
  • +Sem/Measr = Measure
  • +Sem/Money = Has to do with money, like wages, not Curr(ency)
  • +Sem/Obj = Object
  • +Sem/Obj-catch =
  • +Sem/Obj-clo = Cloth
  • +Sem/Obj-cogn = Cloth
  • +Sem/Obj-el = (Electrical) machine or apparatus
  • +Sem/Obj-ling = Object with something written on it
  • +Sem/Obj-play = Play object
  • +Sem/Obj-rope = flexible ropelike object
  • +Sem/Obj-surfc = Surface object
  • +Sem/Org = Organisation
  • +Sem/Part = Feature, oassi, bealli
  • +Sem/Perc-cogn = Cloth
  • +Sem/Perc-emo = Emotional perception
  • +Sem/Perc-phys = Physical perception
  • +Sem/Perc-psych = Physical perception
  • +Sem/Plant = Plant
  • +Sem/Plantpart = Plant part
  • +Sem/Play = Play
  • +Sem/Plc = Place
  • +Sem/Plc-abstr = Abstract place
  • +Sem/Plc-elevate = Place
  • +Sem/Plc-line = Place
  • +Sem/Plc-water = Place
  • +Sem/Pos = Position (as in social position job)
  • +Sem/Process = Process
  • +Sem/Prod = Product
  • +Sem/Prod-audio = Audio product
  • +Sem/Prod-cogn = Cognition product
  • +Sem/Prod-ling = Linguistic product
  • +Sem/Prod-vis = Visual product
  • +Sem/Rel = Relation
  • +Sem/Route = Route
  • +Sem/Rule = Rule or convention
  • +Sem/Semcon = Semantic concept
  • +Sem/Sign = Sign (e.g. numbers, punctuation)
  • +Sem/Sport = Sport
  • +Sem/State =
  • +Sem/State-sick = Illness
  • +Sem/Substnc = Substance, like Air and Water
  • +Sem/Sur = Surname
  • +Sem/Symbol = Symbol
  • +Sem/Time = Time
  • +Sem/Time-clock = Time
  • +Sem/Tool = Prototypical tool for repairing things
  • +Sem/Tool-catch = Tool used for catching (e.g. fish)
  • +Sem/Tool-clean = Tool used for cleaning
  • +Sem/Tool-it = Tool used in IT
  • +Sem/Tool-measr = Tool used for measuring
  • +Sem/Tool-music = Music instrument
  • +Sem/Tool-write = Writing tool
  • +Sem/Txt = Text (girji, lávlla...)
  • +Sem/Veh = Vehicle
  • +Sem/Wpn = Weapon
  • +Sem/Wthr = The Weather or the state of ground
  • +Sem/Year = Year
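
The placement rule stated at the top of this section (a semantic tag sits just before the POS tag) can be checked mechanically. The following is a minimal Python sketch under stated assumptions: analyses are `+`-separated strings with the lemma first, the POS set is the short list of main POS tags from the start of this document, and the example strings are made up for illustration rather than taken from analyser output. The function name is hypothetical.

```python
# Main POS tags as listed under "Tags for POS" (without the leading "+").
POS_TAGS = {
    "N", "A", "Adv", "V", "Pron", "CS", "CC",
    "Adp", "Po", "Pr", "Interj", "Pcle", "Num",
}


def sem_tags_well_placed(analysis: str) -> bool:
    """Return True if every Sem/ tag is immediately followed by a POS tag."""
    tags = analysis.split("+")[1:]  # drop the lemma
    for i, tag in enumerate(tags):
        if tag.startswith("Sem/"):
            following = tags[i + 1] if i + 1 < len(tags) else None
            if following not in POS_TAGS:
                return False
    return True
```

With this reading, `"guolle+Sem/Ani+N+Sg+Nom"` passes and `"guolle+N+Sem/Ani+Sg+Nom"` does not.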

Multiple Semantic tags:

  • +Sem/Ani_Cat
  • +Sem/Ani_Obj
  • +Sem/Act_Clth
  • +Sem/Act_Domain
  • +Sem/Act_Event
  • +Sem/Act_Feat-psych
  • +Sem/Act_Fruit
  • +Sem/Act_Group Activity and Group
  • +Sem/Act_Hum_Obj
  • +Sem/Act_Obj-play Activity and Object
  • +Sem/Act_Org
  • +Sem/Act_Plc A persons job is an activity, and a place as well
  • +Sem/Act_Prod-vis
  • +Sem/Act_Route Activity and Route, ie johtolat
  • +Sem/Act_Semcon
  • +Sem/Act_Time
  • +Sem/Act_Tool-it
  • +Sem/Act_Txt
  • +Sem/Amount_Build Amount and Building
  • +Sem/Amount_Semcon
  • +Sem/Ani-fish
  • +Sem/Ani_Body
  • +Sem/Ani_Body-abstr_Hum
  • +Sem/Ani_Build
  • +Sem/Ani_Buildpart
  • +Sem/Ani_Build_Hum_Txt
  • +Sem/Ani_Clth
  • +Sem/Ani_Feat_Hum
  • +Sem/Ani_Feat_Plant
  • +Sem/Ani_Group
  • +Sem/Ani_Group_Hum
  • +Sem/Ani_Group_Prod-vis
  • +Sem/Ani_Hum
  • +Sem/Ani_Hum_Plc
  • +Sem/Ani_Hum_Time
  • +Sem/Ani_Plc
  • +Sem/Ani_Plc_Txt
  • +Sem/Ani_Time
  • +Sem/Ani_Veh
  • +Sem/Aniprod_Hum
  • +Sem/Aniprod_Mat
  • +Sem/Aniprod_Obj
  • +Sem/Aniprod_Obj-clo
  • +Sem/Aniprod_Perc-phys
  • +Sem/Aniprod_Plant
  • +Sem/Aniprod_Plc
  • +Sem/Aniprod_Plc_Route
  • +Sem/Aniprod_Substnc_Wthr
  • +Sem/Body-abstr_Feat-psych
  • +Sem/Body-abstr_Prod-audio_Semcon
  • +Sem/Body_Body-abstr
  • +Sem/Body_Buildpart
  • +Sem/Body_Clth
  • +Sem/Body_Clthpart
  • +Sem/Body_Food
  • +Sem/Body_Fruit
  • +Sem/Body_Group_Hum
  • +Sem/Body_Group_Hum_Time
  • +Sem/Body_Hum
  • +Sem/Body_Mat
  • +Sem/Body_Measr
  • +Sem/Body_Obj_Tool-catch
  • +Sem/Body_Org
  • +Sem/Body_Plc
  • +Sem/Body_Plc-elevate
  • +Sem/Body_Plc_State
  • +Sem/Body_State
  • +Sem/Body_Time
  • +Sem/Buildpart_Ctain_Obj
  • +Sem/Buildpart_Plc
  • +Sem/Buildpart_Prod-audio
  • +Sem/Build_Buildpart
  • +Sem/Build_Clthpart
  • +Sem/Build_Edu_Org
  • +Sem/Build_Event_Org
  • +Sem/Build_Plc
  • +Sem/Build_Obj
  • +Sem/Build_Org
  • +Sem/Build_Plc
  • +Sem/Build_Route
  • +Sem/Build_Tool
  • +Sem/Build-room_Furn
  • +Sem/Cat_Edu
  • +Sem/Cat_Group_Hum
  • +Sem/Cat_Obj
  • +Sem/Clth-jewl_Curr
  • +Sem/Clth-jewl_Fruit
  • +Sem/Clth-jewl_Money
  • +Sem/Clth-jewl_Org
  • +Sem/Clth-jewl_Plant
  • +Sem/Clth_Obj
  • +Sem/Clth_Hum
  • +Sem/Clth_Sur
  • +Sem/Clthpart_Plc
  • +Sem/Ctain-abstr_Org
  • +Sem/Ctain-clth_Plant
  • +Sem/Ctain-clth_Veh
  • +Sem/Ctain_Feat-phys
  • +Sem/Ctain_Furn
  • +Sem/Ctain_Plc
  • +Sem/Ctain_Tool
  • +Sem/Ctain_Tool-measr
  • +Sem/Curr_Org
  • +Sem/Dance_Org
  • +Sem/Dance_Prod-audio
  • +Sem/Domain_Food-med
  • +Sem/Domain_Ideol
  • +Sem/Domain_Org
  • +Sem/Domain_Org_Plc-abstr
  • +Sem/Domain_Prod-audio
  • +Sem/Domain_Txt
  • +Sem/Drink_Plant
  • +Sem/Edu_Event
  • +Sem/Edu_Geom
  • +Sem/Edu_Group_Hum
  • +Sem/Edu_Mat
  • +Sem/Edu_Org
  • +Sem/Edu_Txt
  • +Sem/Event_Food
  • +Sem/Event_Hum
  • +Sem/Event_Plc
  • +Sem/Event_Plc-elevate
  • +Sem/Event_Time
  • +Sem/Feat_Hum
  • +Sem/Feat-measr_Plc
  • +Sem/Feat-phys_Hum
  • +Sem/Feat-phys_Obj
  • +Sem/Feat-phys_Tool-write
  • +Sem/Feat-phys_Veh
  • +Sem/Feat-phys_Wthr
  • +Sem/Feat-psych_Hum
  • +Sem/Feat-psych_Plc
  • +Sem/Feat_Plant
  • +Sem/Food_Perc-phys
  • +Sem/Food_Plant
  • +Sem/Food_Substnc
  • +Sem/Food_Time
  • +Sem/Game_Obj-play
  • +Sem/Geom_Obj
  • +Sem/Group_Hum
  • +Sem/Group_Hum_Org
  • +Sem/Group_Hum_Plc
  • +Sem/Group_Hum_Prod-vis
  • +Sem/Group_Org
  • +Sem/Group_Prod-vis_Txt_Veh
  • +Sem/Group_Sign
  • +Sem/Group_Txt
  • +Sem/Hum_Lang
  • +Sem/Hum_Lang_Plc
  • +Sem/Hum_Lang_Time
  • +Sem/Hum_Mat_Tool
  • +Sem/Hum_Obj
  • +Sem/Hum_Obj_Plc
  • +Sem/Hum_Org
  • +Sem/Hum_Part
  • +Sem/Hum_Plant
  • +Sem/Hum_Plc
  • +Sem/Hum_State
  • +Sem/Hum_Tool
  • +Sem/Hum_Tool-catch
  • +Sem/Hum_Veh
  • +Sem/Hum_Wthr
  • +Sem/Lang_Tool
  • +Sem/Lang_Tool-catch
  • +Sem/Mat_Obj
  • +Sem/Mat_Plant
  • +Sem/Mat_Part
  • +Sem/Mat_Plc
  • +Sem/Mat_Tool
  • +Sem/Mat_Tool-catch
  • +Sem/Mat_Txt
  • +Sem/Measr_Plc_Time
  • +Sem/Measr_Sign = Sign (e.g. numbers, punctuation)
  • +Sem/Measr_Time
  • +Sem/Money_Obj
  • +Sem/Money_Plc
  • +Sem/Money_Txt
  • +Sem/Obj-ling_Obj-surfc
  • +Sem/Obj_Part_Sign
  • +Sem/Obj-play
  • +Sem/Obj-play_Sport
  • +Sem/Obj_Plantpart
  • +Sem/Obj_Plc-abstr
  • +Sem/Obj_Prod-audio
  • +Sem/Obj_Semcon
  • +Sem/Obj_Sign
  • +Sem/Obj_State
  • +Sem/Obj_Tool-write
  • +Sem/Obj_Txt
  • +Sem/Obj_Veh
  • +Sem/Org_Plc
  • +Sem/Org_Prod-audio
  • +Sem/Org_Prod-vis
  • +Sem/Org_Prod-cogn
  • +Sem/Org_Rule
  • +Sem/Org_State
  • +Sem/Org_Txt
  • +Sem/Org_Veh
  • +Sem/Part_Plc
  • +Sem/Part_Prod-cogn
  • +Sem/Part_Substnc
  • +Sem/Part_Txt
  • +Sem/Perc-emo_Plc
  • +Sem/Perc-emo_Wthr
  • +Sem/Plant_Plantpart
  • +Sem/Plant_Time_Wthr
  • +Sem/Plant_Tool
  • +Sem/Plant_Tool-measr
  • +Sem/Plc-abstr_Rel_State
  • +Sem/Plc-abstr_Route
  • +Sem/Plc-abstr_Txt
  • +Sem/Plc_Pos
  • +Sem/Plc_Prod-audio
  • +Sem/Plc_Route
  • +Sem/Plc_State
  • +Sem/Plc_Substnc
  • +Sem/Plc_Substnc_Wthr
  • +Sem/Plc_Time
  • +Sem/Plc_Time_Wthr
  • +Sem/Plc_Tool-catch
  • +Sem/Plc_Txt
  • +Sem/Plc_Wthr
  • +Sem/Prod-audio_Substnc
  • +Sem/Prod-audio_Txt
  • +Sem/Prod-cogn_Txt
  • +Sem/Route_Txt
  • +Sem/Rule_Txt
  • +Sem/Semcon_Txt
  • +Sem/State-sick_Substnc
  • +Sem/Substnc_Wthr
  • +Sem/Time_Wthr
  • +Sem/Tool-music
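
The combined tags listed above appear to join several single semantic categories with underscores, while a hyphen marks a subtype inside one category (as in Body-abstr). A minimal Python sketch of splitting a combined tag back into single tags, assuming that naming convention holds throughout; the function name `split_sem_tag` is hypothetical:

```python
def split_sem_tag(tag: str) -> list[str]:
    """Expand a combined semantic tag into its single-category tags.

    Assumes "_" separates categories and "-" belongs to a category name.
    """
    prefix = "+Sem/"
    if not tag.startswith(prefix):
        raise ValueError(f"not a semantic tag: {tag}")
    categories = tag[len(prefix):].split("_")
    return [prefix + category for category in categories]
```

For example, `split_sem_tag("+Sem/Ani_Body-abstr_Hum")` yields `+Sem/Ani`, `+Sem/Body-abstr` and `+Sem/Hum`, and a single tag such as `+Sem/Tool-music` comes back unchanged as a one-element list.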

Derivation tags

The following tags are used to describe the dynamic derivational system in Lule Sámi as encoded in this lexical description. The tags are classified according to a positional system, where each tag can be in one and only one position, and can only combine with tags from an earlier / lower position. This is done to avoid possible overgeneration in the derivational system.

+Der1 +Der2 +Der3 +Der4 +Der5
- positional tags, precede the actual Der tag

Der#1 tags - tags in first position

  • +Der/PassL VV - long passive láhpeduvvat
  • +Der/PassS VV - Short passive láhpput
  • +Der/PassD VV - dallat passive
  • +Der/Dimin NN
  • +Der/adda VV
  • +Der/ahtja VV - only odd syll verbs take this der
  • +Der/ahttjá VV - only odd syll verbs take this der
  • +Der/Caus VV - previously Der/ahtte
  • +Der/alla VV
  • +Der/asste VV
  • +Der/d VV
  • +Der/dalla VV
  • +Der/dasste VV
  • +Der/Car NA - only even/contr, prev. Der/dibme
  • +Der/ferjak NA Adjectival -k der (from ?)
  • +Der/k NN / NA
  • +Der/l VV
  • +Der/ladda VV
  • +Der/lahtte VV
  • +Der/lasj NA - unresolved: Der/lasj is defined twice, as a noun derivation (source line 1472) and as an adjective derivation (source line 2040). Is this ok?
  • +Der/lasj NN
  • +Der/lasste VV
  • +Der/n NA. Denominal -n adjective (similar to -k adj)
  • +Der/r VN - AA?
  • +Der/sasj NA
  • +Der/segak NA Adj. -k der from?
  • +Der/st VV
  • +Der/stahtte VV
  • +Der/stalla VV
  • +Der/stasste VV
  • +Der/tj VV
  • +Der/u/a/åd VV
  • +Der/A NA

Der#2 tags - tags in second position

  • +Der/dahtte VV
  • +Der/duhtte VV
  • +Der/ahkes VA
  • +Der/NomAct VN

Der#3 tags - tags in third position

  • +Der/duvva VV
  • +Der/InchL VV (previously Der/goahte)
  • +Der/mus VN
  • +Der/NomAct VN Realised in two different ways. This realisation is Der3. Commented out in the source to avoid defining the tag twice, but kept here for documentation purposes.
  • +Der/dahka VN
  • +Der/lis VA
  • +Der/NomAg VN

Der#4 tags - tags in fourth position

  • +Der/ahtes NA ! only odd

Der#5 tags - tags in fifth position

  • +Der/AAdv NA AAdv, previously +Der/at
  • +Der/vuota NA AN (tag harmonization: previously Der/vuohta)

Der#other tags - tags that can be in any position

There are no such tags in SMJ, but for symmetry and code coherence with SME the class is still kept.
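
The positional rule described at the start of this section can be sketched as a simple check on a sequence of derivation tags. This is a minimal Python sketch, not the analyser's own mechanism: it reads "can only combine with tags from an earlier / lower position" as "positions must strictly increase along the chain", it includes only a handful of the tags listed above, and it ignores the word-class codes (VV, NA, VN ...), so it says nothing about whether a chain is linguistically possible. The names `DER_POSITION` and `valid_der_chain` are hypothetical.

```python
# A small excerpt of the position classes listed above (illustrative only).
DER_POSITION = {
    "+Der/PassL": 1,
    "+Der/Dimin": 1,
    "+Der/Caus": 1,
    "+Der/dahtte": 2,
    "+Der/NomAct": 2,
    "+Der/InchL": 3,
    "+Der/NomAg": 3,
    "+Der/ahtes": 4,
    "+Der/AAdv": 5,
    "+Der/vuota": 5,
}


def valid_der_chain(tags: list[str]) -> bool:
    """Return True if each tag sits in a higher position than the one before."""
    last_position = 0
    for tag in tags:
        position = DER_POSITION[tag]  # KeyError for tags outside the excerpt
        if position <= last_position:
            return False
        last_position = position
    return True
```

Under this reading a position 1 tag followed by a position 2 tag is accepted, while the reverse order, or two tags from the same position, is rejected.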

Tags for originating language

The following tags are used to guide conversion to IPA: loan words and foreign names are usually pronounced (approximately) as in the originating (majority) language. Instead of trying to identify the correct pronunciation based on phonotactics (orthotactics, actually), we tag all words that can't be correctly transcribed using the SME transcriber with source language codes. Once tagged, it is possible to split the lexical transducer into smaller ones according to language, and apply a different IPA conversion to each of them. The principle of tagging is that we only tag to the extent needed, following this priority:

  1. any untagged word is pronounced with SME orthographic conventions
  2. NNO and NOB have identical pronunciation, NNO is only used if different in spelling from NOB
  3. SWE has mostly the same pronunciation as NOB, and is only used if different in spelling from NOB
  4. Occasionally even SME (the default) may be tagged, to block other languages from being specified, mainly during semi-automatic language tagging sessions

All in all, we want to get as much correctly transcribed to IPA with as little work as possible. On the other hand, if more words are tagged than strictly needed, this should pose no problem as long as the IPA conversion is correct - at least some words will get the same pronunciation whether read as SME or NOB/NNO/SWE.

  • +OLang/SME - North Sámi
  • +OLang/SMA - South Sámi
  • +OLang/FIN - Finnish
  • +OLang/SWE - Swedish
  • +OLang/NOB - Norw. bokmål
  • +OLang/NNO - Norw. nynorsk
  • +OLang/ENG - English
  • +OLang/RUS - Russian
  • +OLang/UND - Undefined
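
The default rule from the priority list above (an untagged word is treated as SME) and the idea of splitting entries by language can be sketched as follows. This is a minimal Python sketch working on analysis strings rather than on the transducer itself; the example strings, the position of the +OLang tag inside them, and the function names are assumptions made for illustration.

```python
import re

# Matches the three-letter code in tags such as +OLang/NOB.
OLANG_RE = re.compile(r"\+OLang/([A-Z]{3})")


def origin_language(analysis: str, default: str = "SME") -> str:
    """Return the originating language code, or the default if untagged."""
    match = OLANG_RE.search(analysis)
    return match.group(1) if match else default


def split_by_language(analyses: list[str]) -> dict[str, list[str]]:
    """Group analyses by originating language, one group per IPA converter."""
    groups: dict[str, list[str]] = {}
    for analysis in analyses:
        groups.setdefault(origin_language(analysis), []).append(analysis)
    return groups
```

Each resulting group could then be handed to its own IPA conversion, with the untagged entries ending up in the `SME` group.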

Flag diacritics

We have manually optimised the structure of our lexicon using the following flag diacritics to restrict morphological combinatorics - only allow compounds with verbs if the verb is further derived into a noun again:

@P.NeedNoun.ON@ (Dis)allow compounds with verbs unless nominalised
@D.NeedNoun.ON@ (Dis)allow compounds with verbs unless nominalised
@C.NeedNoun@ (Dis)allow compounds with verbs unless nominalised
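
To make the interplay of the P, D and C flags above concrete, here is a minimal Python sketch of how a path through the lexicon is accepted or rejected by its flag diacritics. It follows the usual xfst/hfst-style reading of the operators (P sets a value, R requires it, D disallows it, C clears it, U unifies) and leaves out the N operator. Where exactly the lexicon places these flags is not shown in this document, so the example paths are assumptions, and the function name is hypothetical.

```python
import re

# @OP.Feature.Value@ or @OP.Feature@, for the operators used in this document.
FLAG_RE = re.compile(r"@([PRDCU])\.([^.@]+)(?:\.([^.@]+))?@")


def path_is_allowed(flags: list[str]) -> bool:
    """Walk the flags in path order and report whether the path survives."""
    state: dict[str, str] = {}
    for flag in flags:
        match = FLAG_RE.fullmatch(flag)
        if match is None:
            raise ValueError(f"unsupported flag diacritic: {flag}")
        op, feature, value = match.groups()
        current = state.get(feature)
        if op == "P":            # set the feature to this value
            state[feature] = value
        elif op == "C":          # clear the feature
            state.pop(feature, None)
        elif op == "R":          # require this value (or any value if none given)
            if (value is None and current is None) or (
                value is not None and current != value
            ):
                return False
        elif op == "D":          # fail on this value (or on any value if none given)
            if (value is None and current is not None) or (
                value is not None and current == value
            ):
                return False
        elif op == "U":          # unify: set if unset, else values must agree
            if current is None:
                state[feature] = value
            elif current != value:
                return False
    return True
```

For instance, a path that sets `@P.NeedNoun.ON@` and then meets `@D.NeedNoun.ON@` is rejected, whereas the same path with `@C.NeedNoun@` in between is accepted.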

For languages that allow compounding, the following flag diacritics are needed to control position-based compounding restrictions for nominals. Their use is handled automatically if combined with +CmpN/xxx tags. If not used, they will do no harm.

@P.CmpFrst.FALSE@ Require that words tagged as such only appear first
@D.CmpPref.TRUE@ Block such words from entering ENDLEX
@P.CmpPref.FALSE@ Block these words from making further compounds
@D.CmpLast.TRUE@ Block such words from entering R
@D.CmpNone.TRUE@ Combines with the next tag to prohibit compounding
@U.CmpNone.FALSE@ Combines with the prev tag to prohibit compounding
@U.CmpNone.TRUE@ Combines with the two previous ones to block compounding
@P.CmpOnly.TRUE@ Sets a flag to indicate that the word has passed R
@D.CmpOnly.FALSE@ Disallow words coming directly from root.
@U.CmpHyph.FALSE@ Flag to control hyphenated compounds like proper nouns
@U.CmpHyph.TRUE@ Flag to control hyphenated compounds like proper nouns
@C.CmpHyph@ Flag to control hyphenated compounds like proper nouns

Use the following flag diacritics to control downcasing of derived proper nouns (e.g. Finnish Pariisi -> pariisilainen). See e.g. North Sámi for how to use these flags. There exists a ready-made regex that will do the actual down-casing given the proper use of these flags.

@U.Cap.Obl@ Allowing downcasing of derived names: deatnulasj.
@U.Cap.Opt@ Allowing downcasing of derived names: deatnulasj.
@P.Px.add@ Giving possibility for Px-suffixes (all except from Nom 3.p)
@R.Px.add@ Requiring P.Px.add-flag for Px-suffixes (all except from Nom 3.p)
@P.Nom3Px.add@ Giving possibility for Px-suffixes Nom 3.p
@R.Nom3Px.add@ Requiring P.Nom3Px.add flag for Px-suffixes Nom 3.p
  • LEXICON Acronym
  • LEXICON ProperNoun

Lexicon ENDLEX

And this is the ENDLEX of everything:

 @D.CmpOnly.FALSE@@D.CmpPref.TRUE@@D.NeedNoun.ON@ # ;

The @D.CmpOnly.FALSE@ flag diacritic is used to prevent words tagged with +CmpNP/Only from ending here. The @D.NeedNoun.ON@ flag diacritic is used to block illegal compounds.
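
The effect of the three disallow flags in ENDLEX can be summarised as a check on the flag values a word carries when it reaches the end. The following is a minimal Python sketch, assuming the flag state is available as a plain dictionary; how the values get set earlier in the lexicon is not shown here, and the names used are hypothetical.

```python
# Feature/value pairs that ENDLEX disallows, taken from the entry above.
ENDLEX_DISALLOWS = {
    "CmpOnly": "FALSE",
    "CmpPref": "TRUE",
    "NeedNoun": "ON",
}


def may_end_word(flag_state: dict[str, str]) -> bool:
    """Return True if none of the disallowed values is currently set."""
    return all(
        flag_state.get(feature) != value
        for feature, value in ENDLEX_DISALLOWS.items()
    )
```

A word with no flags set may end, while one that still carries `NeedNoun` set to `ON`, `CmpPref` set to `TRUE`, or `CmpOnly` set to `FALSE` may not.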