sje
Contents:
- Free and Open source Pite Sami analyser giella-sje
- giella-sje
- Pite Sámi morphological analyser
- Multichar_Symbols definitions
- Key lexicon
- Lexicon ENDLEX
- Adjectives
- File containing abbreviations
- Pite Saami Adjectives
- Adpositions
- Conjunctions
- Pite Saami Nouns
- Pite Saami numerals
- Pite Saami ProperNouns
- Punctuation
- Pite Saami Verbs
- Pite Sámi TWOLC file
- Rules
Free and Open source Pite Sami analyser giella-sje
- Authors
- Divvun and Giellatekno teams, community members
- Software version
- 2012
- Documentation license
- GNU GFDL
- SVN Revision
- $Revision: 68217 $
- SVN Date
- $Date: 2013-01-16 11:31:33 +0200 (Wed, 16 Jan 2013) $
giella-sje
This is a free and open-source morphology for Pite Sami.
Pite Sámi morphological analyser
This file contains the tag definitions and the references to the main lexica.
Multichar_Symbols definitions
POS
- +N Noun
- +V Verb
- +A Adjective
- +Adv Adverb
- +CC Coordinating conjunction
- +CS Subordinating conjunction
- +Interj Interjection
- +Pron Pronoun
- +Num Numeral
- +Pcle Particle
- +Po Postposition
- +Pr Preposition
Subclasses
- +Pers Personal
- +Dem Demonstrative
- +Interr Interrogative
- +Indef Indefinite
- +Refl Reflexive
- +Recipr Reciprocal
- +Rel Relative
- +NomAg Agent noun
- +Attr Attributive
- +Comp Comparative
- +Superl Superlative
Morphosyntactic properties
Verbal MSP
Tense-mode
- +Prs Present tense
- +Prt Preterite (past) tense
- +Ind Indicative mood
- +Imprt Imperative mood
- +Pot Potential mood
Person-number
- +Sg1 First person singular
- +Sg2 Second person singular
- +Sg3 Third person singular
- +Du1 First person dual
- +Du2 Second person dual
- +Du3 Third person dual
- +Pl1 First person plural
- +Pl2 Second person plural
- +Pl3 Third person plural
Non-finite forms
- +Inf Infinitive
- +Neg Negation verb
- +ConNeg Connegative verb
- +GerI Gerund I
- +GerII Gerund II
- +PrfPrc Perfect participle
- +PrsPrc Present participle
- +VAbess Verb abessive
- +Cmp Compound
- +TV Transitive verb
- +IV Intransitive verb
Other tags
- +ABBR Abbreviation
- +Symbol = independent symbols in the text stream, like £, €, ©
- +Coll Collocation
- +Cmp/SgNom Compound component using Nominative Singular form
- +Cmp/SgGen Compound component using Genitive Singular form
- +Det Determiner
Derivation tags
- +Der/NomAg Derived agent noun
- +Der/Dimin Derived diminutive
- +Der/State Derived state noun
Nominal MSP
- +Sg Singular
- +Pl Plural
Case
- +Nom Nominative
- +Acc Accusative
- +Gen Genitive
- +Ill Illative
- +Ine Inessive
- +Ela Elative
- +Com Comitative
- +Ess Essive
- +Abe Abessive
- +Ord Ordinal
- +Card Cardinal
Semantic properties of names
Possessive suffixes
- +PxSg1 First person singular possessive suffix
- +PxSg2 Second person singular possessive suffix
- +PxSg3 Third person singular possessive suffix
- +PxDu1 First person dual possessive suffix
- +PxDu2 Second person dual possessive suffix
- +PxDu3 Third person dual possessive suffix
- +PxPl1 First person plural possessive suffix
- +PxPl2 Second person plural possessive suffix
- +PxPl3 Third person plural possessive suffix
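To illustrate how the tags defined so far combine, the following Python sketch splits an analysis string into its lemma and tags. It assumes the lemma+Tag+Tag layout and uses almatj (a noun cited in the noun section of this documentation) purely as an example; the function `parse_analysis` and the field names it returns are hypothetical and not part of giella-sje.

```python
# Tag inventories copied from the lists above (only a subset is classified).
POS_TAGS = {"N", "V", "A", "Adv", "CC", "CS", "Interj",
            "Pron", "Num", "Pcle", "Po", "Pr"}
NUMBER_TAGS = {"Sg", "Pl"}
CASE_TAGS = {"Nom", "Acc", "Gen", "Ill", "Ine", "Ela", "Com", "Ess", "Abe"}


def parse_analysis(analysis):
    """Split 'lemma+Tag+Tag...' into a dict (hypothetical helper)."""
    lemma, *tags = analysis.split("+")
    result = {"lemma": lemma, "pos": None, "number": None,
              "case": None, "possessive": None, "other": []}
    for tag in tags:
        if tag in POS_TAGS and result["pos"] is None:
            result["pos"] = tag
        elif tag in NUMBER_TAGS:
            result["number"] = tag
        elif tag in CASE_TAGS:
            result["case"] = tag
        elif tag.startswith("Px"):
            result["possessive"] = tag
        else:
            # Anything not classified above is kept in order of appearance.
            result["other"].append(tag)
    return result


# The analysis string is an assumed example, not analyser output.
print(parse_analysis("almatj+N+Pl+Ill+PxSg1"))
```

Tags that the sketch does not classify (for instance derivation or compounding tags) end up in the `other` list unchanged.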
Other tags
- +Err/Orth Not part of standard orthography
- +Use/NG Found in reality, but not generated
- +Use/Circ
- +Cmp/Hyph
- +Cmp/SplitR
- +Use/-Spell
- +Use/NGminip
Compounding tags
The tags are of the following form:
- +CmpNP/xxx - Normative (N), Position (P), i.e. the tag describes what
- +CmpN/xxx - Normative (N) form, i.e. the tag describes what
- +Cmp/xxx - Descriptive compounding tags, i.e. tags that describe
Normative/prescriptive compounding tags:
The first part of the component may be ..
- +CmpN/Sg = Singular
- +CmpN/SgN = Singular Nominative
- +CmpN/SgG = Singular Genitive
- +CmpN/PlG = Plural Genitive
- +CmpNP/All - ... be in all positions, default, this tag does not have to be written
- +CmpNP/First - ... only be first part in a compound or alone
- +CmpNP/Pref - ... only first part in a compound, NEVER alone
- +CmpNP/Last - ... only be last part in a compound or alone
- +CmpNP/Suff - ... only last part in a compound, NEVER alone
- +CmpNP/None - ... not take part in compounds
- +CmpNP/Only - ... only be part of a compound, i.e. can never
- +CmpN/SgLeft Singular to the left
- +CmpN/SgNomLeft Singular nominative to the left
- +CmpN/SgGenLeft Singular genitive to the left
- +CmpN/PlGenLeft Plural genitive to the left
- +Cmp/Sg Singular
- +Cmp/SgNom Singular Nominative
- +Cmp/SgGen Singular Genitive
- +Cmp/PlGen Plural Genitive
- +Cmp/PlNom Plural Nominative
- +Cmp/Attr Attribute
- +Cmp Dynamic compound - this tag should always be part of a
- +Cmp/SplitR This is a split compound with the other part to the right:
- +Cmp/SplitL This is a split compound with the other part to the left
- +Cmp/Sh testing ShCmp
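The position restrictions expressed by the +CmpNP/xxx tags can be read as a small lookup table. The Python sketch below is only an illustration of that reading: the function `may_occur` and the position labels are hypothetical, the treatment of the "middle" position is an assumption (the source mentions only first, last and alone), and +CmpNP/Only is interpreted as "never alone" although its description is cut off in the source.

```python
POSITIONS = {"alone", "first", "middle", "last"}

# Positions in which a word may occur, per +CmpNP tag (from the list above).
# Excluding "middle" for First/Pref/Last/Suff and allowing it for All/Only
# is an assumption of this sketch.
ALLOWED_POSITIONS = {
    "+CmpNP/All": {"alone", "first", "middle", "last"},
    "+CmpNP/First": {"alone", "first"},
    "+CmpNP/Pref": {"first"},
    "+CmpNP/Last": {"alone", "last"},
    "+CmpNP/Suff": {"last"},
    "+CmpNP/None": {"alone"},
    "+CmpNP/Only": {"first", "middle", "last"},
}


def may_occur(position, tag="+CmpNP/All"):
    """Return True if a word carrying `tag` may stand in `position`.

    +CmpNP/All is the default because the source says it does not
    have to be written.
    """
    if position not in POSITIONS:
        raise ValueError(f"unknown position: {position}")
    return position in ALLOWED_POSITIONS[tag]


print(may_occur("alone", "+CmpNP/Pref"))
print(may_occur("first", "+CmpNP/Pref"))
```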
Punctuation tags
- +CLB Clause boundary
- +PUNCT Punctuation
- +LEFT
- +RIGHT
- +SENT
Morphophonological symbols
Symbols for regulating the twolc file
- ^WG weak grade
Archiphonemes
- »7 »
- «7 «
- %[%>%] >
- %[%<%] <
Flag diacritics
- @P.NeedNoun.ON@ (Dis)allow compounds with verbs unless nominalised
- @D.NeedNoun.ON@ (Dis)allow compounds with verbs unless nominalised
- @C.NeedNoun@ (Dis)allow compounds with verbs unless nominalised
For languages that allow compounding, the following flag diacritics are needed
- @P.CmpFrst.FALSE@ Require that words tagged as such only appear first
- @D.CmpPref.TRUE@ Block such words from entering ENDLEX
- @P.CmpPref.FALSE@ Block these words from making further compounds
- @D.CmpLast.TRUE@ Block such words from entering R
- @D.CmpNone.TRUE@ Combines with the next tag to prohibit compounding
- @U.CmpNone.FALSE@ Combines with the prev tag to prohibit compounding
- @P.CmpOnly.TRUE@ Sets a flag to indicate that the word has passed R
- @D.CmpOnly.FALSE@ Disallow words coming directly from root.
Use the following flag diacritics to control downcasing of derived proper names:
- @U.Cap.Obl@ Allowing downcasing of derived names: deatnulasj.
- @U.Cap.Opt@ Allowing downcasing of derived names: deatnulasj.
Key lexicon
Lexicon ENDLEX
@D.CmpOnly.FALSE@@D.CmpPref.TRUE@@D.NeedNoun.ON@ # ;
The @D.CmpOnly.FALSE@ flag diacritic is used to disallow words tagged
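How the flag diacritics listed above interact with the ENDLEX entry can be sketched with a tiny evaluator. The Python code below follows the usual Xerox/HFST reading of the P (set), D (disallow), U (unify), C (clear) and R (require) operators; the function `path_is_valid` is hypothetical, flags with other operators are simply ignored, and the flag sequences in the usage lines are constructed for illustration rather than taken from real analyser paths.

```python
import re

# @OP.Feature.Value@ or @OP.Feature@ (the value is optional).
FLAG = re.compile(r"@([PDUCR])\.([^.@]+)(?:\.([^.@]+))?@")


def path_is_valid(path):
    """Return True if no flag diacritic in `path` blocks the path."""
    state = {}  # feature name -> current value
    for op, feature, value in FLAG.findall(path):
        current = state.get(feature)
        if op == "P":      # positive set: overwrite the value
            state[feature] = value
        elif op == "C":    # clear: forget the feature
            state.pop(feature, None)
        elif op == "D":    # disallow: fail if set (to this value, if given)
            if current is not None and (value == "" or current == value):
                return False
        elif op == "R":    # require: fail unless set (to this value, if given)
            if current is None or (value != "" and current != value):
                return False
        elif op == "U":    # unify: fail on conflict, otherwise set
            if current is not None and current != value:
                return False
            state[feature] = value
    return True


ENDLEX = "@D.CmpOnly.FALSE@@D.CmpPref.TRUE@@D.NeedNoun.ON@"

# A word marked CmpOnly.FALSE that never passed R is blocked at ENDLEX.
print(path_is_valid("@P.CmpOnly.FALSE@" + ENDLEX))
# After @P.CmpOnly.TRUE@ has overwritten the value, the path goes through.
print(path_is_valid("@P.CmpOnly.FALSE@@P.CmpOnly.TRUE@" + ENDLEX))
```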
Adjectives
Here are the lexica for adjective inflection.
- LEXICON A_EVEN_IES
- LEXICON A_EVEN_B
- LEXICON A_EVEN_D e.g. almelatj: almelatja; nominative and attributive do not work
- LEXICON A_EVEN_0 cs
- LEXICON A_EVEN_NOCG_S e.g. jasska, állke
- LEXICON A_EVEN
- LEXICON A_EVEN_NOCG
- LEXICON COMPSUP
- LEXICON COMPSUP_A
- LEXICON COMPSUP_B
- LEXICON A_ODD odd-syllable stems
- LEXICON A_ODD_GIS odd-syllable stems, e.g. bavrek: bavreg- d
- LEXICON A_ODD_SIS odd-syllable stems, e.g. aset: ased-
- LEXICON A_ODD_AA odd-syllable stems, e.g. suddes: suddás-
- LEXICON A_ODD_S odd-syllable stems, e.g. blávvat: blávvad-/blávvis
- LEXICON A_ODD_Y odd-syllable stems, e.g. tjuavvgat: tjuavvgad-/tjuvvgis
- LEXICON A_ODD_Å odd-syllable stems, e.g. lusjgos: lusjgos-
- LEXICON A_ODD_Ä odd-syllable stems, e.g. stumbu: stumbus-
- LEXICON A_ODD_DD odd-syllable stems, e.g. gilos: gillus-
- LEXICON A_ODD_ÖÖ odd-syllable stems, e.g. rádes, rádep (check compsup)
- LEXICON A_ODD_k_K odd-syllable stems, e.g. tjuorak: tjuorag-
- LEXICON A_EVEN_KONTR_C e.g. sjtänntjáj: sjtänntjá
- LEXICON A_EVEN_KONTR_D e.g. låmmsje: låmmsjá/låmmsjes (attr)
- LEXICON A_EVEN_KONTR_E e.g. mivkes: mivkká-/mivka (attr)
File containing abbreviations
Lexica for adding tags and periods
The lexica are split into three groups because of the preprocessor
- LEXICON Abbreviation
- LEXICON trab-ab-noun
- LEXICON trab-ab-adj
- LEXICON trab-ab-adv
- LEXICON trab-ab-verb
- LEXICON trab-ab-num
- LEXICON trab-ab-cc
- LEXICON itrab-ab-noun
- LEXICON itrab-ab-adj
- LEXICON itrab-ab-adv
- LEXICON itrab-ab-num
- LEXICON trnumab-ab-noun
- LEXICON trnumab-ab-adj
- LEXICON ab-nodot-noun The bulk
- LEXICON ab-nodot-adj
- LEXICON ab-nodot-adv
- LEXICON ab-nodot-num
- LEXICON ab-nodot-cc
- LEXICON ab-nodot-verb
Intransitive abbreviations
- LEXICON ITRAB
- LEXICON TRNUMAB
Transitive abbreviations
- LEXICON TRAB
Pite Saami Adjectives
Adpositions
- LEXICON Postposition is the list
- LEXICON PrePostposition is the list
- LEXICON PostP adds the tag +Po
- LEXICON PrePost adds the tags +Po and +Pr
Adverbs
- LEXICON adv adds the tag +Adv
- LEXICON Adverb is the list
Conjunctions
- LEXICON CC gives +CC
- LEXICON Conjunction is the list.
- LEXICON interj gives the tag +Interj
- LEXICON Interjection is the list
Pite Saami Nouns
- LEXICON Noun is the main lexicon
Lexc inflectional classes (Mini-grammar)
- Even-syllable stem patterns:
- N_EVEN: bisyllabic stems except those ending in -o- (e.g. juällge, bijjla, gisstá, gällu, båsskå)
- N_EVEN_O: bisyllabic stems ending in -o- (e.g. iello)
- N_EVEN4: tetrasyllabic stems (trisyllabic in Nom.Sg) ending in -k/-g-, -tj- (e.g. mánnodak, såbmelatj)
- N_EVEN4_ISA: tetrasyllabic stems (trisyllabic in Nom.Sg) ending in -is/-as- (e.g. guoksagis)
- Odd-syllable stem patterns:
- N_ODD: odd-syllable stems ending in a closed syllable and without consonant gradation (e.g. almatj)
- N_ODD_OPEN: odd-syllable stems ending in an open syllable (e.g. biena)
- N_ODD_VH: odd-syllable stems ending in a closed syllable and with vowel harmony (e.g. ålol)
- N_ODD_WG: odd-syllable stems ending in a closed syllable (e.g. vanas)
- Contracted stem patterns:
- N_CONTR_AJA: contracted stems ending in -aj or -a (e.g. ålmaj)
- N_CONTR_ESA: contracted stems ending in -es or -á (e.g. sarves)
- N_CONTR_OJU: contracted stems ending in -oj or -u (e.g. båtsoj)
- N_CONTR_OU: contracted stems ending in -o or -u (e.g. suolo)
Pite Saami numerals
- LEXICON Numeral
- LEXICON pcle gives the tag
- LEXICON Particle is the list
Pronouns
- LEXICON Pronoun
- LEXICON Personal
- LEXICON perssg
- LEXICON persdu
- LEXICON perspl
- LEXICON Demonstrative
- LEXICON Determiner
- LEXICON Relative
- LEXICON Interrogative
- LEXICON Indefinita
- LEXICON Reflexive
- LEXICON gallacase
Pite Saami ProperNouns
Propernouns
- LEXICON ProperNoun
Punctuation
- LEXICON Punctuation
- LEXICON CS
- LEXICON Subjunction
Pite Saami Verbs
- LEXICON Verb is the main lexicon
Lexc inflectional classes (Mini-grammar)
- V_EVEN_E: even-syllable stems ending in -e- (e.g. båhtet)
- V_EVEN_A: even-syllable stems ending in -a- (e.g. dahkat)
- V_EVEN_O: even-syllable stems ending in -o- (e.g. viessot)
- V_EVEN_Å: even-syllable stems ending in -å- (e.g. bårråt)
- V_ODD: odd-syllable stems (e.g. ságastit)
- V_CONTR: contracted stems (e.g. gullit -j-, tjerrut -j-)
- lä: LE "copula/auxiliary verb" ;
- ij: IJ "negation verb" ;
Pite Sámi TWOLC file
- %^WG:0 weak grade
- %^G3:0 marks grade three for stems without consonant gradation
- %^V2E2AA:0 e to á in V2 (e.g. Ill.Sg, Dim, 1/2-Sg)
- :0 o to u in V2 (e.g. Ill.Sg, Dim, some N_ODD) etc.
- %^CDEL:0 delete final consonant odd (biednag)
- %^VDEL:0 delete final V2 vowel in compounds or gájk
- %^MON:0 monophthong in contracted stems
- %^UAUML:0 uo to uä, juolge / juällge
- %^IEUML:0 ie to ä, gielbar / gällbara
- :0 a to i, gallgat gillgin
- :0 e to i in front of Plural j and Sg Com
- :0 what is this?
Rules
Consonant gradation rules
Consonant Gradation for htt(j|s):ht(j|s)
Consonant Gradation for hxx:hx
Consonant Gradation for xdn(j):xn(j)
Consonant Gradation for xx:x
Consonant Gradation for xxy:xy
Consonant Gradation for xxt(j|s):xt(j|s)
Consonant Gradation for xxsj:xsj
Consonant Gradation for xy:y
Delete h in hx:y
Intervocalic voiced plosives in hx:y
Consonant Gradation for l/jbm:l/jm
Consonant Gradation for nnjg:njg
Consonant Gradation for vgŋ:vŋ
Consonant Gradation for rdj:rj
Other consonant rules
Final C Deletion
Final devoicing
Word Final Simplification in -st
Word-final De-Affricatization for tj
Vowel rules
Metaphony
Default VH
Default VH for 4syllables
Default UA in G3
- lu^Oddan^UAUMLi%>t
- luaddan0i0t
Special UÄ (VH) in G3
- gu^Odde^G3%>t
- guädde00t
Special VH for u^O
Special VH for ie
- sjievdnje^IJ%>s
- sj0evdnji00s
- hierrge^WG^IJ%>j
- h0er0gi000j
- hierrge%>j^V2E2AA
- h0ärrgá0j0
Ä in G3
Ä in G3 capitalized
V2 E to I before j-suffixes
V2 E to Á
- båhte^WG%>v^V2E2AA
- bå0dá00v0
- båhte^WG%>^V2E2AA
- bå0dá000
- máhtte^WG%>v^V2E2AA
- máht0á00v0
- máhtte^WG%>^V2E2AA
- máht0á000
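The test pairs in the rules above give a lexical string followed by a surface string of the same length, where 0 stands for a deleted symbol. A minimal Python sketch of how such a pair can be lined up symbol by symbol is shown here. The helper names `tokenize`, `align` and `strip_zeros` are hypothetical, and the list of multicharacter symbols is an assumption collected from the examples (^O and ^IJ occur in the pairs but are not listed among the symbols documented in this file).

```python
import re

# Multicharacter symbols occurring in the pairs above (assumed inventory).
MULTICHAR = ["^V2E2AA", "^UAUML", "^WG", "^G3", "^IJ", "^O", "%>"]

# Longest symbols first so that e.g. ^V2E2AA is matched as one unit;
# any other character counts as a symbol of its own.
_TOKEN = re.compile(
    "|".join(re.escape(s) for s in sorted(MULTICHAR, key=len, reverse=True))
    + "|.",
    re.DOTALL,
)


def tokenize(lexical):
    """Split a lexical string into single and multicharacter symbols."""
    return _TOKEN.findall(lexical)


def align(lexical, surface):
    """Pair each lexical symbol with the surface character below it."""
    symbols = tokenize(lexical)
    if len(symbols) != len(surface):
        raise ValueError("lexical and surface sides differ in length")
    return list(zip(symbols, surface))


def strip_zeros(surface):
    """Remove the 0 placeholders to get the written word form."""
    return surface.replace("0", "")


for lex, surf in align("hierrge^WG^IJ%>j", "h0er0gi000j"):
    print(f"{lex}:{surf}")
print(strip_zeros("h0er0gi000j"))
```

The length check in `align` is a quick way to spot a pair in which a symbol was lost or a multicharacter symbol is missing from the assumed inventory.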
V2 E to Á before S or R
V2 O to U
Final V Deletion
- a
- b