nob
Free and open-source Norwegian Bokmål analyser giella-nob
- Authors
- Divvun and Giellatekno teams, community members
- Software version
- 2012
- Documentation license
- GNU GFDL
- SVN Revision
- $Revision: 68217 $
- SVN Date
- $Date: 2013-01-16 11:31:33 +0200 (Wed, 16 Jan 2013) $
giella-nob
This is free and open source Norwegian Bokmål morphology.
Norwegian Bokmål morphological analyser
- Multichar_Symbols
Part of speech
- +CLBfinal - sentence-final abbreviated expression ending in a full stop, so that the full stop is ambiguous
- +Symbol - independent symbols in the text stream, like £, €, ©
NDS analyser tags
Morphophonology
Todo: Document these
- +Use/Circ - circular string
- +Der/AAdv - adjectives that are also used as adverbs
Normativity and other usage tags
Paradigm generation
Tags for abbreviation handling
Semantic tags
Preprocessing
Symbols that need to be escaped on the lower side (towards twolc):
- »7 - literal »
- «7 - literal «
- %[%>%] - literal >
- %[%<%] - literal <
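Read as a lookup table, the list above maps each literal character to the symbol that stands for it on the lower side. The Python sketch below only illustrates that substitution, for example in a script preparing test strings; `escape_lower` and `LOWER_SIDE_ESCAPES` are hypothetical names, and the sketch assumes these four symbols are the complete set.

```python
# Hypothetical helper: substitutions copied from the list above
# (literal character -> symbol used on the lower side, towards twolc).
LOWER_SIDE_ESCAPES = {"»": "»7", "«": "«7", ">": "%[%>%]", "<": "%[%<%]"}


def escape_lower(text):
    """Replace each literal character from the table with its escaped symbol."""
    return "".join(LOWER_SIDE_ESCAPES.get(ch, ch) for ch in text)


print(escape_lower("«nob»"))  # «7nob»7
```

Characters not in the table pass through unchanged, so ordinary letters are never altered.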
Compounding
- +Cmp/Hyph -
- +CmpNP/None -
- +CmpNP/First -
Language codes
- +OLang/SME - North Sámi
- +OLang/SMJ - Lule Sámi
- +OLang/SMA - South Sámi
- +OLang/FIN - Finnish
- +OLang/SWE - Swedish
- +OLang/NOB - Norwegian Bokmål
- +OLang/NNO - Norwegian Nynorsk
- +OLang/ENG - English
- +OLang/RUS - Russian
- +OLang/UND - Undefined
Flag diacritics
Flags for ErrOrth
- @C.ErrOrth@ -
- @D.ErrOrth.ON@ -
- @P.ErrOrth.ON@ -
- @R.ErrOrth.ON@ -
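The ErrOrth flags use the generic flag-diacritic operators: P sets a feature, R requires a value, D disallows a value, and C clears the feature. The Cap flags further down use U, which unifies. This document does not say what each ErrOrth combination is for, so the sketch below models only the operator semantics along one path. It assumes Xerox/HFST-style semantics and leaves out the negative N operator; `check_flags` is a hypothetical helper, not part of giella-nob.

```python
import re

# Assumed syntax: @OP.FEATURE@ or @OP.FEATURE.VALUE@ (the N operator is not modelled).
FLAG_RE = re.compile(r"@([PRDCU])\.([^.@]+)(?:\.([^@]+))?@")


def check_flags(flags):
    """Return True if a sequence of flag diacritics met along one path is consistent."""
    state = {}  # feature -> current value; a missing key means "unset"
    for flag in flags:
        match = FLAG_RE.fullmatch(flag)
        if match is None:
            raise ValueError(f"not a supported flag diacritic: {flag}")
        op, feature, value = match.groups()
        current = state.get(feature)
        if op == "P":  # positive set: always succeeds
            state[feature] = value
        elif op == "C":  # clear: always succeeds
            state.pop(feature, None)
        elif op == "R":  # require the value (or, without a value, any value)
            if current is None or (value is not None and current != value):
                return False
        elif op == "D":  # disallow the value (or, without a value, any value)
            if current is not None and (value is None or current == value):
                return False
        elif op == "U":  # unify: set if unset, otherwise the value must match
            if current is None:
                state[feature] = value
            elif current != value:
                return False
    return True
```

For example, `check_flags(["@P.ErrOrth.ON@", "@D.ErrOrth.ON@"])` returns False, while inserting `@C.ErrOrth@` between the two flags makes the path valid again.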
Flags for compounding
We have manually optimised the structure of our lexicon using the following flag diacritics:

| Flag | Explanation |
| --- | --- |
| @P.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised |
| @D.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised |
| @C.NeedNoun@ | (Dis)allow compounds with verbs unless nominalised |
For languages that allow compounding, the following flag diacritics are needed:

| Flag | Explanation |
| --- | --- |
| @P.CmpFrst.FALSE@ | Require that words tagged as such only appear first |
| @D.CmpPref.TRUE@ | Block such words from entering ENDLEX |
| @P.CmpPref.FALSE@ | Block these words from making further compounds |
| @D.CmpLast.TRUE@ | Block such words from entering R |
| @D.CmpNone.TRUE@ | Combines with the next tag to prohibit compounding |
| @U.CmpNone.FALSE@ | Combines with the previous tag to prohibit compounding |
| @P.CmpOnly.TRUE@ | Sets a flag to indicate that the word has passed R |
| @D.CmpOnly.FALSE@ | Disallow words coming directly from root |
The tags are of the following form:
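- +CmpNP/xxx - Normative (N), Position (P), i.e. the tag describes what position the tagged word can occupy in a compound
- +CmpN/xxx - Normative (N) form, i.e. the tag describes what form the tagged word should take in a compound
- +Cmp/xxx - Descriptive compounding tags, i.e. tags that describe what form a word actually has in a compound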
This entry/word should occur in the following position(s):
- +CmpNP/All - ... in all positions (the default; this tag does not have to be written)
- +CmpNP/First - ... only as the first part of a compound, or alone
- +CmpNP/Pref - ... only as the first part of a compound, NEVER alone
- +CmpNP/Last - ... only as the last part of a compound, or alone
- +CmpNP/Suff - ... only as the last part of a compound, NEVER alone
- +CmpNP/None - ... nowhere in a compound (the word does not take part in compounding)
- +CmpNP/Only - ... only as part of a compound, i.e. it can never stand alone
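Each position tag can be read as a constraint on a part's index within a word of n parts, where n = 1 means the word stands alone. In giella-nob these constraints are presumably enforced through the compounding flag diacritics above rather than in code. The Python sketch below only restates the list as predicates, assuming one +CmpNP tag per part, with untagged parts passed as None and treated as +CmpNP/All. `compound_ok` and `CMPNP_RULES` are hypothetical names.

```python
# Hypothetical restatement of the position list: (index, number_of_parts) -> allowed?
CMPNP_RULES = {
    "+CmpNP/All": lambda i, n: True,
    "+CmpNP/First": lambda i, n: i == 0,               # first part, or alone
    "+CmpNP/Pref": lambda i, n: i == 0 and n > 1,      # first part, never alone
    "+CmpNP/Last": lambda i, n: i == n - 1,            # last part, or alone
    "+CmpNP/Suff": lambda i, n: i == n - 1 and n > 1,  # last part, never alone
    "+CmpNP/None": lambda i, n: n == 1,                # no compounding at all
    "+CmpNP/Only": lambda i, n: n > 1,                 # any compound position, never alone
}


def compound_ok(tags):
    """Check a word (one part) or compound (several parts), one tag per part.

    None stands for an untagged part and is treated as +CmpNP/All (the default).
    """
    if not tags:
        raise ValueError("need at least one part")
    n = len(tags)
    return all(CMPNP_RULES[tag or "+CmpNP/All"](i, n) for i, tag in enumerate(tags))


print(compound_ok(["+CmpNP/Pref", None]))  # True
print(compound_ok(["+CmpNP/Pref"]))        # False: a Pref word may never stand alone
```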
Flags for governing initial capital
Use the following flag diacritics to control downcasing of derived proper nouns:

| Flag | Explanation |
| --- | --- |
| @U.Cap.Obl@ | Allow downcasing of derived names: deatnulasj. |
| @U.Cap.Opt@ | Allow downcasing of derived names: deatnulasj. |
Flags for preprocessing
- @P.Pmatch.Backtrack@ -
- @PMATCH_BACKTRACK@ -
Basic lexica, pointing to the other lexicon files
- LEXICON Root
- FinalNoun ; for -skap etc., which are affixes rather than compound parts
- ShortNounRoot ; 2- and 3-letter words
- NounRoot ; The rest
- ProperNoun ;
- AdjectivePrefix ;
- VerbRoot ;
- Adverb ;
- Subjunction ;
- Conjunction ;
- Preposition ;
- Interjection ;
- Pronoun ;
- Numeral ;
- Punctuation ;
- Symbols ;
- Abbreviation ;
- Acronym-smi ;
- +Use/NG: Nynorsk ; accepts Nynorsk (nno) forms in analysis but does not generate them
- LEXICON AdjectivePrefix
- kjempe AdjectiveRoot ;
- super AdjectiveRoot ;
- AdjectiveRoot ;
- LEXICON Abbreviation
- Abbreviation-nob ;
- Abbreviation-smi ;
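LEXICON AdjectivePrefix shows how lexc continuation classes chain: each entry pairs a string with the lexicon to continue in, and the entry with an empty string simply passes straight through to AdjectiveRoot. The toy Python model below enumerates the words such a chain produces. It ignores tags and the upper/lower distinction of a compiled transducer. The AdjectiveRoot entries stor and fin are placeholders, since the real AdjectiveRoot is defined elsewhere; `LEXICONS` and `expand` are hypothetical names, and "#" is used as in lexc to end the word.

```python
# Toy model of lexc continuation classes: each entry is (string, next lexicon).
# "#" ends the word. The AdjectiveRoot entries are placeholders, not the real lexicon.
LEXICONS = {
    "AdjectivePrefix": [
        ("kjempe", "AdjectiveRoot"),
        ("super", "AdjectiveRoot"),
        ("", "AdjectiveRoot"),  # no prefix: go straight to the root
    ],
    "AdjectiveRoot": [("stor", "#"), ("fin", "#")],
}


def expand(lexicon, prefix=""):
    """Yield every word reachable from `lexicon` (assumes the lexicons have no cycles)."""
    for string, continuation in LEXICONS[lexicon]:
        if continuation == "#":
            yield prefix + string
        else:
            yield from expand(continuation, prefix + string)


print(sorted(expand("AdjectivePrefix")))
# ['fin', 'kjempefin', 'kjempestor', 'stor', 'superfin', 'superstor']
```

A real lexc compiler builds a finite-state transducer instead of enumerating strings, which also lets it handle cyclic continuations such as compounding.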
LEXICON ProperNoun
Morphophonological rules for Bokmål
Rule section
Change -er stem to -ar in Nynorsk
e Deletion
- teaterX1>et → teat0r0>et
- modenX1>e → mod0n0>e
- reparere>Q3te → reparer0>0te
- *modenX1>e → *moden0>e (not standard language)
- hare>er → har0>er
- viktig>est → viktig>0st
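Each test pair aligns a lexical form with a surface form symbol by symbol, so both sides must have the same number of symbols. 0 marks a deletion, and removing 0 and the boundary > from the surface side gives the written word. The Python sketch below shows that reading for the e deletion pairs. It assumes that the only multichar symbols are X or Q followed by one digit (X1, Q3, ...) and that % escapes the next character. `tokenize`, `align`, `surface_word` and `changed_pairs` are hypothetical helpers, not part of the twolc toolchain.

```python
import re

# Assumption: multichar symbols here are X or Q plus one digit, and "%" escapes one character.
TOKEN_RE = re.compile(r"%.|[XQ]\d|.")


def tokenize(form):
    return TOKEN_RE.findall(form)


def align(lexical, surface):
    """Pair lexical and surface symbols one-to-one; the forms must be equally long."""
    lex, sur = tokenize(lexical), tokenize(surface)
    if len(lex) != len(sur):
        raise ValueError(f"{lexical!r} and {surface!r} differ in length")
    return list(zip(lex, sur))


def surface_word(surface):
    """Drop the deletion marker 0 and the boundary > to get the written word."""
    return "".join(t for t in tokenize(surface) if t not in ("0", ">"))


def changed_pairs(lexical, surface):
    """Return only the pairs where lexical and surface symbols differ."""
    return [(l, s) for l, s in align(lexical, surface) if l != s]


for lex, sur in [("teaterX1>et", "teat0r0>et"), ("reparere>Q3te", "reparer0>0te")]:
    print(surface_word(sur), changed_pairs(lex, sur))
# teatret [('e', '0'), ('X1', '0')]
# reparerte [('e', '0'), ('Q3', '0')]
```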
Consonant shortening before deletion
- sikkerX1>e → sik00r0>e
Geminate deletion in front of -t and -d
- kalle>Q3te → kal00>0te
- lykk0esQ1 → lyk0tes0
- all>Q3t → al0>0t
- bygge>Q3de → byg00>0de
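In every geminate deletion pair above, the consonant realised as 0 directly follows an identical consonant on the lexical side (ll → l0, kk → k0, gg → g0). The Python sketch below checks only that one property of a test pair. It does not model the -t/-d context or the twolc rule itself, and it relies on the same assumed tokenisation as before (X or Q plus a digit form one symbol). `deletions_are_geminate` is a hypothetical name.

```python
import re

TOKEN_RE = re.compile(r"%.|[XQ]\d|.")  # assumption: X/Q + digit are multichar symbols
VOWELS = set("aeiouyæøå")


def deletions_are_geminate(lexical, surface):
    """True if every consonant realised as 0 directly follows an identical lexical consonant."""
    lex, sur = TOKEN_RE.findall(lexical), TOKEN_RE.findall(surface)
    if len(lex) != len(sur):
        raise ValueError("lexical and surface forms must have equal length")
    for i, (l, s) in enumerate(zip(lex, sur)):
        is_consonant = len(l) == 1 and l.isalpha() and l not in VOWELS
        if is_consonant and s == "0" and (i == 0 or lex[i - 1] != l):
            return False
    return True


print(deletions_are_geminate("kalle>Q3te", "kal00>0te"))  # True
print(deletions_are_geminate("svart>t", "svart>0"))      # False: the deleted t follows >
```

The second call shows why double t deletion (svart>t) is handled separately: its deleted t is a suffix after a boundary, not half of a geminate.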
Delete foreign vowel
- kollegaX2>er → kolleg00>er
Delete r
Delete m
Umlaut
um Deletion 1
- museumX5>er → muse000>er
t weakening
- oppskjørtetX6>e → oppskjørted0>e
Double t deletion
- svart>t → svart>0
- presentere%>Q3t → presenter0>0t
Insert t in passives
Clitic after s-final
- *a → *b (not standard language)