root-morphology
Faroese morphological analyser
Definitions for Multichar_Symbols
Tags for POS
- +N +V +A +Adv +Prop +Num                 : Open POS's	 
- +CC +CS +Interj +Pr +Pron +IM		     : Closed POS's	 
- +Pers +Det +Refl +Recipr +Poss +Dem		 : Pron types	 
- +Nom +Acc +Gen +Dat					     : Case			 
- +Msc +Fem +Neu						     : Gender		 
- +Sg +Pl								     : Number		 
- +Def +Indef 						     : Definiteness	 
- +Cmp +Superl						     : Comparison	 
- +Prs +Prt							     : Tense		 
- +1Sg 					     : Person-Number 
- +2Sg 					     : Person-Number 
- +3Sg						     : Person-Number 
- +Inf +PrfPrc +PrsPrc +Sup +Imp +Sbj	     : Verb forms	 
- +Cmpnd								     : Compound		 
- +Abbr +ACR                                 : Abbreviations, acronyms
- +CLB +PUNCT +LEFT +RIGHT			     : Punctuation, parentheses 
- +Symbol  : independent symbols in the text stream, like £, €, © 
- +CLBfinal : Sentence-final abbreviated expression ending in a full stop, so that the full stop is ambiguous
- +Sg3 : This is inherited from common files, should be changed to +3Sg.
- +ABBR    sub-pos 
- +Arab sub-pos
- +Attr    sub-pos 
- +Coll sub-pos
- +Com     Sami case, to be removed
- +Dyn     Sami case, to be removed
- +Ela     Sami case, to be removed
- +Ess     Sami case, to be removed
- +Ill     Sami case, to be removed
- +Ine     Sami case, to be removed
- +MWE multiword expression
- +Pos     check these XXX
- +Rom     check these XXX
- +Der/heit Derivation with -heit
- +Ind +Pass +Interr +Ord
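As an illustration of how the tags above are typically consumed, here is a minimal Python sketch that splits an analysis string into lemma and tags and groups a few tags by the categories listed in this section. The analysis string `gera+V+Ind+Prs+3Sg` and the helper names `split_analysis` and `categorize` are hypothetical, chosen only for this example; the sketch also assumes that the lemma itself contains no `+`.

```python
# Categories copied from the tag list above (only a subset, for illustration).
CATEGORIES = {
    "Case": {"+Nom", "+Acc", "+Gen", "+Dat"},
    "Gender": {"+Msc", "+Fem", "+Neu"},
    "Number": {"+Sg", "+Pl"},
    "Definiteness": {"+Def", "+Indef"},
    "Tense": {"+Prs", "+Prt"},
}


def split_analysis(analysis):
    """Split 'lemma+Tag1+Tag2' into (lemma, tags); tags keep their '+' prefix.

    Assumption: the lemma contains no '+' character.
    """
    lemma, *tags = analysis.split("+")
    return lemma, ["+" + tag for tag in tags]


def categorize(tags):
    """Map each known category name to the tag from `tags` that belongs to it."""
    return {
        category: tag
        for tag in tags
        for category, members in CATEGORIES.items()
        if tag in members
    }


lemma, tags = split_analysis("gera+V+Ind+Prs+3Sg")  # hypothetical analysis
print(lemma, tags, categorize(tags))
```

Tags that fall outside the small `CATEGORIES` table, such as `+V` or `+3Sg`, are simply left uncategorised by this sketch.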
Semantic tags
- +Sem/Sur    
- +Sem/Mal    
- +Sem/Fem    
- +Sem/Plc    
- +Sem/Org    
- +Sem/Veh    
- +Sem/Year - year (i.e. 1000 - 2999), used only for numerals
- +Sem/Amount		        
- +Sem/Build		        
- +Sem/Build-room	        
- +Sem/Cat		        
- +Sem/Curr		        
- +Sem/Date		        
- +Sem/Domain		        
- +Sem/Domain_Hum	        
- +Sem/Dummytag	        
- +Sem/Edu_Hum	        
- +Sem/Event		        
- +Sem/Food-med	        
- +Sem/Group_Hum	        
- +Sem/Hum		        
- +Sem/ID			        
- +Sem/Lang		        
- +Sem/Mat		        
- +Sem/Measr		        
- +Sem/Money		        
- +Sem/Obj		        
- +Sem/Obj-el		        
- +Sem/Obj-ling	        
- +Sem/Org_Prod-audio     
- +Sem/Org_Prod-vis       
- +Sem/Part		        
- +Sem/Prod-vis	        
- +Sem/Route		        
- +Sem/Rule		        
- +Sem/Sign		        
- +Sem/State		        
- +Sem/State-sick	        
- +Sem/Substnc	        
- +Sem/Time		        
- +Sem/Time-clock	        
- +Sem/Tool-it	        
- +Sem/Txt
Non-changing letters
- a2  This is for a special a Umlaut case 
- g2 i2 j2 t2 v2
- +v1 +v2 : different paradigms
Triggers for Morphophonology
- %^UUML %^IUML %^eIUML %^ØUML                     : Umlaut types
- %^W %^JI                                         : Cns changes
- %^EPH %^OEA                                      : Epenthesis
- %^GDEL %^GGDEL %^GVDEL %^VDEL %^JDEL %^RDEL      : Cns deletion triggers
- %^EIO %^OA %^WVV %^EDH %^VSH                     : TODO
- %^AB1 %^AB2 %^AB3 %^AB4 %^AB5 %^AB6 %^AB7        : Ablaut series
- %^aAB %^uAB                                      : More Ablaut
- %^NGKK                                           : NG to KK
- %^PASS : TODO
- %> : Suffix boundary
- +v1 - Paradigm identifier (e.g. gera+v1 = ger)
- +v2 - Paradigm identifier (e.g. gera+v2 = gerar)
Language tags
- +OLang/ENG    
- +OLang/FIN    
- +OLang/NNO    
- +OLang/NOB    
- +OLang/RUS    
- +OLang/SMA    
- +OLang/SME    
- +OLang/SWE    
- +OLang/UND
Non-ascii letters, perhaps needed as multichar symbols
- æ ø å 				 
- á é í ó ú ý Á É Í Ó Ý 
- ä ö ü Ä Ö Ö
Compounding tags
The tags are of the following form: 
- +CmpNP/xxx - Normative (N), Position (P), i.e. the tag describes what
- +CmpN/xxx - Normative (N) form, i.e. the tag describes what
- +Cmp/xxx - Descriptive compounding tags, i.e. tags that describe
This entry / word should be in the following position(s):
- +CmpNP/All - ... in all positions; default, this tag does not have to be written
- +CmpNP/First - ... only be first part in a compound or alone
- +CmpNP/Pref - ... only first part in a compound, NEVER alone
- +CmpNP/Last - ... only be last part in a compound or alone
- +CmpNP/Suff - ... only last part in a compound, NEVER alone
- +CmpNP/None - ... does not take part in compounds
- +CmpNP/Only - ... only be part of a compound, i.e. can never
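The position restrictions expressed by the +CmpNP tags can be summarised as a lookup table. The following Python sketch is one possible reading of the list above, not the analyser's implementation: the position names `alone`, `first`, `middle` and `last` and the function `may_occupy` are hypothetical, and treating "all positions" as including a compound-internal `middle` slot is an assumption.

```python
# Allowed positions per tag, as read from the descriptions above.
# "middle" (a compound-internal part) is an assumption of this sketch.
ALLOWED_POSITIONS = {
    "+CmpNP/All": {"alone", "first", "middle", "last"},
    "+CmpNP/First": {"alone", "first"},
    "+CmpNP/Pref": {"first"},           # never alone
    "+CmpNP/Last": {"alone", "last"},
    "+CmpNP/Suff": {"last"},            # never alone
    "+CmpNP/None": {"alone"},           # takes no part in compounds
    "+CmpNP/Only": {"first", "middle", "last"},  # only inside a compound
}


def may_occupy(position, tag="+CmpNP/All"):
    """Return True if a word carrying `tag` may appear in `position`.

    +CmpNP/All is the default, since that tag does not have to be written.
    """
    return position in ALLOWED_POSITIONS[tag]


print(may_occupy("alone", "+CmpNP/Pref"))
print(may_occupy("first", "+CmpNP/Pref"))
```

In the analyser itself these restrictions are enforced with flag diacritics rather than a table; the table only makes the intended distribution easier to see at a glance.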
Usage tags
- +Use/Disamb = Use only in disambiguator/tokeniser analyser 
- +Use/Circ = for compound restrictions
- +Use/-PMatch	        
- +Use/-Spell		        
- +Use/NG			        
- +Use/NGA		        
- +Use/SpellNoSugg
- +Err/Guess								 : Tag for Name Guesser component 
- +Err/Orth : Marking forms that are orthographical errors
Symbols that need to be escaped on the lower side (towards twolc):
- »7      : Literal »  
- «7      : Literal «
- %[%>%]  : Literal >
- %[%<%]  : Literal <
Flag diacritics
We have manually optimised the structure of our lexicon using the following
| @P.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised | 
| @D.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised | 
| @C.NeedNoun@ | (Dis)allow compounds with verbs unless nominalised | 
Flags for speller suggestions
| @D.ErrOrth.ON@ | 
| @C.ErrOrth@ | 
| @P.ErrOrth.ON@ | 
Flag for case harmony in compounds
Set flag for compounds 
| @P.Case.MscNom@ | fyrstiflokkur | 
| @P.Case.MscObl@ | fyrstaflokk | 
| @P.Case.FemNom@ | lítlasystir | 
| @P.Case.FemObl@ | lítlusystur | 
| @P.Case.Neu@ | breiðaskarð | 
| @P.Case.Pl@ | fyrstuflokkar, lítlusystrar, breiðuskørð | 
Control flag values for compounds 
| @R.Case.MscNom@ | fyrstiflokkur | 
| @R.Case.MscObl@ | fyrstaflokk | 
| @R.Case.FemNom@ | lítlasystir | 
| @R.Case.FemObl@ | lítlusystur | 
| @R.Case.Neu@ | breiðaskarð | 
| @R.Case.Pl@ | fyrstuflokkar, lítlusystrar, breiðuskørð | 
Control flag values for compounds 
| @U.Case.MscNom@ | fyrstiflokkur | 
| @U.Case.MscObl@ | fyrstaflokk | 
| @U.Case.FemNom@ | lítlasystir | 
| @U.Case.FemObl@ | lítlusystur | 
| @U.Case.Neu@ | breiðaskarð | 
| @P.Pmatch.Loc@ | Location in string used or parsed by hfst-pmatch | 
| @P.Pmatch.Backtrack@ | Also for hfst-pmatch | 
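To make the interplay of the P, R and U flags for case harmony concrete, here is a minimal Python sketch that evaluates flag diacritics along a single path, following the usual finite-state conventions (P sets a value, R requires it, D disallows it, C clears it, U unifies). The function name `flags_accept` and the example paths are hypothetical, and the N (negative set) operator is deliberately left out, so this is a simplified model rather than what hfst or xfst actually executes.

```python
import re

# Matches @OP.Feature@ and @OP.Feature.Value@ for the operators modelled here.
FLAG_RE = re.compile(r"@([PRDCU])\.([^.@]+)(?:\.([^.@]+))?@")


def flags_accept(path):
    """Return True if the flag diacritics in `path` are mutually consistent."""
    state = {}
    for op, feature, value in FLAG_RE.findall(path):
        value = value or None          # '' means the flag had no value part
        current = state.get(feature)   # None means the feature is unset
        if op == "P":                  # positive set
            state[feature] = value
        elif op == "C":                # clear
            state.pop(feature, None)
        elif op == "R":                # require
            if value is None:
                if current is None:
                    return False
            elif current != value:
                return False
        elif op == "D":                # disallow
            if value is None:
                if current is not None:
                    return False
            elif current == value:
                return False
        elif op == "U":                # unify: set if unset, else must match
            if current is None:
                state[feature] = value
            elif current != value:
                return False
    return True


# Hypothetical paths: matching case values pass, mismatching ones are blocked.
print(flags_accept("@P.Case.MscNom@fyrsti@R.Case.MscNom@flokkur"))
print(flags_accept("@P.Case.MscNom@fyrsti@R.Case.FemNom@systir"))
```

Under this model, a first part that sets `@P.Case.MscNom@` can only be followed by a second part guarded by `@R.Case.MscNom@` or `@U.Case.MscNom@`, which is the case harmony the tables above describe.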
Flags for compound restriction
For languages that allow compounding, the following flag diacritics are needed 
| @P.CmpFrst.FALSE@ | Require that words tagged as such only appear first | 
| @D.CmpPref.TRUE@ | Block such words from entering ENDLEX | 
| @P.CmpPref.FALSE@ | Block these words from making further compounds | 
| @D.CmpLast.TRUE@ | Block such words from entering R | 
| @D.CmpNone.TRUE@ | Combines with the next tag to prohibit compounding | 
| @U.CmpNone.FALSE@ | Combines with the prev tag to prohibit compounding | 
| @P.CmpOnly.TRUE@ | Sets a flag to indicate that the word has passed R | 
| @D.CmpOnly.FALSE@ | Disallow words coming directly from root. | 
Use the following flag diacritics to control downcasing of derived proper 
| @U.Cap.Obl@ | Allowing downcasing of derived names: deatnulasj. | 
| @U.Cap.Opt@ | Allowing downcasing of derived names: deatnulasj. | 
Lexicon Root
- Nouns ;           
- Shortnouns ;      1- and 2-letter nouns excluded from compounding 
- Propernouns ;     
- Adjectives ;      
- Verbs ;		      
- Adverb ;	      
- Conjunction ;     
- Subjunction ;     
- Interjection ;    
- Numeral ;	      
- Determiner ;      
- Pronoun ;	      
- Preposition ;     
- Punctuation ;     
- Symbols     ;     
- Abbreviation ;    
- Acronyms ;
Lexicon Acronyms is split in two: 
- Acronym-fao ;  for fao acronyms 
- Acronym-smi ; for language-independent acronyms
Lexicon ENDLEX
@D.CmpOnly.FALSE@@D.CmpPref.TRUE@@D.NeedNoun.ON@ ENDLEX2 ;
The @D.CmpOnly.FALSE@ flag diacritic is used to disallow words tagged 
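The ENDLEX entry chains three D (disallow) flags, so a path reaches ENDLEX2 only if none of the three features currently holds the listed value. A minimal Python sketch of that gate, assuming the standard semantics of D flags; the dictionary `flag_state` and the function `may_enter_endlex2` are hypothetical names used only for illustration:

```python
# The three disallow checks from the ENDLEX entry, as (feature, blocked value).
ENDLEX_GATE = [("CmpOnly", "FALSE"), ("CmpPref", "TRUE"), ("NeedNoun", "ON")]


def may_enter_endlex2(flag_state):
    """Return True if no feature in `flag_state` holds a blocked value.

    `flag_state` maps feature names to the value set earlier on the path;
    features that were never set are simply absent.
    """
    return all(flag_state.get(feature) != blocked for feature, blocked in ENDLEX_GATE)


print(may_enter_endlex2({}))                      # nothing set
print(may_enter_endlex2({"CmpOnly": "FALSE"}))    # blocked by @D.CmpOnly.FALSE@
print(may_enter_endlex2({"CmpOnly": "TRUE"}))     # value differs, so not blocked
```

Note that an unset feature never triggers a D flag with a value, which is why words without any compounding restriction pass the gate unchanged.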

