Documenting the Greenlandic lexicon file
Introduction
File structure
The file format is documented in the Xerox manuals, especially Karttunen 1993, Finite-State Lexicon Compiler; see also the book by Beesley and Karttunen. The file kal-lex.txt itself consists of a Multichar_Symbols section followed by a large number of lexica, 170 by the present count (6.11.06). Among other things, kal-lex.txt contains the continuation lexica for nouns and verbs, whereas the bulk of the stem lexicon is divided among separate files, as indicated below.
In the kal-lex.txt file, the Multichar_Symbols section contains all grammatical tags, and all multicharacter members of the alphabet (the latter set is taken from the grammar file).
The Root lexicon points to the lexica of the different parts of speech. The comment after each sublexicon names the file that contains it:
LEXICON Root
    Nomen ;          ! -> noun-kal-lex.txt
    ateq ;           ! -> ateq-kal-lex.txt (proper nouns)
    Verbum ;         ! -> verb-kal-lex.txt
    Punctuation ;    ! -> punct-kal-lex.txt
    oqr ;            ! -> prt-kal-lex.txt (particles, pronouns, adverbs)
    Numeralier ;     ! -> num-kal.lex.txt
    Forkortelser ;   ! -> abbr-kal-lex.txt
    Akronymer ;      ! -> acro-kal.lex.txt
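The pointers in the Root listing can be read mechanically. The following is a minimal Python sketch, not part of the project's tooling: it assumes one entry per line and a comment of the form "! -> filename", and the names ROOT_LISTING, ENTRY and root_pointers are invented for this illustration.

```python
import re

# A few entries copied from the Root lexicon listing, one per line.
ROOT_LISTING = """\
LEXICON Root
Nomen ;        ! -> noun-kal-lex.txt
ateq ;         ! -> ateq-kal-lex.txt (proper nouns)
Verbum ;       ! -> verb-kal-lex.txt
"""

# Sublexicon name, optional ';', then a comment "! -> file".
# Assumption: the file name contains no whitespace.
ENTRY = re.compile(r"^\s*(?P<lexicon>[^\s;!]+)\s*;?\s*!\s*->\s*(?P<file>\S+)")


def root_pointers(listing):
    """Map each sublexicon in the listing to the file named in its comment."""
    pointers = {}
    for line in listing.splitlines():
        match = ENTRY.match(line)
        if match:  # the "LEXICON Root" header has no "! ->" and is skipped
            pointers[match.group("lexicon")] = match.group("file")
    return pointers


if __name__ == "__main__":
    for lexicon, filename in root_pointers(ROOT_LISTING).items():
        print(lexicon, "->", filename)
```

Any trailing remark in parentheses, such as "(proper nouns)", is ignored by this sketch.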
The part-of-speech lexica are documented below, in the order just given.
Nouns
The structure of the noun-kal.txt file
The file contains noun stems with pointers to the following continuation lexica:
     1 29days
     4 30days
     7 31days
    17 AblVb
   181 K
    16 K_plur
    19 LokVb
     2 marluk
     1 Nomen
     3 Num2morf
     1 pingasut:arfineq%
   370 SEQgemin
  2083 tptmorf
    17 TrmVb
     2 TVschwa
     4 Z1ateqZmorf
    76 Z1eZmorf
     1 Z1geoPZmorf
     4 Z1iZmorf
     1 Z1jaqZmorf
     8 Z1+kaPZmorf
   530 Z1+kaZmorf
     2 Z1+koZmorf
     5 Z1+laZmorf
     1 Z1+leZ
     4 Z1+leZmorf
     8 Z1+loZmorf
     6 Z1+maZmorf
     1 Z1+meZmorf
     1 Z1+ngZmorf
     1 Z1nnguaqPZ
     4 Z1nnguaqPZmorf
     2 Z1nnguaqSZ
   197 Z1nnguaqZmorf
     6 Z1+nZmorf
   889 Z1PZmorf
     4 Z1+qaPZmorf
     2 Z1+qaZ
   299 Z1+qaZmorf
     2 Z1+qeZmorf
    18 Z1+qoZmorf
     4 Z1+ssPZmorf
   241 Z1+ssZmorf
    57 Z1+tZmorf
 26401 Z1Zmorf
   210 Z2aqPZmorf
  1059 Z2aqZmorf
     2 Z2-ateqZmorf
     2 Z2i2Z
     3 Z2i2Zmorf
    17 Z2kZmorf
     5 Z2+lPZmorf
   441 Z2+lZmorf
    50 Z2-PZmorf
   120 Z2+rZmorf
     2 Z2veqZmorf
  2784 Z2Zmorf
  2273 Z2-Zmorf
     6 ZkkutZ
    63 ZkkutZmorf
    16 ZoqseZmorf
     2 ZoqsieZmorf
   399 ZoqsZmorf
    26 ZsaqZmorf
     1 ZtiPZmorf
    33 ZtiZmorf
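A tally of this kind could be produced with a short script. The sketch below is an illustration in Python under stated assumptions, not the procedure actually used for the list above: it assumes each entry sits on one line in the form "stem ContinuationLexicon ;", treats an unescaped "!" as the start of a comment, and uses placeholder stems (stemA, stemB, stemC) rather than real Greenlandic entries. The function names are invented for this example.

```python
import re
from collections import Counter

# Placeholder entries in lexc form; the stems are not real lexicon data.
SAMPLE = """\
LEXICON Nomen
stemA Z1Zmorf ;      ! comment text is ignored
stemB Z1Zmorf ;
stemC Z2aqZmorf ;
"""


def strip_comment(line):
    """Drop a trailing comment; '%!' is kept as an escaped literal '!'."""
    return re.split(r"(?<!%)!", line, maxsplit=1)[0]


def count_continuations(text):
    """Count how many entries point to each continuation lexicon."""
    counts = Counter()
    for raw in text.splitlines():
        line = strip_comment(raw).strip()
        if not line.endswith(";"):
            continue  # LEXICON headers, blank lines, comment-only lines
        tokens = line[:-1].split()
        if tokens:
            counts[tokens[-1]] += 1  # last token before ';' is the target
    return counts


if __name__ == "__main__":
    for name, n in sorted(count_continuations(SAMPLE).items()):
        print(f"{n:6d} {name}")
```

Splitting on whitespace does not understand lexc escapes such as "% " (an escaped space), so entries that contain them would need extra handling.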
The lexicon names are built from meaningful components:
- Z = noun
- 1 = weak inflection (p-declension)
- 2 = strong inflection (up-declension)
- P = plurale tantum
- S = singular
- - = strong inflection that truncates (2-)
- a, q, ... = geminations before consonant-initial inflectional endings
- Z = noun (the symbol is repeated towards the end of the name)
- morf = continues to the derivational lexica
- (no morf) = continues to the inflectional lexica
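The naming scheme can be made concrete with a small decoder. This Python sketch is an illustration only: the regular expression is inferred from the components listed above and from names such as Z1+kaPZmorf and Z2-Zmorf, the function describe and its output keys are invented here, and the segment between the class marker and the number marker (for example +ka or nnguaq) is returned uninterpreted.

```python
import re

# Z, optional class (1/2), optional truncation '-', an uninterpreted
# middle segment, optional number marker (P/S), closing Z, optional 'morf'.
NAME = re.compile(
    r"Z(?P<cls>[12])?(?P<trunc>-)?(?P<mid>.*?)(?P<num>[PS])?Z(?P<morf>morf)?"
)

INFLECTION = {"1": "weak", "2": "strong"}
NUMBER = {"P": "plurale tantum", "S": "singular"}


def describe(name):
    """Split a noun continuation-lexicon name into its components.

    Returns None for names that do not follow the Z...Z pattern.
    """
    m = NAME.fullmatch(name)
    if m is None:
        return None
    return {
        "inflection": INFLECTION.get(m.group("cls")),
        "truncating": m.group("trunc") is not None,
        "middle": m.group("mid"),
        "number": NUMBER.get(m.group("num")),
        "continues_to": "derivation" if m.group("morf") else "inflection",
    }


if __name__ == "__main__":
    for name in ("Z1Zmorf", "Z1+kaPZmorf", "Z2-Zmorf", "Z1nnguaqSZ"):
        print(name, describe(name))
```

Names outside the Z...Z pattern, such as K or AblVb, are deliberately left undecoded.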
The ateq lexicon
The proper nouns are stored in the file gt/kal/src/ateq-kal-lex.txt.
The file structure
!
! Person vs. Geo
! Greenlandic vs. foreign
! Foreign:             Personal names    Geographical names
!   -{eo}#             Z1ateqpropZ       Z1geopropZ
!   other vowels vs.   Z1ateqZmorf       Z1geoSZmorf
!   consonant          Zateq_oqsZmorf    Z1geo_oqsZ
! Foreign plural names
!   -e Øerne
!   -C ???
! Greenlandic: declensions as for ordinary nouns
!Z1geoSZmorf with morphemes for place names of the p-declension
!Z1geoPZmorf with morphemes for place names of the p-declension
!Z1geo_oqsZmorf like Finland
!Z2-geoSZmorf !Nuuk
!Z1ateqZmorf
!Z1nnguaq_ateqZmorf
!Z2-ateqZmorf
!Zateq_oqsZmorf
Verbs
The verb lexicon is stored in the verb-kal.txt file.
     2 flex-iv
     9 flex-iv_nngit
     1 flex-iv_schwa
     1 flex-tv
     4 flex-tv_nngit
     1 gallar-iv
 13522 IV
    86 IVschwa
  7166 IV_voq
    72 IV_voqP
    57 K
    15 K_plur
  5551 TV
     1 TVi_vaa
  1966 TVschwa
     1 TVschwaP
   577 TV_vaa
     6 XIgujoqX
     4 XIiPXmorf
     1 XIitX
     5 XIi_voqXmorf
  1422 XIiXmorf
   468 XIPXmorf
     1 XIPX_nngit
   608 XItsXmorf
    94 XItX
    25 XIutePXmorf
   127 XIuteXmorf
     3 XTgujaaX
   827 XTiXmorf
   117 XTPXmorf
    23 XTtX
     4 XTutePXmorf
   312 XTuteXmorf
Pronouns
Pronouns are found in the prt-kal-lex.txt file.
All pronouns have the initial lexicon path Root -> Pronoun -> ...
Numerals
Indeclinable words
All the lexica for indeclinable words are made the same way:
Abbreviations
There is a file called abbr-kal-lex.txt. Work on abbreviations has not yet begun; the file contains only a few dummy entries.
Last modified: $Date: 2007-06-14 13:53:51 +0300 (to, 14 kesä 2007) $, by $Author: boerre $