Documenting the Greenlandic lexicon file
Introduction
File structure
The file format is documented in the Xerox manuals, especially Karttunen (1993), Finite-State Lexicon Compiler; see also the Beesley and Karttunen book. The file kal-lex.txt consists of a Multichar_Symbols section followed by a large number of lexica (170 at the present count, 6.11.06). Among other things, it contains the continuation lexica for nouns and verbs, whereas the bulk of the stem lexicon is split across separate files, as indicated below.
In kal-lex.txt, the Multichar_Symbols section declares all grammatical tags and all multicharacter members of the alphabet (the latter set is taken from the grammar file).
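To illustrate why tags and multicharacter letters have to be declared, the following minimal Python sketch splits an entry into symbols, preferring the longest declared symbol at each position. It only imitates the idea; it is not the compiler's own code. The function name `tokenize` and the declared symbols `+N`, `+Sg` and `ng` are hypothetical and are not taken from kal-lex.txt.

```python
def tokenize(entry, multichar_symbols):
    """Split an entry into symbols, preferring the longest declared symbol."""
    # Longest symbols first, so a long tag wins over a shorter overlapping one.
    # Empty strings are dropped because they would never advance the position.
    ordered = sorted((s for s in multichar_symbols if s), key=len, reverse=True)
    symbols = []
    i = 0
    while i < len(entry):
        for sym in ordered:
            if entry.startswith(sym, i):
                symbols.append(sym)
                i += len(sym)
                break
        else:
            # No declared symbol starts here: consume one plain character.
            symbols.append(entry[i])
            i += 1
    return symbols


# Hypothetical declarations: two tags and one multicharacter letter.
declared = {"+N", "+Sg", "ng"}
print(tokenize("ing+N+Sg", declared))
```

Without the declarations, the same string would fall apart into single characters, so a tag such as `+N` would be treated as two unrelated symbols.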
The Root lexicon points to the lexica of the different parts of speech. For each sublexicon, a comment names the file that contains it:
LEXICON Root
Nomen ; ! -> noun-kal-lex.txt
ateq ; ! -> ateq-kal-lex.txt (proper nouns)
Verbum ; ! -> verb-kal-lex.txt
Punctuation ; ! -> punct-kal-lex.txt
oqr ; ! -> prt-kal-lex.txt (particles, pronouns, adverbs)
Numeralier ; ! -> num-kal-lex.txt
Forkortelser ; ! -> abbr-kal-lex.txt
Akronymer ; ! -> acro-kal-lex.txt
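The pointer comments in the Root lexicon follow a regular pattern, so they can be read mechanically. The sketch below is a small Python helper that extracts the sublexicon-to-file mapping from such a block. The name `root_pointers` and the regular expression are illustrative assumptions; the sample text repeats three entries from the listing above.

```python
import re

# Three entries copied from the Root lexicon shown above.
ROOT_BLOCK = """\
LEXICON Root
Nomen ; ! -> noun-kal-lex.txt
ateq ; ! -> ateq-kal-lex.txt (proper nouns)
Verbum ; ! -> verb-kal-lex.txt
"""

# "<lexicon> ; ! -> <file>": the file name is the first token after the arrow.
ENTRY = re.compile(r"^\s*(?P<lexicon>\S+)\s*;\s*!\s*->\s*(?P<file>\S+)")


def root_pointers(text):
    """Map each sublexicon named in LEXICON Root to the file in its comment."""
    pointers = {}
    in_root = False
    for line in text.splitlines():
        if line.startswith("LEXICON"):
            # Only entries that follow "LEXICON Root" are collected.
            in_root = line.split()[1:2] == ["Root"]
            continue
        if in_root:
            match = ENTRY.match(line)
            if match:
                pointers[match.group("lexicon")] = match.group("file")
    return pointers


for lexicon, filename in root_pointers(ROOT_BLOCK).items():
    print(lexicon, "->", filename)
```

Entries without a `! -> file` comment are simply skipped, which makes the helper a quick way to spot undocumented sublexica.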
The part-of-speech lexica are documented below, in the order just given.
Nouns
The structure of the noun-kal-lex.txt file
The file contains noun stems that point to the following continuation lexica, each listed with the number of entries pointing to it:
1 29days
4 30days
7 31days
17 AblVb
181 K
16 K_plur
19 LokVb
2 marluk
1 Nomen
3 Num2morf
1 pingasut:arfineq%
370 SEQgemin
2083 tptmorf
17 TrmVb
2 TVschwa
4 Z1ateqZmorf
76 Z1eZmorf
1 Z1geoPZmorf
4 Z1iZmorf
1 Z1jaqZmorf
8 Z1+kaPZmorf
530 Z1+kaZmorf
2 Z1+koZmorf
5 Z1+laZmorf
1 Z1+leZ
4 Z1+leZmorf
8 Z1+loZmorf
6 Z1+maZmorf
1 Z1+meZmorf
1 Z1+ngZmorf
1 Z1nnguaqPZ
4 Z1nnguaqPZmorf
2 Z1nnguaqSZ
197 Z1nnguaqZmorf
6 Z1+nZmorf
889 Z1PZmorf
4 Z1+qaPZmorf
2 Z1+qaZ
299 Z1+qaZmorf
2 Z1+qeZmorf
18 Z1+qoZmorf
4 Z1+ssPZmorf
241 Z1+ssZmorf
57 Z1+tZmorf
26401 Z1Zmorf
210 Z2aqPZmorf
1059 Z2aqZmorf
2 Z2-ateqZmorf
2 Z2i2Z
3 Z2i2Zmorf
17 Z2kZmorf
5 Z2+lPZmorf
441 Z2+lZmorf
50 Z2-PZmorf
120 Z2+rZmorf
2 Z2veqZmorf
2784 Z2Zmorf
2273 Z2-Zmorf
6 ZkkutZ
63 ZkkutZmorf
16 ZoqseZmorf
2 ZoqsieZmorf
399 ZoqsZmorf
26 ZsaqZmorf
1 ZtiPZmorf
33 ZtiZmorf
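A listing of this kind can be produced by counting the continuation lexicon named at the end of each entry. The following Python sketch shows one way to do that. It assumes the simple entry shape `stem Continuation ;`, and it ignores the `%` escapes that real lexc files may contain. The stems `stemA`, `stemB` and `stemC` are placeholders, not real lexicon entries.

```python
from collections import Counter

# Placeholder entries; only the continuation lexicon names come from the listing.
SAMPLE = """\
LEXICON Nomen
stemA Z1Zmorf ;
stemB Z1Zmorf ;
stemC Z2-Zmorf ; ! a comment
"""


def count_continuations(lexc_text):
    """Count how many entries point to each continuation lexicon."""
    counts = Counter()
    for raw in lexc_text.splitlines():
        # Drop comments (assumption: no escaped "%!" occurs in the entry).
        line = raw.split("!", 1)[0].strip()
        if not line.endswith(";") or line.startswith("LEXICON"):
            continue
        tokens = line[:-1].split()
        if tokens:
            # The last token before ";" names the continuation lexicon.
            counts[tokens[-1]] += 1
    return counts


for name, n in sorted(count_continuations(SAMPLE).items()):
    print(n, name)
```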
The tag names have meaningful components:
- Z = noun
- 1 = weak declension (p-declension)
- 2 = strong declension (up-declension)
- P = plurale tantum
- S = singular
- - = strong declension that truncates (2-)
- a, q, ... = geminations before consonant-initial inflectional endings
- Z = noun
- morf = continues to the derivation lexica
- (no morf suffix) = continues to the inflection lexica
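Read from left to right, the components form a pattern that can be checked mechanically. The Python sketch below decomposes a noun continuation lexicon name along those lines. The regular expression is an interpretation of the list above, not a specification from the source files: in particular, everything between the declension marker and the optional P or S is kept as one unanalysed "middle" part.

```python
import re

# Z, declension class, truncation hyphen, middle part, P/S, closing Z, morf.
TAG = re.compile(
    r"Z(?P<declension>[12])?(?P<truncating>-)?(?P<middle>[a-z0-9_+]*)"
    r"(?P<number>[PS])?Z(?P<morf>morf)?"
)


def describe(tag):
    """Return the components of a noun continuation lexicon name, or None."""
    match = TAG.fullmatch(tag)
    if match is None:
        return None
    return {
        "declension": {"1": "weak (p-)", "2": "strong (up-)"}.get(match["declension"]),
        "truncating": match["truncating"] is not None,
        "middle": match["middle"],  # left unanalysed in this sketch
        "number": {"P": "plurale tantum", "S": "singular"}.get(match["number"]),
        "continues_to": "derivation" if match["morf"] else "inflection",
    }


for name in ("Z1+kaPZmorf", "Z2-Zmorf", "Z1+leZ"):
    print(name, describe(name))
```

Names that do not start and end with the noun marker, such as the verbal `IV`, yield `None`, so the helper can also be used to filter the noun lexica out of a mixed list.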
The ateq lexicon
The proper nouns are stored in the file gt/kal/src/ateq-kal-lex.txt.
The file structure
!
! Person vs. Geo
! Greenlandic vs. foreign
! Foreign:            Personal names    Geographical names
!   -{eo}#            Z1ateqpropZ       Z1geopropZ
!   other vowels vs.  Z1ateqZmorf       Z1geoSZmorf
!   consonant         Zateq_oqsZmorf    Z1geo_oqsZ
! Foreign plural names
!   -e  Øerne
!   -C  ???
! Greenlandic: the declensions are as for ordinary nouns.
!Z1geoSZmorf with morphemes for place names of the p-declension
!Z1geoPZmorf with morphemes for place names of the p-declension
!Z1geo_oqsZmorf like Finland
!Z2-geoSZmorf !Nuuk
!Z1ateqZmorf
!Z1nnguaq_ateqZmorf
!Z2-ateqZmorf
!Zateq_oqsZmorf
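The table for foreign names in the comment block can be read as a lookup keyed on the final letter of the name. The Python sketch below follows that reading. The function name `foreign_continuation`, the vowel inventory and the example names Oslo and Anna are assumptions made for illustration; only Finland appears in the file comments, where it is listed under Z1geo_oqsZmorf rather than the Z1geo_oqsZ given in the table.

```python
# Rows of the table: (personal name lexicon, geographical name lexicon).
FOREIGN = {
    "e_or_o": ("Z1ateqpropZ", "Z1geopropZ"),
    "other_vowel": ("Z1ateqZmorf", "Z1geoSZmorf"),
    "consonant": ("Zateq_oqsZmorf", "Z1geo_oqsZ"),
}

# Assumed vowel inventory for foreign names; not taken from the source files.
VOWELS = set("aeiouyæøå")


def foreign_continuation(name, geographic):
    """Pick the continuation lexicon for a foreign name from its last letter."""
    last = name[-1].lower()
    if last in "eo":
        key = "e_or_o"
    elif last in VOWELS:
        key = "other_vowel"
    else:
        key = "consonant"
    # Column 0 holds personal names, column 1 geographical names.
    return FOREIGN[key][1 if geographic else 0]


print(foreign_continuation("Finland", geographic=True))
print(foreign_continuation("Oslo", geographic=True))
print(foreign_continuation("Anna", geographic=False))
```

Foreign plural names and Greenlandic names are outside the scope of this sketch, since the comment block leaves the former partly open and assigns the latter by declension.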
Verbs
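The verb lexicon is stored in the verb-kal-lex.txt file. Its entries point to the following continuation lexica, each listed with the number of entries pointing to it: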
2 flex-iv
9 flex-iv_nngit
1 flex-iv_schwa
1 flex-tv
4 flex-tv_nngit
1 gallar-iv
13522 IV
86 IVschwa
7166 IV_voq
72 IV_voqP
57 K
15 K_plur
5551 TV
1 TVi_vaa
1966 TVschwa
1 TVschwaP
577 TV_vaa
6 XIgujoqX
4 XIiPXmorf
1 XIitX
5 XIi_voqXmorf
1422 XIiXmorf
468 XIPXmorf
1 XIPX_nngit
608 XItsXmorf
94 XItX
25 XIutePXmorf
127 XIuteXmorf
3 XTgujaaX
827 XTiXmorf
117 XTPXmorf
23 XTtX
4 XTutePXmorf
312 XTuteXmorf
Pronouns
Pronouns are found in the prt-kal-lex.txt file.
All pronouns share the initial lexicon path Root -> Pronoun -> ...
Numerals
Indeclinable words
All the lexica for indeclinable words are made the same way:
Abbreviations
Abbreviations are kept in the file abbr-kal-lex.txt. Work on them has not yet begun; the file contains only a few dummy entries.
Last modified: $Date: 2007-06-14 13:53:51 +0300 (to, 14 kesä 2007) $, by $Author: boerre $

