Documenting the Greenlandic lexicon file

Introduction

File structure

Note
This documentation is obsolete, and goes back to the old infrastructure. It is kept here for now only since it may give some background on the analyser.

The file format is documented in the Xerox manuals, especially in Karttunen 1993 Finite-State Lexicon Compiler, but see also the Beesley and Karttunen book. The file kal-lex.txt itself consists of a section defining Multichar_symbols, and of a large number of lexica, 170 lexica according to the present count (6.11.06). The file kal-lex.txt contains a.o. the continuation lexica for nouns and verbs, whereas the bulk of the stem lexicon is divided into different files, as indicated below.

In the kal-lex.txt file, the Multichar_Symbols section contains all grammatical tags, and all multicharacter members of the alphabet (the latter set is taken from the grammar file).

The Root lexicon points to the lexica of the different parts of speech: (for each sublexicon there is a pointer to the relevant file containing the sublexicon)

LEXICON Root
Nomen ;         ! -> noun-kal-lex.txt
ateq ;          ! -> ateq-kal-lex.txt (proper nouns)
Verbum ;        ! -> verb-kal-lex.txt
Punctuation     ! -> punct-kal-lex.txt
oqr ;           ! -> prt-kal.lex.txt (particles, pronouns, adverbs)
Numeralier ;    ! -> num-kal.lex.txt
Forkortelser ;  ! -> abbr-kal.lex.txt
Akronymer ;     ! -> acro-kal.lex.txt
      

The different part of speech lexica are documented here, in the order just given.

Nouns

The structure of the noun-kal.txt file

The file contains noun stems with pointers to the following continuation lexica:

      1 29days
      4 30days
      7 31days
     17 AblVb
    181 K
     16 K_plur
     19 LokVb
      2 marluk
      1 Nomen
      3 Num2morf
      1 pingasut:arfineq%
    370 SEQgemin
   2083 tptmorf
     17 TrmVb
      2 TVschwa
      4 Z1ateqZmorf
     76 Z1eZmorf
      1 Z1geoPZmorf
      4 Z1iZmorf
      1 Z1jaqZmorf
      8 Z1+kaPZmorf
    530 Z1+kaZmorf
      2 Z1+koZmorf
      5 Z1+laZmorf
      1 Z1+leZ
      4 Z1+leZmorf
      8 Z1+loZmorf
      6 Z1+maZmorf
      1 Z1+meZmorf
      1 Z1+ngZmorf
      1 Z1nnguaqPZ
      4 Z1nnguaqPZmorf
      2 Z1nnguaqSZ
    197 Z1nnguaqZmorf
      6 Z1+nZmorf
    889 Z1PZmorf
      4 Z1+qaPZmorf
      2 Z1+qaZ
    299 Z1+qaZmorf
      2 Z1+qeZmorf
     18 Z1+qoZmorf
      4 Z1+ssPZmorf
    241 Z1+ssZmorf
     57 Z1+tZmorf
  26401 Z1Zmorf
    210 Z2aqPZmorf
   1059 Z2aqZmorf
      2 Z2-ateqZmorf
      2 Z2i2Z
      3 Z2i2Zmorf
     17 Z2kZmorf
      5 Z2+lPZmorf
    441 Z2+lZmorf
     50 Z2-PZmorf
    120 Z2+rZmorf
      2 Z2veqZmorf
   2784 Z2Zmorf
   2273 Z2-Zmorf
      6 ZkkutZ
     63 ZkkutZmorf
     16 ZoqseZmorf
      2 ZoqsieZmorf
    399 ZoqsZmorf
     26 ZsaqZmorf
      1 ZtiPZmorf
     33 ZtiZmorf
          

The tag names have meaningful components:

  • Z = nomen
  • 1 = svag böjning, p-bøjning
  • 2 = sterk, up-bøjning
  • P = plurale tantum
  • S = singularis
  • - = sterk böying som trunkerer (2-)
  • a, q, ... = gemineringer ved konsonantiske flexiver
  • Z = nomen
  • morf = går til derivasjonsleksika
  • = går til flexivleksika

The ateq lexicon

The proper nouns are stored in the file gt/kal/src/ateq-kal-lex.txt.

The file structure

!
! Person vs. Geo
! Grønlandsk vs. udenlandsk
  ! Undenlandsk:    Personnamn      Geografiske namn
    ! -{eo}#        Z1ateqpropZ     Z1geopropZ
    ! andrevok vs.  Z1ateqZmorf     Z1geoSZmorf
    ! kons          Zateq_oqsZmorf  Z1geo_oqsZ
  ! Udenlandske pluralienavne
    ! -e Øerne
    ! -C ???
  ! Grønlandsk: Deklinationerne som vanlige subst.

 !Z1geoSZmorf med morfemer til stednavne af p-boejningen
 !Z1geoPZmorf med morfemer til stednavne af p-boejningen
 !Z1geo_oqsZmorf som Finland
 !Z2-geoSZmorf !Nuuk
 !Z1ateqZmorf
 !Z1nnguaq_ateqZmorf
 !Z2-ateqZmorf
 !Zateq_oqsZmorf

        

Verbs

The verb lexicon is stored in the verb-kal.txt file.

      2 flex-iv
      9 flex-iv_nngit
      1 flex-iv_schwa
      1 flex-tv
      4 flex-tv_nngit
      1 gallar-iv
  13522 IV
     86 IVschwa
   7166 IV_voq
     72 IV_voqP
     57 K
     15 K_plur
   5551 TV
      1 TVi_vaa
   1966 TVschwa
      1 TVschwaP
    577 TV_vaa
      6 XIgujoqX
      4 XIiPXmorf
      1 XIitX
      5 XIi_voqXmorf
   1422 XIiXmorf
    468 XIPXmorf
      1 XIPX_nngit
    608 XItsXmorf
     94 XItX
     25 XIutePXmorf
    127 XIuteXmorf
      3 XTgujaaX
    827 XTiXmorf
    117 XTPXmorf
     23 XTtX
      4 XTutePXmorf
    312 XTuteXmorf
        

Pronouns

Pronouns are found in the prt-kal-lex.txt file.

All Pronouns have the initial lexicon path Root -> Pronoun -> ...

Numerals

Indeclinable words

All the lexica for indeclinable words are made the same way:

Abbreviations

There is a file called abbr-kal-lex.txt. Work on abbreviations has not yet begun, the file contains just some dummy entries.

Last modified: $Date: 2007-06-14 13:53:51 +0300 (to, 14 kesä 2007) $, by $Author: boerre $