Tag List

Tag conversion table:

+S		+N
+H		+N+Prop
+A		+A
+Num		+Num+Card
+Ord		+Num+Ord
+Pron		+Pron
+V		+V
+Adv		+Adv
+I		+Interj
+J		+CC/+CS
+G		+Attr
+K		+Pr/Po
+X		+Adv
+_ke		+Dim/ke
+_kene		+Dim/kene
+_us		+Der/us
+_lt		+Der/lt
+_ini		+Der/ini
+gi		+Foc/gi
+sg		+Sg
+pl		+Pl
+nom		+Nom
+gen		+Gen
+part		+Par
+ill		+Ill
+adit		+Ill
+in		+Ine
+el		+Ela
+all		+All
+ad		+Ade
+abl		+Abl
+tr		+Tra
+term		+Trm
+es		+Ess
+abes		+Abe
+kom		+Com
+comp		+Comp
+super		+Superl
+indic		+Ind
+imper		+Imprt
+cond		+Cond
+quot		+Quot
+pres		+Prs
+past		+Prt
+impf		+Impf
+sg+ps1		+Sg1
+sg+ps2		+Sg2
+sg+ps3 		+Sg3
+pl+ps1		+Pl1
+pl+ps2		+Pl2
+pl+ps3 		+Pl3
+neg		+Neg
+af 		+Aff
+imps		+Impers
+ps		+Pers
+sup		+Sup
+inf		+Inf
+ger		+Ger
+partic+pres+ps		+Pers+PrsPrc
+partic+pres+imps		+Impers+PrsPrc
+partic+past+ps		+Pers+PrfPrc
+partic+past+imps		+Impers+PrfPrc
+prefix		+Pref		Stems used as a prefix. = compound?
+suffix		+Suf		Similar for suffixes?
G1		G1    These are multichar symbols for CG etc., for twolc.
B1		B1
D1		D1
K1		K1
P1		P1
T1		T1
S1		S1
A1		A1
E1		E1
I1		I1
U1		U1
X1		X1
Q1		Q1
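
The table can be applied mechanically. Below is a minimal sketch, assuming that an old analysis is a lemma followed by "+"-separated tags; that input shape, the CONVERSIONS excerpt and the name convert_tags are illustrative assumptions, not part of the existing tools. Because some rows map a sequence of old tags to one new tag (+sg+ps1 to +Sg1), the lookup tries the longest tag sequence first.

```python
# Hypothetical excerpt of the table above: old tag sequence -> new tag sequence.
CONVERSIONS = {
    "+S": "+N",
    "+H": "+N+Prop",
    "+V": "+V",
    "+sg": "+Sg",
    "+pl": "+Pl",
    "+nom": "+Nom",
    "+gen": "+Gen",
    "+adit": "+Ill",
    "+indic": "+Ind",
    "+pres": "+Prs",
    "+ps": "+Pers",
    "+sg+ps1": "+Sg1",
    "+pl+ps3": "+Pl3",
    "+partic+pres+ps": "+Pers+PrsPrc",
}


def convert_tags(analysis):
    """Rewrite the tags of one analysis using greedy longest-match lookup."""
    lemma, sep, tags = analysis.partition("+")
    if not sep:
        return analysis  # no tags at all: nothing to convert
    parts = ["+" + tag for tag in tags.split("+")]
    out = []
    i = 0
    while i < len(parts):
        # Try the longest remaining tag sequence first, then shorter ones.
        for width in range(len(parts) - i, 0, -1):
            key = "".join(parts[i:i + width])
            if key in CONVERSIONS:
                out.append(CONVERSIONS[key])
                i += width
                break
        else:
            out.append(parts[i])  # tag not in the table: keep it unchanged
            i += 1
    return lemma + "".join(out)


print(convert_tags("ela+V+indic+pres+sg+ps1"))
```

Tags that are missing from the table pass through unchanged, which makes gaps in the table easy to spot in the output.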

Heiki's suggestions for tag conversions:

+Num        +Num+Card
+Ord        +Num+Ord
+X          +Adv
+adit       +Ill
+G          +N+Sg+Gen

Bearing in mind that ordinals are adjectives, we would thus have:

+Num        +Num+Card
+Ord        +A+Ord ! Heli: but ordinals do not have comparative and superlative
                   ! Sjur: do all other adjectives have those?
                   ! Heli: You could form comparative for a regular noun as
                   !       well.. "majam" -- more house-like.
                   ! Sjur: Yes, but are there regular adjectives (nouns) that do
                   !       not compare? Like many Swedish & Norwegian
                   !       adjectives.
                   ! Heli: can you give an example of a Norwegian adjective?
                   ! Sjur: eksemplarisk - *eksemplariskere - *eksemplariskest
                   ! Heli: yes, actually there exist similar adjectives in
                   !       Estonian, too.
                   ! Sjur: Exactly, and among them the ordinals ;)

kaksi        kaksi+Num+Card+Sg+Nom
toinen       toinen+Num+Ord+Sg+Nom
kolmas       kolmas+Num+Ord+Sg+Nom ==> +A+Ord
pari         pari+Num+Card+Sg+Nom
+N+Prop

There was a long discussion on whether Ordinals should be tagged +Num or +A. Compare with http://nl.ijs.si/ME/Vault/V3/msd/html/ and with Omorfi, which both suggest we use +Num.

Tag sequence: +MainPOS +SubPOS

Other tags that were discussed:

+prefix    +Cmp
+prefix    +Prefix
+suffix    +Suffix

In the working document in est/doc/:

plamktag<tab>newtag<tab>comments

When the dust has settled, in root.lexc:

newtag !!≈ * @CODE@ comments plamktag
!! More comments.
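
Moving a row from the working document to the root.lexc layout is a mechanical reordering of the three fields. A minimal sketch, assuming each row has exactly the tab-separated shape shown above; the function name lexc_doc_line and the sample comment text are hypothetical:

```python
def lexc_doc_line(row):
    """Turn 'plamktag<tab>newtag<tab>comments' into the root.lexc layout."""
    fields = row.rstrip("\n").split("\t")
    plamktag, newtag = fields[0], fields[1]
    # The comments field is treated as optional in this sketch.
    comments = fields[2] if len(fields) > 2 else ""
    pieces = [newtag, "!!≈ * @CODE@", comments, plamktag]
    return " ".join(piece for piece in pieces if piece)


# The comment text here is invented for illustration only.
print(lexc_doc_line("+adit\t+Ill\tshort illative"))
```

A row with fewer than two fields raises an IndexError, which is probably the desired behaviour for a malformed working document.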

Example of use:

+prefix, +suffix
viker in vikerkaar
viker is not an independent word; it is used only as a prefix.

Similar in Sámi:

kultur-        kultuvra+Sem/Domain+N+Cmp/SgNom+Cmp/SplitR
kultuvra       kultuvra+Sem/Domain+N+Sg+Nom
kultur         kultur        +?
kulturheasta   kultuvra+Sem/Domain+N+Cmp/SgNom+Cmp#heasta+Sem/Ani+Sem/Veh+N+Sg+Nom

Verbs

The first question in determining the tagset for verb categories is: what is explicitly expressed, what is underspecified, and what is ambiguous? In order to answer this, it is necessary to first have a full, possibly redundant description.

Below is the set of all the possible combinations of morphological categories for verbs. The categories are:

tegumood, aeg, kõneviis, isik, arv, kõnelaad
voice, tense, mood, person, number, aspect

The categories are listed in the order in which the allomorphs that represent them (where these can be distinguished) attach to the word stem; note that the treatment of allomorphs here is only approximate. The justification is that the categories are not equal but form a hierarchy: those closer to the end of the word tend to be more optional and are more often left unspecified.

  1. voice: personal vs. impersonal (0-morph vs. t/da t/di t/du), e.g. elaks vs. elataks, elav vs. elatav
  2. tense: present vs. past (0-morph vs. s/si/nu), e.g. elan vs. elasin; elaks vs. elanuks
  3. mood: indicative vs. conditional vs. imperative vs. quotative (0-morph vs. ks vs. kue/gue vs. vat)
  4. person+number: note that in the personal present imperative 3rd person the distinction between singular and plural is lost (ta/nad elagu)
  5. aspect: affirmative vs. negative. Aspect is expressed by lexical means: it is either carried by an exceptional wordform (some forms of olema) or conferred on a form normally used in the affirmative by an immediately preceding word ei or ära (e.g. ei elaks). The only case in which the negative aspect has a dedicated form is the impersonal present indicative negative (e.g. ei elata).

Below, brackets are used to list the set of non-specified alternative values.

 personal present indic 1s affirmative                              elan
 personal present indic 2s affirmative                              elad
 personal present indic 3s affirmative                              elab
 personal present indic 1p affirmative                              elame
 personal present indic 2p affirmative                              elate
 personal present indic 3p affirmative                              elavad

 personal present indic (1s/2s/3s/1p/2p/3p) negative                ela, pole

 personal present condit 1s affirmative                             elaksin
 personal present condit 2s affirmative                             elaksid
 personal present condit (1s/2s/3s/1p/2p/3p) (affirmative/negative) elaks
 personal present condit 1p affirmative                             elaksime
 personal present condit 2p affirmative                             elaksite
 personal present condit 3p affirmative                             elaksid

 personal present condit 1s negative                                poleksin
 personal present condit 2s negative                                poleksid
 personal present condit (1s/2s/3s/1p/2p/3p) negative               poleks
 personal present condit 1p negative                                poleksime
 personal present condit 2p negative                                poleksite
 personal present condit 3p negative                                poleksid

 personal present imper 2s (affirmative/negative)                   ela
 personal present imper (1s/2s/3s/1p/2p/3p) (affirmative/negative)  elagu
 personal present imper 1p (affirmative/negative)                   elagem
 personal present imper 2p (affirmative/negative)                   elage

 personal present imper 2s negative                                 ära
 personal present imper (1s/2s/3s/1p/2p/3p) negative                ärgu
 personal present imper 1p negative                                 ärgem
 personal present imper 2p negative                                 ärge

 personal present quotat (affirmative/negative)                     elavat
 personal present quotat negative                                   polevat
 
 personal past indic 1s affirmative                                 elasin
 personal past indic 2s affirmative                                 elasid
 personal past indic 3s affirmative                                 elas
 personal past indic 1p affirmative                                 elasime
 personal past indic 2p affirmative                                 elasite
 personal past indic 3p affirmative                                 elasid

 personal past indic (1s/2s/3s/1p/2p/3p) negative                   polnud

 personal past condit 1s affirmative                                elanuksin
 personal past condit 2s affirmative                                elanuksid
 personal past condit (1s/2s/3s/1p/2p/3p) (affirmative/negative)    elanuks
 personal past condit 1p affirmative                                elanuksime
 personal past condit 2p affirmative                                elanuksite
 personal past condit 3p affirmative                                elanuksid

 personal past condit (1s/2s/3s/1p/2p/3p) negative                  polnuks

 personal past imper (1s/2s/3s/1p/2p/3p) (affirmative/negative)     elanud
 personal past imper (1s/2s/3s/1p/2p/3p) negative                   ärnud 

 personal past quotat (affirmative/negative)                        elanuvat
 personal past quotat negative                                      polnuvat

 personal present partic                                            elav
 personal past partic                                               elanud

 personal supine abessive                                           elamata
 personal supine elative                                            elamast
 personal supine illative                                           elama 
 personal supine inessive                                           elamas
 personal supine translative                                        elamaks

 impersonal present indic affirmative                               elatakse
 impersonal present indic negative                                  elata 
 
 impersonal present condit (affirmative/negative)                   elataks
 impersonal present condit negative                                 poldaks

 impersonal present imper (affirmative/negative)                    elatagu
 impersonal present imper negative                                  ärdagu

 impersonal present quotat (affirmative/negative)                   elatavat
 impersonal present quotat negative                                 poldavat

 impersonal present partic                                          elatav

 impersonal past indic affirmative                                  elati
 impersonal past indic negative                                     poldud 

 impersonal past condit (affirmative/negative)                      elatuks

 impersonal past partic                                             elatud

 impersonal supine                                                  elatama

 gerund                                                             elades
 infinit                                                            elada

Exceptional cases:

 personal present (affirmative/negative), 3 words:  kuulukse, tunnukse, näikse
 negative                                                           ei ?

Analytical forms (olen elanud, olin elanud, oleksin elanud, ei olnud elanud, ei olnuks elanud etc.) are not treated here.

Ways to simplify the way verb categories are combined above:

  1. If there is underspecification (affirmative/negative), leave it out.
  2. If there is underspecification (1s/2s/3s/1p/2p/3p), leave it out, but keep the tag for voice "personal".
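
Both rules amount to dropping every bracketed slot from a full description while leaving the fully specified slots, including the voice, in place. A minimal sketch, assuming descriptions are written as space-separated slots exactly as in the listing above; the function name simplify is hypothetical:

```python
import re

# A bracketed slot lists alternative values, e.g. "(1s/2s/3s/1p/2p/3p)".
UNDERSPECIFIED = re.compile(r"\(.+/.+\)")


def simplify(description):
    """Drop underspecified (bracketed) slots; keep everything else in order."""
    kept = [slot for slot in description.split()
            if not UNDERSPECIFIED.fullmatch(slot)]
    return " ".join(kept)


print(simplify("personal present condit (1s/2s/3s/1p/2p/3p) (affirmative/negative)"))
```

Because the voice slot is never bracketed, "personal" survives automatically, as rule 2 requires.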

Further possible simplifications that are currently not used:

  1. Omit "personal" where person and number are specified anyway.
  2. If a combination contains "affirmative", leave "affirmative" out. This will simplify the output, although it thus fails to capture the fact that the corresponding wordform cannot be used with negative aspect, e.g. that "ei elan" is ungrammatical.