140627

Meeting 27.6.2014

Present: Heli, Heiki-Jaan, Jaak, Sjur, Trond

  1. Status
  2. Tag discussion
  3. Plan a physical meeting
  4. Dictionary
  5. Oahpa
  6. Plan
  7. Next meeting

Status

We have a taglist draft.

FST work has been slower due to midsummer etc. With --enable-oahpa, some of the hfst versions do not generate, for unknown reasons. But we do not use hfst for Oahpa, so a possible explanation is that it has not been set up. A long-term goal is to have Oahpa work with hfst as well.

The xfst/hfst instructions should be written in parallel in the Makefiles.

Tag discussion

  • Script the new tags into all lexc files, and into the yaml etc. test files. (Heiki-Jaan, then Heli)
  • Then add a newtag-to-plamk tag routine in src/tagsets/plamk.relabel (Sjur)
  • Then add a newtag-to-apertium tag routine in src/tagsets/apertium.relabel (apertium has <n> for +N etc.) - this is already in place

Heli would be happy to have this within a week.

Plan a physical meeting

Not this time?

Dictionary

Heiki-Jaan has already done the work: we have access to the 16669-lemma est-nor/nor-est dictionary, and may integrate it into nds. Sjur and Trond to look at the issue.

me+aatom
ge+-i
tp+2e
nn+atom
gn+-et

me+aatomi+elektri+jaam
ge+-a
tp+22u
nn+atomkraftverk
gn+-et --
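
A minimal sketch of how entries in the layout above could be read, assuming each entry is a block of "field+value" lines separated by a blank line. The field codes (me, ge, tp, nn, gn) are not explained in these notes, so the sketch keeps them as opaque keys; the function name parse_records is made up for illustration.

```python
def parse_records(text):
    """Split blank-line-separated blocks of 'field+value' lines into dicts."""
    records = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current entry.
            if current:
                records.append(current)
                current = {}
            continue
        # Only the first '+' separates the field code from the value;
        # later '+' signs belong to the value (e.g. aatomi+elektri+jaam).
        field, _, value = line.partition("+")
        current[field] = value
    if current:
        records.append(current)
    return records


sample = """me+aatom
ge+-i
tp+2e
nn+atom
gn+-et

me+aatomi+elektri+jaam
ge+-a
tp+22u
nn+atomkraftverk
gn+-et
"""

records = parse_records(sample)
for record in records:
    print(record["me"], "->", record["nn"])
```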

Oahpa

Estonian Oahpa (Leksa) is online: http://testing.oahpa.no/eesti/leksa/
Heli has done some work on clock.lexc. Other Numra automata also need some effort. Morfa-S is waiting for the tag conversion.

Plan

  • Dictionary: Sjur, Trond
  • Tags, as decided, next week (important for Oahpa progress)
  • Oahpa: when tags are done
  • FST: Work continues, mainly by Jaak
  • CG: the eternal question

Next meeting

August 12th 1300 Swedish time?

Appendix: The notes from the tag discussion

Tag conversion table:

+S        +N
+H        +N+Prop
+A        +A
+Num        +Num+Card
+Ord        +Num+Ord
+Pron        +Pron
+V        +V
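
A minimal sketch of what the conversion in the table above amounts to, assuming analyses are written as lemma+TAG+TAG. The real conversion is to be scripted into the lexc and test files; the function name convert_analysis and the sample analyses (including +Sg+Nom) are only illustrations.

```python
# Old (plamk) tag -> new tag sequence, as in the table above.
NEW_TAGS = {
    "S": ["N"],
    "H": ["N", "Prop"],
    "A": ["A"],
    "Num": ["Num", "Card"],
    "Ord": ["Num", "Ord"],
    "Pron": ["Pron"],
    "V": ["V"],
}


def convert_analysis(analysis):
    """Rewrite every old tag in 'lemma+TAG+TAG...'; unknown tags are kept as-is."""
    lemma, *tags = analysis.split("+")
    new_tags = []
    for tag in tags:
        new_tags.extend(NEW_TAGS.get(tag, [tag]))
    return "+".join([lemma, *new_tags])


# Hypothetical input analyses, for illustration only.
for old in ["maja+S+Sg+Nom", "Tallinn+H+Sg+Nom", "kolmas+Ord+Sg+Nom"]:
    print(old, "->", convert_analysis(old))
```

Working on whole tags rather than on substrings avoids rewriting part of a longer tag by accident (e.g. +S inside +Sg).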

(... see the documentation)

+prefix        +Pref        Stems used as a prefix. = compound?
+suffix        +Suf        Similar for suffixes?
G1        G1    These are multichar symbols for CG etc., for twolc.


(... see the documentation)

Heiki's suggestions for tag conversions:

+Num        +Num+Card
+Ord        +Num+Ord
+X          +Adv
+adit       +Ill
+G          +N+Gen

Bearing in mind that ordinals are adjectives, we would thus have:

+Num        +Num+Card
+Ord        +A+Ord - but ordinals do not have comparative and superlative forms. Do all other adjectives have those? You could form a comparative for a regular noun as well, e.g. "majam" (more house-like). Yes, but are there regular adjectives (nouns) that do not compare, like many Swedish and Norwegian adjectives? Can you give an example of a Norwegian adjective? eksemplarisk - *eksemplariskere - *eksemplariskest
Yes, actually there exist similar adjectives in Estonian, too. Exactly, and among them the ordinals ;)

kaksi        kaksi+Num+Card+Sg+Nom
toinen       toinen+Num+Ord+Sg+Nom
kolmas       kolmas+Num+Ord+Sg+Nom ==> +A+Ord
pari         pari+Num+Card+Sg+Nom
+N+Prop

Compare with http://nl.ijs.si/ME/Vault/V3/msd/html/

Tag sequence: +MainPOS +SubPOS

+prefix +Cmp +prefix +Prefix +suffix +Suffix

In the working document in est/doc/ plamktag<tab>newtag<tab>comments
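
A minimal sketch of reading such a working document, assuming one plamktag<tab>newtag<tab>comments row per line with the comments column optional. The function name read_tag_table is made up, and the sample rows are taken from the tables in these notes.

```python
def read_tag_table(lines):
    """Map each plamk tag to (newtag, comments); blank lines are skipped."""
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        parts = line.split("\t")
        # Pad short rows so that a missing comments column becomes "".
        parts += [""] * (3 - len(parts))
        plamktag, newtag, comments = (part.strip() for part in parts[:3])
        table[plamktag] = (newtag, comments)
    return table


sample = [
    "+S\t+N\n",
    "+H\t+N+Prop\n",
    "\n",
    "+prefix\t+Pref\tStems used as a prefix.\n",
]

table = read_tag_table(sample)
print(table["+prefix"])
```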

When dust has settled, in root.lexc:

newtag !!≈ * @CODE@ comments plamktag
!! More comments.

+prefix, +suffix: in vikerkaar, viker is not an independent word; it is only used as a prefix.

kultur-        kultuvra+Sem/Domain+N+Cmp/SgNom+Cmp/SplitR
kultuvra       kultuvra+Sem/Domain+N+Sg+Nom
kultur         kultur        +?
kulturheasta   kultuvra+Sem/Domain+N+Cmp/SgNom+Cmp#heasta+Sem/Ani+Sem/Veh+N+Sg+Nom
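
A minimal sketch of taking such an analysis string apart, assuming (from the kulturheasta example above) that # marks the boundary between compound parts and + separates the lemma from its tags. The function name split_analysis is made up for illustration.

```python
def split_analysis(analysis):
    """Return one (lemma, tags) pair per compound part of an analysis string."""
    parts = []
    for chunk in analysis.split("#"):
        lemma, *tags = chunk.split("+")
        parts.append((lemma, tags))
    return parts


analysis = "kultuvra+Sem/Domain+N+Cmp/SgNom+Cmp#heasta+Sem/Ani+Sem/Veh+N+Sg+Nom"
for lemma, tags in split_analysis(analysis):
    print(lemma, tags)
```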

Verbs

The first question in determining the tagset for verb categories is: what is explicitly expressed, what is underspecified, and what is ambiguous? In order to answer this, it is necessary to first have a full, possibly redundant description.

Below is the set of all the possible combinations of morphological categories for verbs. The categories are:

tegumood, aeg, kõneviis, isik, arv, kõnelaad
voice, tense, mood, person, number, aspect

The categories are given in the order in which the allomorphs (if they can be distinguished) that represent them are attached to the word stem (note that the treatment of allomorphs is sloppy here). The justification is that the categories are not equal, but form a hierarchy: those closer to the word end tend to be more optional, and are more often unspecified.

  1. voice: personal vs. impersonal (0-morph vs. t/d (aiu)), e.g. elaks vs. elataks, elav vs. elatav
  2. tense: present vs. past (0-morph vs. s/si/nu), e.g. elan vs. elasin; elaks vs. elanuks
  3. mood: indicative vs. conditional vs imperative vs quotative (0-morph vs. ks vs. k/g(ue) vs. vat)
  4. person+number: notice that in personal present imperative 3rd person the distinction between singular/plural is lost (ta/nad elagu)
  5. aspect: affirmative vs. negative. The aspect manifests itself via lexical means: it is either present in an exceptional wordform (some forms of olema) or is conferred on a form that is normally used in the affirmative aspect by an immediately preceding word ei or ära (e.g. ei elaks). The only case where the negative aspect has a dedicated form is the impersonal present indicative negative (e.g. ei elata).
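
The attachment order voice, tense, mood, person+number can be illustrated with a small sketch. It is deliberately as sloppy about allomorphs as the list above: the morph tables are assumptions that only cover the conditional and quotative forms of ela- listed in these notes, and the name build_form is made up.

```python
# Assumed morphs, in the order they attach to the stem. They cover only the
# conditional and quotative forms of "ela-" that appear in these notes.
VOICE = {"personal": "", "impersonal": "ta"}
TENSE = {"present": "", "past": "nu"}
MOOD = {"condit": "ks", "quotat": "vat"}
PERSON = {None: "", "1s": "in", "2s": "id", "1p": "ime", "2p": "ite", "3p": "id"}


def build_form(stem, voice, tense, mood, person=None):
    """Concatenate stem + voice + tense + mood + person/number morphs."""
    if voice == "impersonal" and tense == "past":
        # Forms such as "elatuks" use fused allomorphs; not covered here.
        raise ValueError("impersonal past is not covered by this sketch")
    if person is not None and (voice != "personal" or mood != "condit"):
        raise ValueError("person endings are only covered for the personal conditional")
    return stem + VOICE[voice] + TENSE[tense] + MOOD[mood] + PERSON[person]


print(build_form("ela", "personal", "present", "condit"))
print(build_form("ela", "impersonal", "present", "condit"))
print(build_form("ela", "personal", "past", "condit", "1s"))
```

Leaving out the person+number morph still gives a valid form (elaks, elanuks), which fits the observation that categories closer to the word end are more often left unspecified.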

Below, brackets are used to list the set of non-specified alternative values.

 personal present indic 1s affirmative                              elan
 personal present indic 2s affirmative                              elad
 personal present indic 3s affirmative                              elab
 personal present indic 1p affirmative                              elame
 personal present indic 2p affirmative                              elate
 personal present indic 3p affirmative                              elavad

 personal present indic (1s/2s/3s/1p/2p/3p) negative                ela, pole

 personal present condit 1s affirmative                             elaksin
 personal present condit 2s affirmative                             elaksid
 personal present condit (1s/2s/3s/1p/2p/3p) (affirmative/negative) elaks
 personal present condit 1p affirmative                             elaksime
 personal present condit 2p affirmative                             elaksite
 personal present condit 3p affirmative                             elaksid

 personal present condit 1s negative                                poleksin
 personal present condit 2s negative                                poleksid
 personal present condit (1s/2s/3s/1p/2p/3p) negative               poleks
 personal present condit 1p negative                                poleksime
 personal present condit 2p negative                                poleksite
 personal present condit 3p negative                                poleksid

 personal present imper 2s (affirmative/negative)                   ela
 personal present imper 3 (singular/plural) (affirmative/negative)  elagu
 personal present imper 1p (affirmative/negative)                   elagem
 personal present imper 2p (affirmative/negative)                   elage

 personal present imper 2s negative                                 ära
 personal present imper 3 (singular/plural) negative                ärgu
 personal present imper 1p negative                                 ärgem
 personal present imper 2p negative                                 ärge

 personal present quotat (affirmative/negative)                     elavat
 personal present quotat negative                                   polevat
 
 personal past indic 1s affirmative                                 elasin
 personal past indic 2s affirmative                                 elasid
 personal past indic 3s affirmative                                 elas
 personal past indic 1p affirmative                                 elasime
 personal past indic 2p affirmative                                 elasite
 personal past indic 3p affirmative                                 elasid

 personal past indic (1s/2s/3s/1p/2p/3p) negative                   polnud

 personal past condit 1s affirmative                                elanuksin
 personal past condit 2s affirmative                                elanuksid
 personal past condit (1s/2s/3s/1p/2p/3p) (affirmative/negative)    elanuks
 personal past condit 1p affirmative                                elanuksime
 personal past condit 2p affirmative                                elanuksite
 personal past condit 3p affirmative                                elanuksid

 personal past condit (1s/2s/3s/1p/2p/3p) negative                  polnuks

 personal past quotat (affirmative/negative)                        elanuvat
 personal past quotat negative                                      polnuvat

 personal present partic                                            elav
 personal past partic                                               elanud

 personal supine abessive                                           elamata
 personal supine elative                                            elamast
 personal supine illative                                           elama 
 personal supine inessive                                           elamas
 personal supine translative                                        elamaks

 impersonal present indic affirmative                               elatakse
 impersonal present indic negative                                  elata 
 
 impersonal present condit (affirmative/negative)                   elataks
 impersonal present imper (affirmative/negative)                    elatagu
 impersonal present quotat (affirmative/negative)                   elatavat

 impersonal present partic                                          elatav

 impersonal past indic affirmative                                  elati
 impersonal past indic negative                                     poldud 

 impersonal past condit (affirmative/negative)                      elatuks

 impersonal past partic                                             elatud

 impersonal supine                                                  elatama

 gerund                                                             elades
 infinit                                                            elada

Exceptional cases:

 present personal (affirmative/negative), 3 words:  kuulukse, tunnukse, näikse
 negative                                                           ei ?

Analytical forms (olen elanud, olin elanud, oleksin elanud, ei olnud elanud, ei olnuks elanud etc) are not treated here...

Suggestions to simplify the way verb categories are combined above:

  1. Omit "personal" where person and number are specified anyway.
  2. If there is underspecification (affirmative/negative), leave it out.
  3. If there is underspecification (1s/2s/3s/1p/2p/3p), leave it out, but keep the tag for voice "personal".
  4. For personal present imper 3 (singular/plural), keep only personal present imper 3. In real texts, this form is often used as a general statement, without actually even specifying the subject, so it is impossible to disambiguate between singular and plural.
  5. If a combination contains "affirmative", leave "affirmative" out. This will simplify the output, although it thus fails to capture the fact that the corresponding wordform cannot be used with negative aspect, e.g. that "ei elan" is ungrammatical.
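
The five suggestions can be sketched as a small rewriting function over the space-separated descriptions used in the table above. This is only an illustration of the rules, not an agreed tag routine; the function name simplify is made up.

```python
PERSON_NUMBER = {"1s", "2s", "3s", "1p", "2p", "3p"}
UNDERSPECIFIED = {
    "(affirmative/negative)",  # suggestion 2
    "(1s/2s/3s/1p/2p/3p)",     # suggestion 3 ("personal" itself stays)
    "(singular/plural)",       # suggestion 4
}


def simplify(description):
    """Apply suggestions 1-5 to a full verb description."""
    # Accept the one-f spelling of "affirmative" as well.
    tokens = [t.replace("afirmative", "affirmative") for t in description.split()]
    tokens = [t for t in tokens if t not in UNDERSPECIFIED]
    tokens = [t for t in tokens if t != "affirmative"]  # suggestion 5
    if PERSON_NUMBER & set(tokens):
        # Suggestion 1: person and number are specified, so "personal" is redundant.
        tokens = [t for t in tokens if t != "personal"]
    return " ".join(tokens)


for full in [
    "personal present indic 1s affirmative",
    "personal present condit (1s/2s/3s/1p/2p/3p) (affirmative/negative)",
    "personal present imper 3 (singular/plural) (affirmative/negative)",
    "impersonal present indic negative",
]:
    print(full, "->", simplify(full))
```

A bare "3" without number does not count as person+number here, so "personal present imper 3" keeps its voice tag, as in suggestion 4.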