Meeting_2012-08-23

PLX Meeting

Participants: Tomi, Thomas, Sjur

  • new infra and PLX
  • issues in PLX - overview
  • plan forward to a new release

New infra and PLX

Should we move smj and sma to new infra?

Benefits:

  • better support for systematic and automatic testing
  • more structured code

Risks and drawbacks:

  • we risk delaying the new speller even more

Risk mitigation: we don't move sme, so if we can't get PLX conversion in the new infra to work properly, we just won't update smj and sma now. They will have to wait a bit, after Inga Lill and Maja has worked more on the material.

Issues in PLX - overview

Status: Thomas has marked some derivations +Use/-Spell, and instead lexicalized:

+Der/t
+Der/ár
+Der/huhtti
+Der/huvva
+Der/stuvva
+Der/viđá
+Der/supmi

+Der/veara is marked +Err/Sub and a closed class of words now instead makes two word phrases with Po/Attr "veara"

Next aim is to have a look at LEXICON NAMAT as well to see if the similar trimming can be done here.

Tomi has started a new sma speller compilation today - filter compilation takes a lot of time the first time.

Main issue: compounding does not work as intended for a number of constructions.

Additional issue: some word forms are not accepted in the regular speller, even though they are in the test speller. We try to correct this by making the full speller not so huge by restricting some processes (mainly derivations, see above), and by modelling some derivations as PLX compounds instead of derivations (= generated full forms in the PLX format, e.g. -vuohta).

Vuohta

-vuohta can be added to all adjectives, but not all nouns. Further compounding of vuohta- needs to use the genitive form -vuođa-.

But when vuohta is modelled as a compound, the PLX flags of the adjective will influence the total compound, and vuohta won't behave as specified.

We already have 1039 words on -vuohta in our lexicon, ie lexicalised derivations, 353 compounds with -vuođa-. Is this enough to cover the most frequent cases of -vuohta, such that we can turn off -vuohta altogether in the speller (and lexicalise the most frequent missing ones)?

We remove -vuohta as a derivation completely from the speller, and instead lexicalise the most frequent ones.

Plan forward to a new release

1. check if we can get a full speller that behaves as the test speller 2. if they behave the same, use the PLX conversion test bench to work out the remaining issues 3. when done, release