Meeting_2009-03-18
Dictionary tech meeting
Agenda:
- homonymy entries
- how to generate the correct paradigm
- How to parametrise (what? make?) on the lg/dic level
- make a transducer containing all the dictionary entries
- improvement timetable
Homonymy entries
It has been solved in sme. How?
Current solution: a special tag (an XML node attribute) is added after the lemma. The same attribute is added to the lexc entry, which ties the transducer to the specific lemma in the dictionary.
Systematic homonymy
Example:
The text word lohkki is ambiguous:
    | Nom    | Gen    | Norwegian |
 1. | lohkki | lohki  | lokk      |
 2. | lohkki | lohkki | lesar     |
We want to generate both paradigms, and bind each to the correct lexical entry.
xml:
1. <e> <lemma pos="n">
2. <e> <lemma pos="actor">
All actors follow the same pattern. To generate the corresponding paradigms:
- lohkki+n+actor+sg+n
- lohkki+n+sg+n
The corresponding xml entries look like:
1. <e usage="ped" src="nj">
     <lg> <l pos="n">lohkki</l> <lc>lohkit</lc> </lg>
     <mg> <tg> <t pos="n">lokk</t>
2. <e usage="ped" src="nj">
     <lg> <l pos="n" subclass="actor">lohkki</l> </lg>
     <mg> <tg> <t pos="m">leser</t>
We introduce the attribute @subclass to denote inflectional subclasses, such as proper nouns or the actor inflection shown above.
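As a rough illustration of how @subclass could feed paradigm generation, the sketch below reads the two lohkki entries and builds one tag string per entry. It is only a sketch under assumptions: the wrapping <r> root element, the closing tags, and the function names analysis_string and analysis_strings are hypothetical and not part of the dictionary code; the tag order simply follows the two strings listed above.

```python
import xml.etree.ElementTree as ET

# Two entries modelled on the notes. The closing tags and the <r> root
# element are assumptions, added only so that the snippet parses.
DICT_XML = """
<r>
  <e usage="ped" src="nj">
    <lg><l pos="n">lohkki</l><lc>lohkit</lc></lg>
    <mg><tg><t pos="n">lokk</t></tg></mg>
  </e>
  <e usage="ped" src="nj">
    <lg><l pos="n" subclass="actor">lohkki</l></lg>
    <mg><tg><t pos="m">leser</t></tg></mg>
  </e>
</r>
"""


def analysis_string(l_elem, number="sg", case="n"):
    """Build lemma+pos[+subclass]+number+case for one <l> element."""
    parts = [l_elem.text.strip(), l_elem.get("pos")]
    subclass = l_elem.get("subclass")
    if subclass:
        # The subclass tag is inserted only when the attribute is present.
        parts.append(subclass)
    parts.extend([number, case])
    return "+".join(parts)


def analysis_strings(xml_text):
    """Return one generation string per <l> element, in document order."""
    root = ET.fromstring(xml_text)
    return [analysis_string(l_elem) for l_elem in root.iter("l")]


if __name__ == "__main__":
    for line in analysis_strings(DICT_XML):
        print(line)
```

Each string could then be passed to the generator, so that every paradigm stays bound to the entry it was built from.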
Random homonymy
In Norwegian:
- rett, rettar [domstol] (germ. a-stem)
- rett, retter [matrett] (germ. i-stem)
1. <e usage="ped" src="nj">
     <lg> <l pos="n" subclass="m1">rett</l> <lc>lohkit</lc> </lg>
     <mg> <tg> <t pos="n">domstol</t>
2. <e usage="ped" src="nj">
     <lg> <l pos="n" subclass="m2">rett</l> </lg>
     <mg> <tg> <t pos="m">matrett</t>
Here we use the inflection class as the subclass. That should uniquely identify the correct paradigm.
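Whether the subclass really identifies the entry uniquely can be checked mechanically. The following is a minimal sketch, assuming that the key is the triple (lemma, pos, subclass); the <r> root element, the closing tags, and the function names index_entries and ambiguous_keys are hypothetical, and the <lc> element is left out for brevity.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The two rett entries from the notes, completed with closing tags and a
# hypothetical <r> root element so that the snippet parses.
RETT_XML = """
<r>
  <e usage="ped" src="nj">
    <lg><l pos="n" subclass="m1">rett</l></lg>
    <mg><tg><t pos="n">domstol</t></tg></mg>
  </e>
  <e usage="ped" src="nj">
    <lg><l pos="n" subclass="m2">rett</l></lg>
    <mg><tg><t pos="m">matrett</t></tg></mg>
  </e>
</r>
"""


def index_entries(xml_text):
    """Map (lemma, pos, subclass) to the translation lists of matching entries."""
    index = defaultdict(list)
    for entry in ET.fromstring(xml_text).iter("e"):
        l_elem = entry.find("lg/l")
        key = (l_elem.text.strip(), l_elem.get("pos"), l_elem.get("subclass"))
        index[key].append([t.text for t in entry.iter("t")])
    return index


def ambiguous_keys(index):
    """Return the keys shared by more than one entry."""
    return [key for key, entries in index.items() if len(entries) > 1]


if __name__ == "__main__":
    idx = index_entries(RETT_XML)
    print(ambiguous_keys(idx))  # an empty list means every key is unique
```

A non-empty result would point at homonyms that still need a distinguishing subclass value.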
Improvement timetable
Friday 20.3. Then subversion.