Skolt Saami Dictionary Features
Contents:
- Lexicon files
-
Entry structure
- <e /> level
- <mg /> level
- <tg /> level
- <t /> - a word
- <t /> - a phrase
- <te /> - An explanation:a sentence which explains the meaning of a word, but can't be used in the translation.
- <re /> - Restriction
- <l /> attribute documentation
- <t /> attribute documentation
- <lemma_ref /> for references
- <xg /> Example sentences
- Files with static paradigms
- Other files
- Generated miniparadigms
- Pregenerated paradigms
- Homonymous entries
Lexicon files
Lexicon files are a part of the langs/sms/src/morphology infrastructure.
Entry structure
<e /> level
TODO: example sms entry
<mg /> level
Man skiller mellom synonymer og meningsgrupper. Synonymer har samme <mg>
<tg /> level
Elementet <mg> inneholder en eller flere <tg> (oversettelsesgruppe eller translation group) som igjen kan inneholde:
<t /> - a word
TODO: example entry with <t />
<t /> - a phrase
TODO: example entry with <t />
<te /> - An explanation:a sentence which explains the meaning of a word, but can't be used in the translation.
TODO: example entry with <t />
<re /> - Restriction
- <re> gives a restriction for the translation, f.ex. norwegian vest has the restriction of clothes, to separate it from the navigational direction.
TODO: example entry with <t />
<l /> attribute documentation
TODO:
<t /> attribute documentation
TODO:
<lemma_ref /> for references
<lemma_ref />is used to display a reference in the dictionary to another
Typically these words also include an <analysis /> node in the <lg /> so we can
<e> <lg> <l pos="Pron" type="Pers">muʹnne</l> <lemma_ref lemmaID="mon_Pron_Pers">mon</lemma_ref> <analysis>Pron_Pers_Sg_Ill</analysis> </lg>
Leads to ...
<e> <lg> <l pos="Pron" type="Pers">mon</l> </lg>
These are found in Pron_references_sms2x.xml.
<xg /> Example sentences
TODO:
In sms these can come in under either <mg /> or <tg />, for good reasons.
TODO: example of reasons
Files with static paradigms
Currently all sms files have a minimal miniparadigm, but in NDS we generate more.
In NDS we can tell the system to not use the static miniparadigm with the @exclude attribute:
<mini_paradigm exclude="NDS"> <analysis ms="Pron_Pers_Sg1_Gen"><wordform>muu</wordform></analysis> <analysis ms="Pron_Pers_Sg1_Ill"><wordform>muʹnne</wordform></analysis> </mini_paradigm>
If this attribute is not present as in the above, then the static paradigm will
Other files
TODO:
Generated miniparadigms
Miniparadigms are generated in lexicon entries in order to help users. They
Use/NGminip og Allegro i lexc
TODO: are these the tags we use now in sms?
+Use/NGminip - remove inflectional forms that one does not want to present in
NB: judicious use of +Use/NGminip from sme to clean up many possibilities into
Inflection | Without +Use/NGminip | With +Use/NGminip |
---|---|---|
A+Sg+Nom | heittot | heittot |
A+Attr | heittogis heittohis (bivttas) | heittogis (bivttas) |
A+Pl+Nom | heittogat heittohat | heittogat |
A+Comp+Attr | heittogit heittogut heittoget heittogat heittohit heittohut heittohet heittohat | heittoget heittogat |
A+Comp+Sg+Nom | heittogit heittogut heittoget heittogeabbo heittogat heittogabbo heittohit heittohut heittohet heittoheabbo heittohat heittohabbo | heittogeabbo heittogabbo |
A+Superl+Sg+Nom | heittogeamos heittogamos heittoheamos heittohamos | heittogeamos heittogamos |
Nouns
Display the whole paradigm in two columns for plural. In NDS, because there
TODO: Noun attributes that affect miniparadigms ?
Bøyning | Eksempel |
---|---|
Sg+Nom | võrr |
Sg+Gen | võõr |
Sg+Acc | võõr |
Sg+Ill | võʹrre |
Sg+Loc | võõrâst |
Sg+Com | võõrin |
Sg+Abe | võõrtää |
Sg+Abe | võõrtaa TODO: does this need an attribute to control? |
Pl+Nom | võõr |
Pl+Gen | võõri |
Pl+Acc | võõrid |
Pl+Ill | võõrid |
Pl+Loc | võõrin |
Pl+Com | võõrivuiʹm |
Pl+Abe | võõritää |
Pl+Abe | võõritaa TODO: does this need an attribute to control? |
Ess | võrrân |
Par | võrrâd |
Proper nouns
For now, all proper nouns are not generated in Plural.
Sg+Nom Njuõttjokk
EX: Äʹnnjääuʹraž
TODO: determine how to display these in sms
Form | Context | Example | Translation |
---|---|---|---|
- | - | ||
Sg+Gen | X pääiʹǩ | ||
Sg+Ill | - | ||
Sg+Loc | - |
TODO: Any plural-only proper nouns?
Holidays?
use räjja in context for e.g. eeʹjjpeeiʹv räjja
Adjectives
For adjectives we use context as an attribute on the lemma node, in order to
TODO: determine some good contexts for adjs
Inflection | Context | Example |
---|---|---|
A+Pred+Sg | oođâs | |
A+Attr | context: "??" | ođđ (??) |
A+Comp | ođđsab | |
A+Superl | ođđsumus |
TODO: +A+Pred+Pl ?
numerals
TODO:
Pronouns
Personal pronouns
Most personal pronouns can be generated live from FSTs, depending on the
This also requires the type="Pers"attribute on the <l />node, and the
Inflection | Example |
---|---|
Sg+Nom | mon |
Sg+Gen | muu |
Sg+Acc | muu |
Sg+Ill | muʹnne |
Sg+Loc | muʹst |
Sg+Com | muin |
Sg+Abe | muutää |
Ess | muuʹnen |
Par | muuʹđed |
TODO:
Indef pron
måtam Måtmin
TODO:
Pregenerated paradigms
pronouns
Because the analyzer uses tags that make generation difficult, the thought was
TODO:
negative verb
TODO:
Sg1 Sg2 Sg3 Pl1 Pl2 Pl3
Homonymous entries
Homonymous entries (lemma + POS) may be tricky for a combination of the lexicon
TODO: jokk is homonymous in sms, find examples for documentation from there.
Non-systematic homonymy
TODO:
<l> element gets an attribute hid="1" or hid="2". The lemmas are marked
Nom | Gen | norsk | norm-fst-analyse |
---|---|---|---|
lohkki | lohki | lokk | lohkki+N+Sg+Nom |
lohkki | lohkki | lesar | lohkki+N+Actor+Sg+Nom |
TODO: xml examples of homonymous entries (either actor type, or hid type, etc.) 1. 2.