Kildin Lexicon
Contents:
The directory pedversions/sjdoahpa/sjd/src contains the entries used in the current online version of sjdoahpa.
The russjd and engsjd files will go in pedversions/sjdoahpa/sjd/russjd and pedversions/sjdoahpa/sjd/engsjd.
Test
==> test the new Leksa online in both directions!
TODO
correct the restrictions in the translations: change entries like this one
<entry>
<lemma>моаӆӆьчхэ</lemma>
<pos class="v"/>
<translations>
<tr xml:lang="rus">облезать о шкуре</tr>
<tr xml:lang="rus">облезать</tr>
<tr xml:lang="rus">облезть</tr>
<tr xml:lang="eng">to grow bare about a skin</tr>
<tr xml:lang="eng">to grow bare</tr>
</translations>
</entry>
so that the restriction goes into a restr attribute, this way:
<entry>
<lemma>моаӆӆьчхэ</lemma>
<pos class="v"/>
<translations>
<tr xml:lang="rus" restr="о шкуре">облезать</tr>
<tr xml:lang="rus">облезть</tr>
<tr xml:lang="eng" restr="about a skin">to grow bare</tr>
</translations>
</entry>
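The rewrite shown above could be partly automated. The following is a minimal sketch, not an existing tool: the function name split_restrictions is hypothetical, and the heuristic is an assumption (if one <tr> reads "X R" and another <tr> in the same language reads just "X", then R is taken to be the restriction on X). The results would still need to be checked by hand.

```python
import xml.etree.ElementTree as ET

# ElementTree stores xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def split_restrictions(entry):
    """Move a trailing restriction from the translation text into restr.

    Heuristic (an assumption, not a documented rule): a <tr> whose text is
    another same-language <tr>'s text plus extra words carries a restriction.
    """
    translations = entry.find("translations")
    trs = list(translations.findall("tr"))
    removed = set()  # ids of <tr> elements already dropped
    for tr in trs:
        if id(tr) in removed:
            continue
        text = (tr.text or "").strip()
        for other in trs:
            if other is tr or id(other) in removed:
                continue
            if other.get(XML_LANG) != tr.get(XML_LANG):
                continue
            base = (other.text or "").strip()
            if base and text.startswith(base + " "):
                tr.text = base
                tr.set("restr", text[len(base) + 1:])
                translations.remove(other)  # the bare duplicate is redundant now
                removed.add(id(other))
                break
    return entry


example = ET.fromstring(
    "<entry><lemma>моаӆӆьчхэ</lemma><pos class='v'/><translations>"
    "<tr xml:lang='rus'>облезать о шкуре</tr>"
    "<tr xml:lang='rus'>облезать</tr>"
    "<tr xml:lang='rus'>облезть</tr>"
    "<tr xml:lang='eng'>to grow bare about a skin</tr>"
    "<tr xml:lang='eng'>to grow bare</tr>"
    "</translations></entry>"
)
split_restrictions(example)
print(ET.tostring(example, encoding="unicode"))
```

The heuristic only fires when the unrestricted translation is also present, so entries that list only the long form are left untouched.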
TODO
v_sjdrus.xml: <tr xml:lang="eng">to sympathize</tr>
v_sjdrus.xml: <tr xml:lang="eng">will hesitate for a time</tr>
v_sjdrus.xml: <tr xml:lang="eng">to come nearer</tr>
v_sjdrus.xml: <tr xml:lang="eng">to become hardened</tr>
v_sjdrus.xml: <tr xml:lang="eng">to become callous</tr>
v_sjdrus.xml: <tr xml:lang="eng">to harden</tr>
v_sjdrus.xml: <tr xml:lang="eng">to become callous</tr>
v_sjdrus.xml: <tr xml:lang="eng">to make someone sincere</tr>
v_sjdrus.xml: <tr xml:lang="eng">to become sincere</tr>
v_sjdrus.xml: <tr xml:lang="eng">to move</tr>
v_sjdrus.xml: <tr xml:lang="eng">change a place</tr>
v_sjdrus.xml: <tr xml:lang="eng">to pass to a new place</tr>
TODO
<entry id="кэ̄ннц">
<lemma>кэ̄ннц</lemma>
<pos class="n"/>
<translations>
<tr xml:lang="rus">ноготь</tr>
<tr xml:lang="eng"/>
</translations>
</entry>
<entry id="то̄лл">
<lemma>то̄лл</lemma>
<pos class="n"/>
<translations>
<tr xml:lang="rus">огонь</tr>
<tr xml:lang="rus">костер</tr>
<tr xml:lang="rus"> место для костра</tr>
<tr xml:lang="eng"/>
</translations>
</entry>
Total number of translations still marked with the _ENG placeholder:
xml>grep -h '_ENG' *.xml | wc -l
49
Here is the list
xml>grep -n '_ENG' *.xml
n_sjdrus.xml:1579: <tr xml:lang="eng">ноготь_ENG</tr>
n_sjdrus.xml:1597: <tr xml:lang="eng">огонь_ENG</tr>
n_sjdrus.xml:2723: <tr xml:lang="eng">лосиха_ENG</tr>
n_sjdrus.xml:2933: <tr xml:lang="eng">важенка_ENG</tr>
n_sjdrus.xml:4608: <tr xml:lang="eng">pulp ляшки_ENG</tr>
n_sjdrus.xml:5525: <tr xml:lang="eng">оленята_ENG</tr>
n_sjdrus.xml:5586: <tr xml:lang="eng">олененок_ENG</tr>
n_sjdrus.xml:5724: <tr xml:lang="eng">морозец_ENG</tr>
n_sjdrus.xml:5739: <tr xml:lang="eng">морозец_ENG</tr>
n_sjdrus.xml:5754: <tr xml:lang="eng">морозец_ENG</tr>
n_sjdrus.xml:5812: <tr xml:lang="eng">сиг_ENG</tr>
n_sjdrus.xml:5826: <tr xml:lang="eng">кумжа_ENG</tr>
n_sjdrus.xml:6020: <tr xml:lang="eng">пинагор_ENG</tr>
n_sjdrus.xml:6049: <tr xml:lang="eng">каменки_ENG</tr>
n_sjdrus.xml:6050: <tr xml:lang="eng">мальки_ENG</tr>
n_sjdrus.xml:6065: <tr xml:lang="eng">сиг big_ENG</tr>
n_sjdrus.xml:6066: <tr xml:lang="eng">big сиг_ENG</tr>
n_sjdrus.xml:6122: <tr xml:lang="eng">хариус_ENG</tr>
n_sjdrus.xml:6136: <tr xml:lang="eng">хариус_ENG</tr>
n_sjdrus.xml:6377: <tr xml:lang="eng">smell варенной fishes_ENG</tr>
n_sjdrus.xml:6447: <tr xml:lang="eng">бражка_ENG</tr>
n_sjdrus.xml:6886: <tr xml:lang="eng">лопанье_ENG</tr>
n_sjdrus.xml:6914: <tr xml:lang="eng">лопанье_ENG</tr>
n_sjdrus.xml:7243: <tr xml:lang="eng">шамшура_ENG</tr>
n_sjdrus.xml:7300: <tr xml:lang="eng">zone part on female ярах_ENG</tr>
n_sjdrus.xml:7332: <tr xml:lang="eng">skin дублённая_ENG</tr>
n_sjdrus.xml:7361: <tr xml:lang="eng">позументная tape_ENG</tr>
n_sjdrus.xml:7390: <tr xml:lang="eng">valve on man's ярах_ENG</tr>
n_sjdrus.xml:9687: <tr xml:lang="eng">вежа_ENG</tr>
n_sjdrus.xml:10103: <tr xml:lang="eng">pure place in куваксе_ENG</tr>
n_sjdrus.xml:11309: <tr xml:lang="eng">круча mountains_ENG</tr>
n_sjdrus.xml:11617: <tr xml:lang="eng">озерко_ENG</tr>
n_sjdrus.xml:11737: <tr xml:lang="eng">корга_ENG</tr>
n_sjdrus.xml:12211: <tr xml:lang="eng">thickets ивника_ENG</tr>
n_sjdrus.xml:15740: <tr xml:lang="eng">сонорный a sound_ENG</tr>
n_sjdrus.xml:15754: <tr xml:lang="eng">deaf сонорный a sound_ENG</tr>
n_sjdrus.xml:15782: <tr xml:lang="eng">deaf сонорный a short nasal sound_ENG</tr>
n_sjdrus.xml:15796: <tr xml:lang="eng">deaf сонорный a long nasal sound_ENG</tr>
n_sjdrus.xml:15810: <tr xml:lang="eng">deaf сонорный языковый a short sound_ENG</tr>
n_sjdrus.xml:15824: <tr xml:lang="eng">deaf сонорный языковый a long sound_ENG</tr>
n_sjdrus.xml:16663: <tr xml:lang="eng">cuffs малицы_ENG</tr>
n_sjdrus.xml:16729: <tr xml:lang="eng">малица_ENG</tr>
pron_sjdrus.xml:124: <tr xml:lang="eng">which-nibud_ENG</tr>
v_sjdrus.xml:1525: <tr xml:lang="eng">тошнить_ENG</tr>
v_sjdrus.xml:1709: <tr xml:lang="eng">тошнить_ENG</tr>
v_sjdrus.xml:4584: <tr xml:lang="eng">small шинковать_ENG</tr>
v_sjdrus.xml:4805: <tr xml:lang="eng">tax to give_ENG</tr>
v_sjdrus.xml:4822: <tr xml:lang="eng">is subject to the supreme court_ENG</tr>
v_sjdrus.xml:4984: <tr xml:lang="eng">will hesitate for a time_ENG</tr>
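The grep above only catches the _ENG marker. A slightly broader check could be sketched as follows. This is an illustration under stated assumptions: the function name pending_english is hypothetical, and treating empty <tr xml:lang="eng"/> elements and English translations containing Cyrillic letters as "pending" is an assumption that goes beyond what the grep counts.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree stores xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
CYRILLIC = re.compile(r"[\u0400-\u04FF]")


def pending_english(root):
    """Yield (lemma, text, reason) for English <tr> elements that need work."""
    for entry in root.iter("entry"):
        lemma = entry.findtext("lemma", default="")
        for tr in entry.iter("tr"):
            if tr.get(XML_LANG) != "eng":
                continue
            text = (tr.text or "").strip()
            if not text:
                yield lemma, text, "empty"
            elif text.endswith("_ENG"):
                yield lemma, text, "placeholder"
            elif CYRILLIC.search(text):
                # Assumption: Cyrillic in an English translation is leftover Russian.
                yield lemma, text, "cyrillic"


# <r> is a hypothetical wrapper; the real root element is not shown in these notes.
sample = ET.fromstring(
    "<r><entry><lemma>то̄лл</lemma><translations>"
    "<tr xml:lang='rus'>огонь</tr><tr xml:lang='eng'/>"
    "</translations></entry></r>"
)
for lemma, text, reason in pending_english(sample):
    print(lemma, repr(text), reason)
```

For real files, ET.parse("n_sjdrus.xml").getroot() could be passed in instead of the inline sample.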
possible future todo
@Micha: a few observations:
- ё vs. е in Russian (e.g. вдвоём / вдвоем); perhaps we should consistently use ё in the xml, but include е (with spellrelax) for oahpa users?
- the semantics should be checked (do the other oahpas use predefined sets of values?), e.g. why is <lemma>э̄ххт тоа̄фант</lemma> <tr xml:lang="eng">one thousand</tr> "HUMAN", or why is <lemma>кутӭ-кутӭ</lemma> <tr xml:lang="eng">two each</tr> "HUMAN" and "FOOD"? It could be anything: "cars", "reindeer", "xml databases", etc.
- common (uni)coding issues (perhaps we can apply a script to future incoming data):
- Latin letters in Cyrillic: a --> а, o --> о, etc. (even in Russian text)
- Precomposed vs. combining diaeresis: ё --> ё, ӓ --> ӓ, ӭ --> ӭ
- Precomposed vs. combining macron: ӣ --> ӣ, ӯ --> ӯ
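A script for the (uni)coding issues listed above might look like the sketch below. It rests on two assumptions: that the precomposed characters are the target form (so Unicode NFC normalization is used), and that a Latin letter is only a mistake when it sits inside a token that also contains Cyrillic. The names HOMOGLYPHS and clean are hypothetical, and the homoglyph table is deliberately small.

```python
import re
import unicodedata

# Latin letters mapped to their Cyrillic look-alikes. Escapes are used because
# the glyphs are visually identical. The table is an assumption; extend it
# after checking real data.
HOMOGLYPHS = str.maketrans({
    "a": "\u0430", "o": "\u043e", "e": "\u0435", "c": "\u0441",
    "p": "\u0440", "x": "\u0445", "y": "\u0443",
    "A": "\u0410", "O": "\u041e", "E": "\u0415", "C": "\u0421",
    "P": "\u0420", "X": "\u0425",
})
CYRILLIC = re.compile(r"[\u0400-\u04FF]")
TOKEN = re.compile(r"\S+")


def _fix_token(match):
    token = match.group(0)
    # Only touch tokens that already contain Cyrillic, so English stays intact.
    return token.translate(HOMOGLYPHS) if CYRILLIC.search(token) else token


def clean(text):
    """Replace Latin look-alikes in Cyrillic tokens, then compose to NFC.

    Meant for the text of <lemma> and <tr> elements, not for raw XML markup.
    """
    text = TOKEN.sub(_fix_token, text)
    # NFC composes base letter + combining mark where a precomposed character
    # exists; letters without a precomposed form keep their combining mark.
    return unicodedata.normalize("NFC", text)


print(clean("\u0435\u0308") == "\u0451")  # combining diaeresis vs. precomposed
```

One known gap of this approach: a Latin letter standing alone as a word (for example a Latin "o" typed for the Russian preposition) has no Cyrillic neighbour in its token and is therefore not corrected.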
- several multi word lemmata, like <lemma>э̄ххт чӯдтҍ</lemma> or <lemma>югкеналла лыдцант</lemma> or <lemma>пя̄лла ӣнсэй оанҍхэсь нюннҍ тӣххт</lemma> (especially the latter two are definitely not lemmas, but paraphrases)
- there are even entries with multiword expressions both as lemmas and translations, like:
<entry>
<lemma>ко̄ппче соа̄йметҍ</lemma>
<pos class="v"/>
<translations>
<tr xml:lang="rus">собирать сетки</tr>
<tr xml:lang="eng">to collect grids</tr>
</translations>
<semantics>
<sem class="PLACE-WATER"/>
<sem class="VERB"/>
</semantics>
<sources>
<book name="Saamkilsyjjt"/>
</sources>
</entry>
Can we use these for a vocabulary trainer?
- English verbs with(out) "to"? (e.g. <tr xml:lang="eng">undress</tr> vs. <tr xml:lang="eng">to dress</tr>)
- free word order in Russian NP? (e.g. <tr xml:lang="rus">хвост короткий</tr> and <tr xml:lang="rus">короткий хвост</tr>)
- attr. vs. pred. adjectives (in sjd and rus!)
- translations need to be checked carefully. In the example below, the basic meaning of the Kildin word is clearly "unfamiliar, unknown". In some situations this can of course also be expressed as "new", but as a translation of the lemma >eehk< "new" is clearly wrong, especially in a vocabulary trainer (in a true dictionary we could give this "new" meaning in an example sentence):
<entry>
<lemma>е̄ххк</lemma>
<pos class="a"/>
<translations>
<tr xml:lang="rus">незнакомый</tr>
<tr xml:lang="rus">новый</tr>
<tr xml:lang="eng">unfamiliar</tr>
<tr xml:lang="eng">new</tr>
</translations>
<semantics>
<sem class="HUMAN"/>
<sem class="CLOTHES"/>
</semantics>
<sources>
<book name="Saamkilsyjjt"/>
</sources>
</entry>
- inflected form as lemma
<entry>
<lemma>углясьт</lemma>
<pos class="adv"/>
<translations>
<tr xml:lang="rus">в уголке</tr>
<tr xml:lang="eng">in a corner</tr>
</translations>
<semantics>
<sem class="LIVING-PLACE"/>
</semantics>
<sources>
<book name="Saamkilsyjjt"/>
</sources>
</entry>
pos class="adv" is wrong here, because this is an inflected noun (which can of course be used as an adverbial). I understand that such forms should be used for training vocabulary, but we have to find another pos value for them.
- this is not a "dim_set", but a "pl_set"!
<entry>
<lemma>па̄лл</lemma>
<pos class="n"/> <!--noun in sg-->
<translations>
<tr xml:lang="rus">мяч</tr>
<tr xml:lang="eng">ball</tr>
</translations>
<semantics>
<sem class="LIVING_PLACE"/>
<sem class="DIM_SET"/>
</semantics>
<sources>
<book name="Saamkilsyjjt"/>
</sources>
</entry>
<entry>
<lemma>па̄л</lemma>
<pos class="n"/> <!-- n in pl-->
<translations>
<tr xml:lang="rus">мячи</tr>
<tr xml:lang="eng">balls</tr>
</translations>
<semantics>
<sem class="LIVING_PLACE"/>
<sem class="DIM_SET"/>
</semantics>
<sources>
<book name="Saamkilsyjjt"/>
</sources>
</entry>
we could of course use the oahpa for the training of inflectional forms, but is it useful to have plural forms mixed up with diminutives?

