Kildin Lexicon
Contents:
The directory pedversions/sjdoahpa/sjd/src contains the entries used in the current online version of sjdoahpa.
The russjd and engsjd files will be placed in pedversions/sjdoahpa/sjd/russjd and pedversions/sjdoahpa/sjd/engsjd.
Test
==> test the new Leksa online in both directions!
TODO
Correct the restrictions in the translations. At present a restriction is written as part of the translation text:
<entry>
  <lemma>моаӆӆьчхэ</lemma>
  <pos class="v"/>
  <translations>
    <tr xml:lang="rus">облезать о шкуре</tr>
    <tr xml:lang="rus">облезать</tr>
    <tr xml:lang="rus">облезть</tr>
    <tr xml:lang="eng">to grow bare about a skin</tr>
    <tr xml:lang="eng">to grow bare</tr>
  </translations>
</entry>
It should be encoded in a restr attribute instead, this way:
<entry>
  <lemma>моаӆӆьчхэ</lemma>
  <pos class="v"/>
  <translations>
    <tr xml:lang="rus" restr="о шкуре">облезать</tr>
    <tr xml:lang="rus">облезть</tr>
    <tr xml:lang="eng" restr="about a skin">to grow bare</tr>
  </translations>
</entry>
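The conversion shown above could perhaps be partly automated. The following is only a minimal sketch, not the project's actual tooling: the function name fold_restrictions and the heuristic itself are assumptions. It folds a translation "X restriction" into restr="restriction" only when the bare "X" is also listed as a translation in the same language, which is the pattern in the entry above.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def fold_restrictions(entry):
    """Move a trailing restriction into a restr attribute (heuristic).

    Assumption: "X restriction" is folded only when the bare "X" is also
    present as a translation in the same language of the same entry.
    """
    translations = entry.find("translations")
    if translations is None:
        return entry
    # Group the <tr> elements by language, keeping document order.
    by_lang = {}
    for tr in translations.findall("tr"):
        by_lang.setdefault(tr.get(XML_LANG), []).append(tr)
    for trs in by_lang.values():
        removed = set()  # ids of <tr> elements already folded away
        for long_tr in trs:
            if "restr" in long_tr.attrib:
                continue  # already carries a restriction, leave it alone
            long_text = (long_tr.text or "").strip()
            for base_tr in trs:
                base_text = (base_tr.text or "").strip()
                if base_tr is long_tr or id(base_tr) in removed or not base_text:
                    continue
                if "restr" in base_tr.attrib:
                    continue  # one restriction per translation in this sketch
                if long_text.startswith(base_text + " "):
                    # The remainder after the bare translation is the restriction.
                    base_tr.set("restr", long_text[len(base_text) + 1:])
                    translations.remove(long_tr)
                    removed.add(id(long_tr))
                    break
    return entry
```

The heuristic cannot tell a restriction from an ordinary longer translation (e.g. "to move" next to "to move away"), so its output would still have to be checked by hand.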
TODO
v_sjdrus.xml: <tr xml:lang="eng">to sympathize</tr>
v_sjdrus.xml: <tr xml:lang="eng">will hesitate for a time</tr>
v_sjdrus.xml: <tr xml:lang="eng">to come nearer</tr>
v_sjdrus.xml: <tr xml:lang="eng">to become hardened</tr>
v_sjdrus.xml: <tr xml:lang="eng">to become callous</tr>
v_sjdrus.xml: <tr xml:lang="eng">to harden</tr>
v_sjdrus.xml: <tr xml:lang="eng">to become callous</tr>
v_sjdrus.xml: <tr xml:lang="eng">to make someone sincere</tr>
v_sjdrus.xml: <tr xml:lang="eng">to become sincere</tr>
v_sjdrus.xml: <tr xml:lang="eng">to move</tr>
v_sjdrus.xml: <tr xml:lang="eng">change a place</tr>
v_sjdrus.xml: <tr xml:lang="eng">to pass to a new place</tr>
TODO
<entry id="кэ̄ннц">
  <lemma>кэ̄ннц</lemma>
  <pos class="n"/>
  <translations>
    <tr xml:lang="rus">ноготь</tr>
    <tr xml:lang="eng"/>
  </translations>
</entry>
<entry id="то̄лл">
  <lemma>то̄лл</lemma>
  <pos class="n"/>
  <translations>
    <tr xml:lang="rus">огонь</tr>
    <tr xml:lang="rus">костер</tr>
    <tr xml:lang="rus"> место для костра</tr>
    <tr xml:lang="eng"/>
  </translations>
</entry>

Total number of lines containing '_ENG':
xml>grep -h '_ENG' *.xml | wc -l
49

Here is the list:
xml>grep -n '_ENG' *.xml
n_sjdrus.xml:1579: <tr xml:lang="eng">ноготь_ENG</tr>
n_sjdrus.xml:1597: <tr xml:lang="eng">огонь_ENG</tr>
n_sjdrus.xml:2723: <tr xml:lang="eng">лосиха_ENG</tr>
n_sjdrus.xml:2933: <tr xml:lang="eng">важенка_ENG</tr>
n_sjdrus.xml:4608: <tr xml:lang="eng">pulp ляшки_ENG</tr>
n_sjdrus.xml:5525: <tr xml:lang="eng">оленята_ENG</tr>
n_sjdrus.xml:5586: <tr xml:lang="eng">олененок_ENG</tr>
n_sjdrus.xml:5724: <tr xml:lang="eng">морозец_ENG</tr>
n_sjdrus.xml:5739: <tr xml:lang="eng">морозец_ENG</tr>
n_sjdrus.xml:5754: <tr xml:lang="eng">морозец_ENG</tr>
n_sjdrus.xml:5812: <tr xml:lang="eng">сиг_ENG</tr>
n_sjdrus.xml:5826: <tr xml:lang="eng">кумжа_ENG</tr>
n_sjdrus.xml:6020: <tr xml:lang="eng">пинагор_ENG</tr>
n_sjdrus.xml:6049: <tr xml:lang="eng">каменки_ENG</tr>
n_sjdrus.xml:6050: <tr xml:lang="eng">мальки_ENG</tr>
n_sjdrus.xml:6065: <tr xml:lang="eng">сиг big_ENG</tr>
n_sjdrus.xml:6066: <tr xml:lang="eng">big сиг_ENG</tr>
n_sjdrus.xml:6122: <tr xml:lang="eng">хариус_ENG</tr>
n_sjdrus.xml:6136: <tr xml:lang="eng">хариус_ENG</tr>
n_sjdrus.xml:6377: <tr xml:lang="eng">smell варенной fishes_ENG</tr>
n_sjdrus.xml:6447: <tr xml:lang="eng">бражка_ENG</tr>
n_sjdrus.xml:6886: <tr xml:lang="eng">лопанье_ENG</tr>
n_sjdrus.xml:6914: <tr xml:lang="eng">лопанье_ENG</tr>
n_sjdrus.xml:7243: <tr xml:lang="eng">шамшура_ENG</tr>
n_sjdrus.xml:7300: <tr xml:lang="eng">zone part on female ярах_ENG</tr>
n_sjdrus.xml:7332: <tr xml:lang="eng">skin дублённая_ENG</tr>
n_sjdrus.xml:7361: <tr xml:lang="eng">позументная tape_ENG</tr>
n_sjdrus.xml:7390: <tr xml:lang="eng">valve on man's ярах_ENG</tr>
n_sjdrus.xml:9687: <tr xml:lang="eng">вежа_ENG</tr>
n_sjdrus.xml:10103: <tr xml:lang="eng">pure place in куваксе_ENG</tr>
n_sjdrus.xml:11309: <tr xml:lang="eng">круча mountains_ENG</tr>
n_sjdrus.xml:11617: <tr xml:lang="eng">озерко_ENG</tr>
n_sjdrus.xml:11737: <tr xml:lang="eng">корга_ENG</tr>
n_sjdrus.xml:12211: <tr xml:lang="eng">thickets ивника_ENG</tr>
n_sjdrus.xml:15740: <tr xml:lang="eng">сонорный a sound_ENG</tr>
n_sjdrus.xml:15754: <tr xml:lang="eng">deaf сонорный a sound_ENG</tr>
n_sjdrus.xml:15782: <tr xml:lang="eng">deaf сонорный a short nasal sound_ENG</tr>
n_sjdrus.xml:15796: <tr xml:lang="eng">deaf сонорный a long nasal sound_ENG</tr>
n_sjdrus.xml:15810: <tr xml:lang="eng">deaf сонорный языковый a short sound_ENG</tr>
n_sjdrus.xml:15824: <tr xml:lang="eng">deaf сонорный языковый a long sound_ENG</tr>
n_sjdrus.xml:16663: <tr xml:lang="eng">cuffs малицы_ENG</tr>
n_sjdrus.xml:16729: <tr xml:lang="eng">малица_ENG</tr>
pron_sjdrus.xml:124: <tr xml:lang="eng">which-nibud_ENG</tr>
v_sjdrus.xml:1525: <tr xml:lang="eng">тошнить_ENG</tr>
v_sjdrus.xml:1709: <tr xml:lang="eng">тошнить_ENG</tr>
v_sjdrus.xml:4584: <tr xml:lang="eng">small шинковать_ENG</tr>
v_sjdrus.xml:4805: <tr xml:lang="eng">tax to give_ENG</tr>
v_sjdrus.xml:4822: <tr xml:lang="eng">is subject to the supreme court_ENG</tr>
v_sjdrus.xml:4984: <tr xml:lang="eng">will hesitate for a time_ENG</tr>
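The grep call above only catches the _ENG marker. As a possible complement, here is a small sketch that also reports English translations that are empty or still contain Cyrillic letters. The function names and the three criteria are assumptions made for illustration; the sketch only relies on the entry, lemma and tr elements visible in the examples above.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
CYRILLIC = re.compile(r"[\u0400-\u04FF]")


def needs_english(tr):
    """True if an English <tr> is empty, marked _ENG, or contains Cyrillic."""
    text = (tr.text or "").strip()
    return not text or text.endswith("_ENG") or bool(CYRILLIC.search(text))


def find_untranslated(root):
    """Return (lemma, english_text) pairs that still need a translation."""
    hits = []
    for entry in root.iter("entry"):
        lemma = entry.findtext("lemma", default="")
        for tr in entry.iter("tr"):
            if tr.get(XML_LANG) == "eng" and needs_english(tr):
                hits.append((lemma, (tr.text or "").strip()))
    return hits
```

For a whole file one would call find_untranslated(ET.parse(path).getroot()), where path is one of the *_sjdrus.xml files.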
possible future todo
@Micha: a few observations:
- ё vs. е in Russian (e.g. вдвоём / вдвоем); perhaps we should consistently use ё in the xml, but include е (with spellrelax) for oahpa users?
- the semantics should be checked (do the other oahpas use predefined sets of values?), e.g. why is <lemma>э̄ххт тоа̄фант</lemma> <tr xml:lang="eng">one thousand</tr> "HUMAN", or why is <lemma>кутӭ-кутӭ</lemma> <tr xml:lang="eng">two each</tr> "HUMAN" and "FOOD"? It could be anything: "cars", "reindeer", "xml databases", etc.
- common (uni)coding issues (perhaps we can apply a script to future incoming data):
- Latin letters in Cyrillic: a --> а, o --> о, etc. (even in Russian text)
- Precomposed vs. combining diaeresis: ё --> ё, ӓ --> ӓ, ӭ --> ӭ
- Precomposed vs. combining macron: ӣ --> ӣ, ӯ --> ӯ
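A script for incoming data, as suggested for the (uni)coding issues above, might look roughly like the following. This is a sketch under two assumptions: that the precomposed forms (Unicode NFC) are the ones wanted in the xml, and that Latin look-alikes should only be replaced inside words that already contain Cyrillic letters. The function name and the look-alike table are hypothetical, and the table is incomplete.

```python
import re
import unicodedata

# Latin letters that look like Cyrillic ones (assumed, incomplete table).
LATIN_TO_CYRILLIC = str.maketrans({
    "a": "\u0430",  # Cyrillic a
    "e": "\u0435",  # Cyrillic ie
    "o": "\u043e",  # Cyrillic o
    "p": "\u0440",  # Cyrillic er
    "c": "\u0441",  # Cyrillic es
    "x": "\u0445",  # Cyrillic ha
    "y": "\u0443",  # Cyrillic u
})

CYRILLIC = re.compile(r"[\u0400-\u04FF]")
# A token is a run of basic Latin letters, Cyrillic letters and combining marks.
TOKEN = re.compile(r"[A-Za-z\u0400-\u04FF\u0300-\u036F]+")


def normalize_text(text):
    """Compose diacritics (NFC) and fix Latin look-alikes in Cyrillic words."""
    # NFC turns base letter + combining diaeresis/macron into the precomposed
    # letter where Unicode has one; other combinations stay as they are.
    text = unicodedata.normalize("NFC", text)

    def fix(match):
        token = match.group(0)
        # Only touch tokens that already contain at least one Cyrillic letter.
        if CYRILLIC.search(token):
            return token.translate(LATIN_TO_CYRILLIC)
        return token

    return TOKEN.sub(fix, text)
```

The function is meant for the text of lemma and tr elements. A Latin letter standing alone as a word, or a word typed entirely in Latin look-alikes, is not detected by this heuristic and would need a separate check.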
- several multi-word lemmata, like <lemma>э̄ххт чӯдтҍ</lemma> or <lemma>югкеналла лыдцант</lemma> or <lemma>пя̄лла ӣнсэй оанҍхэсь нюннҍ тӣххт</lemma> (the latter two in particular are definitely not lemmas, but paraphrases)
- there are even entries with multiword expressions both as lemmas and translations, like:
<entry>
  <lemma>ко̄ппче соа̄йметҍ</lemma>
  <pos class="v"/>
  <translations>
    <tr xml:lang="rus">собирать сетки</tr>
    <tr xml:lang="eng">to collect grids</tr>
  </translations>
  <semantics>
    <sem class="PLACE-WATER"/>
    <sem class="VERB"/>
  </semantics>
  <sources>
    <book name="Saamkilsyjjt"/>
  </sources>
</entry>
Can we use these for a vocabulary trainer?
- English verbs with(out) "to"? (e.g. <tr xml:lang="eng">undress</tr> vs. <tr xml:lang="eng">to dress</tr>)
- free word order in Russian NP? (e.g. <tr xml:lang="rus">хвост короткий</tr> and <tr xml:lang="rus">короткий хвост</tr>)
- attr. vs. pred. adjectives (in sjd and rus!)
- translations need to be checked carefully, cf. this example: the basic meaning of this Kildin word is clearly "unfamiliar, unknown". In some situations this can of course also be expressed as "new", but as a translation of the lemma >eehk<, "new" is clearly wrong, especially in a vocabulary trainer (in a true dictionary we could give this "new" meaning in an example sentence)
<entry>
  <lemma>е̄ххк</lemma>
  <pos class="a"/>
  <translations>
    <tr xml:lang="rus">незнакомый</tr>
    <tr xml:lang="rus">новый</tr>
    <tr xml:lang="eng">unfamiliar</tr>
    <tr xml:lang="eng">new</tr>
  </translations>
  <semantics>
    <sem class="HUMAN"/>
    <sem class="CLOTHES"/>
  </semantics>
  <sources>
    <book name="Saamkilsyjjt"/>
  </sources>
</entry>
- inflected form as lemma
<entry>
  <lemma>углясьт</lemma>
  <pos class="adv"/>
  <translations>
    <tr xml:lang="rus">в уголке</tr>
    <tr xml:lang="eng">in a corner</tr>
  </translations>
  <semantics>
    <sem class="LIVING-PLACE"/>
  </semantics>
  <sources>
    <book name="Saamkilsyjjt"/>
  </sources>
</entry>
The pos class "adv" is wrong, because this is an inflected noun (which can of course be used as an adverbial). I understand that such forms should be used for training vocabulary, but we have to find another value for the pos class here.
- this is not a "dim_set", but a "pl_set"!
<entry>
  <lemma>па̄лл</lemma>
  <pos class="n"/> <!--noun in sg-->
  <translations>
    <tr xml:lang="rus">мяч</tr>
    <tr xml:lang="eng">ball</tr>
  </translations>
  <semantics>
    <sem class="LIVING_PLACE"/>
    <sem class="DIM_SET"/>
  </semantics>
  <sources>
    <book name="Saamkilsyjjt"/>
  </sources>
</entry>
<entry>
  <lemma>па̄л</lemma>
  <pos class="n"/> <!-- n in pl-->
  <translations>
    <tr xml:lang="rus">мячи</tr>
    <tr xml:lang="eng">balls</tr>
  </translations>
  <semantics>
    <sem class="LIVING_PLACE"/>
    <sem class="DIM_SET"/>
  </semantics>
  <sources>
    <book name="Saamkilsyjjt"/>
  </sources>
</entry>
we could of course use the oahpa for the training of inflectional forms, but is it useful to have plural forms mixed up with diminutives?