Skolt Saami2X
This page documents the Skolt Saami dictionary projects at Giellatekno.
The backbone of all dictionary projects, incl. the contlex files for the morphological analyzer, the Skolt Saami Oahpa and different user dictionaries are the lexical data stored in a database called sms2X.
User dictionaries we are are working on at present
Other applications linked to the lexical database
The sms2X lexical data backbone
The database is the result of collaborative work carried out at Østsamisk museum Neiden, Freiburg Research Group in Saami Studies, Giellatekno, and members of the Skolt Saami language communities.
Using XML with the NDS dictionary
Database
Underived parts-of-speech
-
a - adjective
-
adp - adposition
- adv - adverb
- cc - conjunction
- cs - subjunction
- det - determiner
- i - interjection
- n - noun
-
num - numeral
- pcle - particle
- pro - pronoun
- prop - proper noun
- v - verb
Derived parts-of-speech
Since most derivations are formed by means of regular/productive morphology and do not represent own lemmas they are stored in separate files for derived PoS's with the link to the respective root as a variable. For different kinds of dictionaries, we will later handle derivations differently:
- Oahpa!-nuõrti includes derivations similar to other lemmas (if these derivations are tagged for oahpa)
- saan.oahpa.no analyses derivations if they are written in the FST
- in a future printed dictionary some derivations will be listed under root lemmas
- contlex lexica do not include productive derivations
A PROBLEM: what are the productive (non/lexicalized) derivations and how do we tag them?
These are the files for derived parts-of-speech:
-
der_a - derived adjectives
- der_adv
- der_det
- der_n
-
der_num - derived numerals
- der_pro
- der_v
Other
-
abbr - abbreviations
- mwe - multiword expressions (listed as lemmas, e.g. for Oahpa!-nuõrti)
-
inf - inflected forms
- var - variants