vocabularyoutline
Vocabulary games
We would like to make a set of vocabulary games. The idea is to divide the vocabulary according to part of speech, adjective, verb, noun, adverb.
Outline
The vocabulary lists may be found as follows:
- Find the 1000 most common Norwegian nouns and verbs, and the 500 most common adjectives and adverbs.
- Translate them by using the ismenob.fst tool (words/dicts/smenob/ismenob.fst).
- Problem: The ismenob.fst is not that good, it is just the inverse of the smenob.fst. A better tool will be achieved by converting the underlying xml document from smenob.xml to ismenob.xml, and then make a nobsme.fst from that.
- Eventually we would like to take a Sámi frequency list as starting point, and translate via smenob.fst.
- Problem: The ismenob.fst is not that good, it is just the inverse of the smenob.fst. A better tool will be achieved by converting the underlying xml document from smenob.xml to ismenob.xml, and then make a nobsme.fst from that.
- check whether important Sámi words are missing, and add them
- Divide the word pairs into three groups according to frequency.
- In order to be able to reverse the game, we actually need two lists, one sme-nob and one nob-sme, in order to limit variation to one-many, and not many-many.
The target program should have the following characteristics:
- The user chooses POS, level and direction (nob-sme or sme-nob)
- The machine draws an arbitrary set of 10 words
- The machine presents the source words, one by one
- For each word, the user provides a translation
- The machine reacts in the usual way (green = ok, red = wrong, or similarily)
- After a completed set of 10, the machine gives statistics and some encouraing words
In addition to the frequency-based games we should have games ordered after semantic class: Bodyparts, food, drinks, professions, etc.
Proper name drill
Now, we have an overview of parallel names in different municipalities
- A list showing the number of parallel names per municipality is found here.
- The names are found in the words/terms/ folder
Discussion
- What do you think?
- We need a spec for the format of the lists. While waiting, smeTABnobCR should be ok.
Lene: Good idea, but we have to remember synonyms and sideforms. We have to make the wordlist so the machine will find the synonyms, not only one-to-one.
nob sme hus - viessu, dállu, stohpu
It could also be pedagogically good to make groups of words, instead of only the 1000 most common words. Then we also could use them as a part of dialogues. If you are in a shop, then you could also have a part with vocabulary game with things which probably are in the shop (e.g. grocers).