Notes For New Oahpa Code
This is a document for ideas and discussion about the new Oahpa code.
Workshop week from 19 September
IDEAS
Requirements, from different perspectives:
- programmer
- linguist
- teacher
- students
Today: special features, not used for all languages
Leksa:
- audio-files (crk-oahpa, vro-oahpa)
- for the time being: audio files are not functioning with Safari
<l pos="N" animacy="AN" t2c="no" rime="0" gen_only="Pl,Sg" audio="kohkos">kohkôs</l>
- "almost correct"-feedback to the student (tcomm) for sma-oahpa, sme-oahpa
<t pos="n" stat="pref">mors eldre søster</t>
<t pos="n">eldre søster til mor</t>
<t pos="n" tcomm="yes">tante</t>
- different colours in the interface for different animacy (crk-oahpa)
- crk-oahpa: green for N+AN, or animacy="AN"
- Place names as an option (sme-oahpa, sma-oahpa)
- Multiword expressions as an option in menu (sme-oahpa, sma-oahpa)
- the target language is multiword expression
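The "almost correct" feedback (tcomm) listed above could be handled roughly as follows. This is a minimal sketch, not the current Oahpa code: the `<tg>` wrapper, the function name `grade_leksa_answer` and the three result labels are hypothetical, and it assumes that tcomm="yes" marks a translation that is accepted only with a comment to the student.

```python
import xml.etree.ElementTree as ET

# The three <t> elements from the tcomm example, wrapped in a hypothetical
# <tg> container so that the fragment parses as one XML document.
TRANSLATIONS = """<tg>
  <t pos="n" stat="pref">mors eldre søster</t>
  <t pos="n">eldre søster til mor</t>
  <t pos="n" tcomm="yes">tante</t>
</tg>"""


def grade_leksa_answer(answer, tg_xml=TRANSLATIONS):
    """Return 'correct', 'almost' or 'wrong' for a student's answer."""
    normalized = answer.strip().lower()
    for t in ET.fromstring(tg_xml).iter("t"):
        if (t.text or "").strip().lower() == normalized:
            # Assumption: tcomm="yes" means "accept, but tell the student
            # that the answer is only almost correct".
            return "almost" if t.get("tcomm") == "yes" else "correct"
    return "wrong"


for attempt in ["mors eldre søster", "tante", "onkel"]:
    print(attempt, "->", grade_leksa_answer(attempt))
```

The same lookup could later be driven from the database instead of raw XML; the point is only that the tcomm flag is the single piece of data the feedback depends on.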
Morfa
- Dictionary (NDS) is integrated (crk-oahpa)
- only-sg choice in menu (fkv-oahpa, rus-oahpa)
- Transitive verbs (TI) are presented with object in Morfa-S (crk-oahpa)
<l pos="V" object="iskocês" trans_anim="TI" initial="c" rime="m">wâpahtam</l>
- What kind of agreement or other rule inducement is used or is needed
- rus-oahpa: agreement of subject and main verb in number for pres/past, and in gender for past tense (in elaborated multi-sentence tasks, professions may need actual rather than grammatical gender)
- rus-oahpa: agreement of modifier (adj or adj-l pron) and noun in gender and case in singular, MFN as gender (for modifier) and case in plural. Implies also change of gender for Adj Sg Nom -> Pl Nom task: {Msc, Fem, Neu} -> MFN
- rus-oahpa: passive verb forms may coincide with reflexive verb forms, whereas not all reflexive verbs from the dictionary are suitable for questions and answers in active voice (passive is not currently used in tasks)
FST with error-tags (Err)
- only for sme, in old infra:
lookup -flags mbTT -utf8 /Users/lan000/errortag-gt/gt/sme/bin/ped-sme.fst
and is used for Vasta and Sahka.
e.g.
viessui viessu+Build+N+DiphErr+Sg+Ill
vissui viessu+Build+N+Sg+Ill
lavka lávka+N+Sg+Nom+AErr
lávka lávka+N+Sg+Nom
lávka lávka+N+CGErr+Sg+Gen
lávkka lávka+N+Sg+Gen
We want this for more languages, but implemented differently than for sme: by concatenating FSTs instead of cluttering the source files, which takes a lot of work to maintain.
We have already done this for the numeral transcriptor in crk (Err/ShLo marks a short/long vowel error):
peyak 1+Err/ShLo
pêyak 1
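Whichever way the error FST is built, the exercise code only needs to read the error tags out of the analysis. The following is a minimal sketch of that step, assuming one "surface analysis" pair per line and assuming that every error tag contains the substring "Err" (true for the examples above, but not confirmed as a general convention). The function names are hypothetical.

```python
def parse_lookup_line(line):
    """Split one 'surface analysis' line into (surface, lemma, tags)."""
    surface, analysis = line.split(None, 1)
    lemma, *tags = analysis.strip().split("+")
    return surface, lemma, tags


def error_tags(tags):
    """Return the tags that mark an error, e.g. DiphErr or Err/ShLo."""
    # Assumption: error tags are recognisable by the substring "Err".
    return [tag for tag in tags if "Err" in tag]


SAMPLE = [
    "viessui viessu+Build+N+DiphErr+Sg+Ill",
    "lávka lávka+N+Sg+Nom",
    "peyak 1+Err/ShLo",
]

for line in SAMPLE:
    surface, lemma, tags = parse_lookup_line(line)
    print(surface, lemma, error_tags(tags))
```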
Numra
- Numra: special feedback for tag +Err/ShLo (crk-oahpa)
- crk-oahpa: you can test with peyak for 1
- rus-oahpa: tag +Err/ShLo is added if one or more stress marks are missing in a numeral string; such an answer is still accepted as correct. An answer with incorrectly placed stress marks is not accepted (although it is almost correct).
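The Numra behaviour described above can be sketched as a small grading function. This is an illustration under assumptions, not the existing implementation: the function name and result labels are hypothetical, the analyses are assumed to arrive as plain strings such as "1+Err/ShLo", and an answer that only has an error-tagged analysis is assumed to be accepted with special feedback.

```python
def grade_numra(analyses, expected):
    """Grade a Numra answer from the transcriptor's analyses.

    analyses: analysis strings for the student's input, e.g. ["1+Err/ShLo"]
    expected: the numeral that was asked for, e.g. "1"
    """
    matching = [a.split("+") for a in analyses if a.split("+")[0] == expected]
    if not matching:
        return "wrong"
    # A clean analysis (no Err/ShLo tag) wins over an error-tagged one.
    if any("Err/ShLo" not in parts[1:] for parts in matching):
        return "correct"
    return "correct_with_feedback"


print(grade_numra(["1"], "1"))           # analysis of pêyak
print(grade_numra(["1+Err/ShLo"], "1"))  # analysis of peyak
print(grade_numra(["2"], "1"))
```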
Different ways of adding limitations for words or forms
Leksa
- exclude="smenob" in e-element (smeOahpa, smaOahpa)
- for excluding nobsme: remove the nobsme entry
- <sem class="NOTLEKSA"/> (crkOahpa)
Morfa completely
- morfas="no" in l-element (smeOahpa)
- gen_only="None" (crkOahpa)
Morfa partly
- to be included in some Morfa tasks: gen_only="Pl,Sg" (crkOahpa)
- to be included in one Morfa task: <sem class="MORFAPOSS"/> (crkOahpa)
- to be included in Morfa tasks: <sem class="MORFA"/> (fkvOahpa)
- to be included in Morfa tasks: <sem class="MORFAS"/> (smeOahpa, smaOahpa)
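The attribute-based limitations above (morfas, gen_only) could be read in one place instead of being checked separately per language. A minimal sketch, assuming only what the list states; the function name `morfa_scope` and its return labels are hypothetical, and "unrestricted" only means that no limiting attribute was found on the l-element (the sem-class markers are not covered here).

```python
import xml.etree.ElementTree as ET


def morfa_scope(l_element):
    """Classify an <l> element as 'none', 'some' or 'unrestricted'."""
    if l_element.get("morfas") == "no":   # smeOahpa: excluded from Morfa
        return "none"
    gen_only = l_element.get("gen_only")
    if gen_only == "None":                # crkOahpa: excluded from Morfa
        return "none"
    if gen_only:                          # crkOahpa: e.g. "Pl,Sg"
        return "some"
    return "unrestricted"


# The crk example entry from the Leksa section above.
entry = ET.fromstring(
    '<l pos="N" animacy="AN" t2c="no" rime="0" '
    'gen_only="Pl,Sg" audio="kohkos">kohkôs</l>'
)
print(entry.text, morfa_scope(entry))
```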
Interface design
- all relevant menus in the top, also PoS choice (crk-oahpa)
- instructions are given above the tasks, instead of in a box to the right (crk-oahpa)
- morphological feedback appears to the right of the question it is referring to (crk-oahpa)
- tool tip in instructions, eg. example of how to answer (crk-oahpa)
- tool tip in morphological feedback (crk-oahpa)
- special characters are given in a typer for all programs (crk-oahpa)
- it appears when you click in the blank
- menu for grammar explanations, with links to a web grammar (sme-oahpa)
- dialect choice in menu (sme-oahpa and sma-oahpa)
- only one dialect is presented in the tasks, but both dialects are accepted as input from the student
- the morphological feedback is according to the chosen dialect
- all instruction is in target language, but the student gets a tool tip translation by clicking Alt (sme-oahpa)
- short verbal explanation of the game topic under each game icon and name (vro-oahpa)
New ideas (don't hesitate)
General structure ideas
Will need some discussion:
Oahpa needs more separation within the source code, there is a lot of logic
One way to simplify this could be separating Oahpa into two major pieces: (1)
These two pieces could either exist as their own separate codebases, or be in
Each language project should have a central configuration setup for linguistic
Other:
- Newest version of Django
- Use django-rest-API for exercise generation pieces?
- Word database structure is otherwise great, tagsets, semantics, etc., but it
Exercises
- Defining Morfa-S and Leksa exercises should be as simple as editing YAML;
- Exercise code needs to produce very simple output, so that we can write
- Morfa-C: we can write some generalized agreement process that allows a
Exercise Definitions
For exercise definitions there are three parts:
1.) defining how menus appear, and what content goes in them
For leksa, numra, and morfa-s, the general structure is shared: the question is a word with a particular form, and the answer is a word with a particular form.
Question form:
- internal question ID, for making menus, etc.?
- language iso
- word semantics
- word morphology
- context
Answer form:
- language iso
- word semantics (usually same)
- word morphology (leksa: usually lemma, morfa-s: cases, tense, etc)
- context
Some of the ideas here exist in NDS (context-dependent question context)
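The question form and answer form listed above map naturally onto a small data structure. The sketch below is only one possible shape, for discussion: all class names, field names and IDs are hypothetical, and the semantic set name "FAMILY" is a placeholder, not a set taken from any Oahpa lexicon.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WordForm:
    language: str                    # language iso, e.g. "sme"
    semantics: tuple = ()            # word semantics (semantic sets)
    morphology: tuple = ()           # word morphology (tags or "lemma")
    context: Optional[str] = None    # context, if any


@dataclass(frozen=True)
class ExerciseDefinition:
    question_id: str                 # internal question ID, e.g. for menus
    question: WordForm
    answer: WordForm


# Leksa: lemma in one language, lemma in another, same semantics.
leksa = ExerciseDefinition(
    question_id="leksa-sme-nob-family",
    question=WordForm("sme", semantics=("FAMILY",), morphology=("lemma",)),
    answer=WordForm("nob", semantics=("FAMILY",), morphology=("lemma",)),
)

# Morfa-S: same language, different morphology.
morfa_s = ExerciseDefinition(
    question_id="morfa-s-sme-n-ill",
    question=WordForm("sme", morphology=("N", "Sg", "Nom")),
    answer=WordForm("sme", morphology=("N", "Sg", "Ill")),
)

print(leksa)
print(morfa_s)
```

A structure like this would also be straightforward to load from a YAML file, which fits the wish that defining exercises should be as simple as editing YAML.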
Database
As far as django is concerned, there should be a separate lexicon/morphology
It should be possible to install wordforms as a separate part of the process:
- Example: the word structure, semantics, and Morfa-C exercises all do not
- Example: wordforms are fine, but Morfa-S or Leksa question/answer pairs have changed
- Example: wordforms are fine, but Morfa-C has a new question, or a changed question
- Example: wordforms are fine, questions are fine, but semantic sets need to be changed
- Example: maybe more ideas ...
An option is using the FSTs "live", and only caching the last forms generated,
We can rely on an updated lookup server daemon, which already exists, to speed
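The idea of using the FSTs "live" and caching only the most recently generated forms can be sketched with a bounded cache. Everything here is illustrative: `generate_with_fst` is a hypothetical stand-in that does not call any real lookup server (it just joins lemma and tags so the example runs), and the cache size is an arbitrary choice.

```python
from functools import lru_cache


def generate_with_fst(lemma, tags):
    """Hypothetical stand-in for a request to the lookup server."""
    generate_with_fst.calls += 1
    # A real implementation would return the generated wordform; this
    # placeholder only echoes the analysis string.
    return lemma + "+" + "+".join(tags)


generate_with_fst.calls = 0


@lru_cache(maxsize=1024)
def cached_wordform(lemma, tags):
    """Return the generated form, keeping only recently used results."""
    return generate_with_fst(lemma, tags)


# tags must be a tuple so that the arguments are hashable.
cached_wordform("lávka", ("N", "Sg", "Gen"))
cached_wordform("lávka", ("N", "Sg", "Gen"))
print(generate_with_fst.calls)
```

With such a cache, repeated requests for the same form do not reach the FST again, and nothing needs to be installed into the database beforehand.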
Install definition
The install process should be defined in some system other than a bash file,
User feedback
User activity logging
Would be very nice to have ideas
... but not immediately crucial
- Morfa-C could use a word / wordform / semantics browser interface to simplify
What to send to analyser
Today
- Numra is sent to the transcriptor FST, with error-tags
- Vasta and Sahka are sent to FST with error-tags
New code
- Send Morfa to FST (API) with error-tags
- Send also Leksa to FST with error-tags?
Modularity
Linguistic data
Different kinds of data:
- information in the source files
- definitions of how to display content based on morphology, contained within Python (e.g. if noun, add verb pronoun context; if Essive, no Sg or Pl)
- coordinated with the information in the system files
- for choices in menus
- for producing the tasks
Contents of drop-down menus (semantic supersets, case and other grammatical category lists, source lists (books and chapters), etc.) should be in separate file(s), not inside the program code files.
- Create a procedure for automatically generating these lists out of the Oahpa source files (lexicon, tags, paradigms, semantic sets).
- In the teacher interface there could be an option to check the contents of the automatically generated menus and to suppress the unwanted items.
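Generating the menu lists from the source files, with a way to suppress unwanted items, could look roughly like this. It is a sketch under assumptions: the `<r>/<e>/<lg>/<l>` layout of the sample lexicon, the class names SET_A and SET_B, and the function name `menu_items` are all hypothetical and only serve to make the example runnable.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-entry lexicon; SET_A and SET_B are placeholder classes.
LEXICON = """<r>
  <e><lg><l pos="N">kohkôs</l></lg><sem class="SET_A"/></e>
  <e><lg><l pos="V">wâpahtam</l></lg>
     <sem class="SET_B"/><sem class="NOTLEKSA"/></e>
</r>"""


def menu_items(lexicon_xml, suppressed=frozenset()):
    """Collect semantic classes and parts of speech for drop-down menus."""
    root = ET.fromstring(lexicon_xml)
    sem_classes = {s.get("class") for s in root.iter("sem") if s.get("class")}
    pos_values = {l.get("pos") for l in root.iter("l") if l.get("pos")}
    return {
        # 'suppressed' is where a teacher (or a config file) removes items.
        "semantic_sets": sorted(sem_classes - set(suppressed)),
        "pos": sorted(pos_values),
    }


print(menu_items(LEXICON, suppressed={"NOTLEKSA"}))
```

The resulting dictionary could be written to a separate file at install time, so the menu contents never need to be edited inside the program code.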
Small FST for Oahpa? Containing the lexicon from ped/LANG/src/
System files
Audio files/Text-to-speech
How to make Oahpa for new languages
Maintenance
- Linguist interface. The linguist should be able to install/remove new lemmas and tasks.
- Teacher interface, e.g. (taken from oahpapres.pdf):
- Mark (in the list) the part of speech and the
- Mark (in the list) the lemmas you want to include
- Write the instruction for the students
- – Wait, while the program is compiling –
- Test the tasks. Click for the next step.
Consistency in naming
- Aim to use the same name for the same thing, and clearly different names where there are substantial differences
- Tag collisions
- Different uses of +Pass tag in sme and rus-oahpa (affects also nob, smn, sms, fin, est, vro, fkv, fao, mhr, vep)
- rus-oahpa: some verb forms have secondary tags that are confused even with PoS tag (Adv)
- Database fields
- Some database table columns are named with reserved words (string, case). This could be avoided.
Konteaksta
Student-log-in
Experiences from working with Oahpa from different languages
Pain points
- database update takes time, sometimes changes are small (i.e., subset of
- database update requires a lot of awareness of an individual oahpa installation
- database: lots of places to configure data being installed, adding files is
- exercises: Morfa-C: adding new types of agreement is tricky
- exercises: Morfa-S: new types of exercises are also tricky
- exercises: Numra relies on some external FSTs, which Morfa currently does
- exercise: sahka / vasta lookupserv is tough to configure (new version exists:
- general: errors are opaque, the backend could do a better job of explaining what is wrong
Document the differences between the versions
The full list (24 Oahpas) is, as always, at http://giellatekno.uit.no/ped/index.html.
Most differences follow from the availability of basis resources.
- Only sme has a full CG program, so only sme has Sahka
- We are now working on Vasta-F for translation (with CG) for crk and est
- Languages without CG but with an FST typically have 4 programs
- Languages without an FST typically have 2 programs, Numra and Leksa
sme
- http://oahpa.no/davvi/
- 7 programs: Morfa-S, Morfa-C, Leksa, Numra, Vasta-S, Vasta-F, Sahka
sma
- http://oahpa.no/aarjel/
- 4 programs: Morfa-R (= Morfa-C), Morfa-B (= Morfa-S), Leksa, Numra
est
- oahpa: http://testing.oahpa.no/eesti/
- docu: http://giellatekno.uit.no/ped/est-oahpa.html
- 5 programs
- Morfa-C noun: "limited mix" exercises:
vro
- http://testing.oahpa.no/voro/
- 4 programs
- spell-relax issues because of different ways of marking palatalisation and glottal stop
- some audio files in Leksa (the full lexicon coming soon)
- TTS integration under development
- Morfa-C noun: "limited mix" exercises
crk
- http://giellatekno.uit.no/ped/crkdoc/oahpa/crk-oahpa.html
- 4 programs
- crk has some audio files.
liv
- http://testing.oahpa.no/livokel/
- http://giellatekno.uit.no/ped/liv-oahpa.html
- 4 working programs
- The code for this setup is the same as for the other
- This is the language with the most complicated writing system, and it
rus
- http://giellatekno.uit.no/ped/rus-oahpa.html
- http://testing.oahpa.no/rusoahpa/
- 4 programs
- Russian has a mechanism for handling stress
- rusoahpa should be moved from gtlab to gtoahpa
sms
- http://oahpa.no/nuorti/
- https://giellalt.uit.no/lang/sms/j-sms.html
- 4 programs
- Morfa-S nouns: diminutives, possessives and their combinations
The other Oahpas (probably not so relevant for the initial part of the work)
izh
mhr
- http://testing.oahpa.no/mhr_oahpa/
- 4 programs
mrj
myv
fkv
mdf
bxr
- http://testing.oahpa.no/bxr_oahpa/
- 3 programs working: Leksa, Numra, Morfa-S
olo
yrk
FST exists but is not really implemented
hdn
The FST-less Oahpas
rup
kpv
sjd
udm
smn
- http://oahpa.no/aanaar/
- Two programs: Leksa, Numra