Numra

Contents:

Intro
Debugging lexc inconsistency
Class inheritance structure
Documentation of the content production
Documentation on the interplay between fsts and the django code
The different types of Numra
Documenting +Use/NG and +Use/NA

This page contains documentation for programmers. See also the Numra for developers page, which contains documentations for linguists making the automata for the actual languages.

The files are, as always, in ped/LANG_oahpa/, except for North Sámi where the the prefix univ is used instead of sme.

Intro

Numra differs from Morfa in not using the MySQL database, but using transducers directly.

The transducers behind numra are stored in $GTHOME/langs, and are the following (where $LANG is the iso code for your language):

Cardinal and ordinal numbers: $LANG/src/transcriptions/transcriptor-numbers-digit2text.lexc
Clock: $LANG/src/transcriptions/transcriptor-clock-digit2text.lexc
Date: $LANG/src/transcriptions/transcriptor-date-digit2text.lexc

Note that the order of writing in the lexc file should be as indicated in the filenames, thus one should write 2:two ( digit2text) rather than two:2, as it is (wrongly) written for some of the languages.

The files governing the four programs are num.html, num_ord.html, clock.html, date.html. These files are found here:

for sme: ped/univ_oahpa/univ_drill/templates/,
for sma: ped/smaoahpa/smadrill/templates/
for all other languages: ped/LANG_oahpa/LANG_drill/templates/, where
- LANG = the iso code of the language in question

These files refer to the fsts on gtoahpa and gtlab, in the /opt/smi/ catalogue there. Oahpa versions with an url testing.oahpa.no are on gtlab, versions with url oahpa.no are on gtoahpa.

Debugging lexc inconsistency

Unfortunately, the Numra transducers were from the beginning written in two different directions, as referred to above. Instead of correcting this, this inconcistency was projected over to new languages, and into the Oahpa code. During the Oahpa newinfra project this should be fixed, in the following way:

in $GTHOME/langs/LANG/src/transcriptors, the files should get intended filenames DONE
The lexc content should be harmonised to the 2:two pattern PARTLY DONE
The same filename should be used in /opt/LANG/ PARTLY DONE
The same filename (and 2:two pattern) should be used in the Oahpa code PARTLY DONE

The command (in ped/)

grep -A1 'class NumGame' *_oahpa/*_drill/game.py

reveals that file references for appr. half of the Numras have been updated. As a quick fix for several languages, the content of the correct filenames have been copied over to the old ones in /opt/smi/LANG/bin on gtoahpa (TT).

To sum up: Such file names are in use in /opt/smi/ and the Numra code:

WRONG: smn-num.fst
WRONG: transcriptor-numbers2text-desc.xfst
Correct: transcriptor-numbers-digit2text.filtered.lookup.xfst

Class inheritance structure

The class inheritance structure for the three Numra modes Numra, Klokka and Dato is the same in both the classes in LANG_drill/game.py and LANG_drill/forms.py, with the classes getting progressively simpler, where Dato is the simplest. (Note: Exchange "LANG" with the relevant language code, note that for sma it is smadrill and for sme it is univ_drill)

Overall, methods and classes in LANG_drill/views.py call methods in LANG_drill/game.py, which creates LANG_drill/forms.py classes. This pattern is found throughout the rest of the game types. The Views classes also handle POST/GET responses and requests, the Forms classes handle the creation of form objects and their validation, and the Game classes handle evaluation of whether answers are correct or not as well as the selection of words or numbers from the FSTs.

There is room for some decoupling of functionality in LANG_drill/forms.py and LANG_drill/game.py as development progresses.

Common for all Numra-games:
- settings.py: FST_DIRECTORY, L1

Cardinals & Ordinals
- forms.py
  - … (relevant classes for this program)
- game.py
  - NumGame (see docstrings for method information)
- views.py
  - ...

Klokka
- forms.py:
- game.py: Klokka (subclass of NumGame)
  - Subclass contains filenames of the question/answer FSTs, and depends on settings.FST_DIRECTORY
- Subclass of NumGame, results in need to only override NumGame.get_db_info and NumGame.create_form

Dato
- forms.py
  - class DatoSettings(KlokkaSettings)
  - class DatoQuestion(KlokkaQuestion)
- game.py
  - class Dato(Klokka)
  - only needs to override Klokka.get_db_info, simple class.
  - def get_db_info(self, db_info)
- views.py
  - within class Numview(Gameview):
  - def num_date

The files are the following:

templates/LANG_oahpa_main.html
	templates/LANG_oahpa.html
		LANG_drill/templates/game.html
			LANG_drill/templates/num.html
				LANG_drill/templates/num_ord.html
				LANG_drill/templates/clock.html
				LANG_drill/templates/dato.html

Documentation of the content production

Documentation for linguists defeloping Numra content

Documentation on the interplay between fsts and the django code

Generating questions

generate a question and the correct answer(s) (create_form)
1. - hours = random(0..23)
2. - minutes =
  1. random(0, 30) (level=easy)
  2. random(0, 15, 30, 45) (level=medium)
  3. random(0 .. 59) (level=hard)
3. - "hours: minutes" are piped to lookup transcriptor-clock2text-desc.xfst > norm_list
4. - create a KlokkaQuestion object where
  1. question = hours: minutes
  2. acceptable answers = norm_list
  3. correct answer(s) = norm_list (in the new setup this should be the output of the facit-gen-fst which generates exactly one variant)

Analysing answers

analyse the answer (check_answer)
1. - user's answer is piped to lookup transcriptor-text2clock-desc.xfst > num_list
2. - if there is a (textual time, numeric time) pair in num_list such that user's answer = textual time and question = numeric time then accept the user's answer, otherwise reject

Note that the facit-gen-fst is still not made.

The different types of Numra

There are four different subprograms, each with its transducers in the opt catalogue for the language in question. They are called in the LANG_drill/game.py file.

The transducers

Numra cardinal and Numra ordinal

class NumGame(Game):

generate_fst = 'transcriptor-numbers-digit2text.filtered.lookup.xfst'
answers_fst = 'transcriptor-numbers-text2digits.filtered.lookup.xfst'

Numra clock

class Klokka(NumGame):

generate_fst = 'transcriptor-clock-digit2text.filtered.lookup.xfst'
answers_fst = 'transcriptor-clock-text2digit.filtered.lookup.xfst'

Numra date

class Dato(Klokka):

generate_fst = 'transcriptor-date-digit2text.filtered.lookup.xfst'
answers_fst = 'transcriptor-date-text2digit.filtered.lookup.xfst'

Numra money

class Dato(Money) -- perhaps? -- check the class name:

generate_fst = 'transcriptor-money-digit2text.filtered.lookup.xfst'
answers_fst = 'transcriptor-money-text2digit.filtered.lookup.xfst'

The source files

The source files are in $LANG/src/transcriptions/*.lexc.

Documenting +Use/NG and +Use/NA

+Use/NG = not-generate, for ped generation isme-ped.fst
+Use/NA = not-analyse, for restricting analyses needed for MT generation not to pop up elsewhere

Thus, ignore the +Use/NA tag for Oahpa/Numra purposes. Use +Use/NG for forms you want to accept, but not present.

The +Use/NG (and +Use/NA) are handled in the file LANG/am-shared/src-transcriptions-dir-include.am, called by the Makefile.am in the transcriptions folder.