Numra

This page contains documentation for programmers. See also the Numra for developers page, which contains documentations for linguists making the automata for the actual languages.

The files are, as always, in ped/LANG_oahpa/, except for North Sámi where the the prefix univ is used instead of sme.

Intro

Numra differs from Morfa in not using the MySQL database, but using transducers directly.

The transducers behind numra are stored in $GTHOME/langs, and are the following (where $LANG is the iso code for your language):

  • Cardinal and ordinal numbers: $LANG/src/transcriptions/transcriptor-numbers-digit2text.lexc
  • Clock: $LANG/src/transcriptions/transcriptor-clock-digit2text.lexc
  • Date: $LANG/src/transcriptions/transcriptor-date-digit2text.lexc

Note that the order of writing in the lexc file should be as indicated in the filenames, thus one should write 2:two ( digit2text) rather than two:2, as it is (wrongly) written for some of the languages.

The files governing the four programs are num.html, num_ord.html, clock.html, date.html. These files are found here:

  • for sme: ped/univ_oahpa/univ_drill/templates/,
  • for sma: ped/smaoahpa/smadrill/templates/
  • for all other languages: ped/LANG_oahpa/LANG_drill/templates/, where
    • LANG = the iso code of the language in question

These files refer to the fsts on gtoahpa and gtlab, in the /opt/smi/ catalogue there. Oahpa versions with an url testing.oahpa.no are on gtlab, versions with url oahpa.no are on gtoahpa.

Debugging lexc inconsistency

Unfortunately, the Numra transducers were from the beginning written in two different directions, as referred to above. Instead of correcting this, this inconcistency was projected over to new languages, and into the Oahpa code. During the Oahpa newinfra project this should be fixed, in the following way:

  1. in $GTHOME/langs/LANG/src/transcriptors, the files should get intended filenames DONE
  2. The lexc content should be harmonised to the 2:two pattern PARTLY DONE
  3. The same filename should be used in /opt/LANG/ PARTLY DONE
  4. The same filename (and 2:two pattern) should be used in the Oahpa code PARTLY DONE

The command (in ped/)

grep -A1 'class NumGame' *_oahpa/*_drill/game.py

reveals that file references for appr. half of the Numras have been updated. As a quick fix for several languages, the content of the correct filenames have been copied over to the old ones in /opt/smi/LANG/bin on gtoahpa (TT).

To sum up: Such file names are in use in /opt/smi/ and the Numra code:

  • WRONG: smn-num.fst
  • WRONG: transcriptor-numbers2text-desc.xfst
  • Correct: transcriptor-numbers-digit2text.filtered.lookup.xfst

Class inheritance structure

The class inheritance structure for the three Numra modes Numra, Klokka and Dato is the same in both the classes in LANG_drill/game.py and LANG_drill/forms.py, with the classes getting progressively simpler, where Dato is the simplest. (Note: Exchange "LANG" with the relevant language code, note that for sma it is smadrill and for sme it is univ_drill)

Overall, methods and classes in LANG_drill/views.py call methods in LANG_drill/game.py, which creates LANG_drill/forms.py classes. This pattern is found throughout the rest of the game types. The Views classes also handle POST/GET responses and requests, the Forms classes handle the creation of form objects and their validation, and the Game classes handle evaluation of whether answers are correct or not as well as the selection of words or numbers from the FSTs.

There is room for some decoupling of functionality in LANG_drill/forms.py and LANG_drill/game.py as development progresses.

  • Common for all Numra-games:
    • settings.py: FST_DIRECTORY, L1
    • Cardinals & Ordinals
      • forms.py
        • … (relevant classes for this program)
      • game.py
        • NumGame (see docstrings for method information)
      • views.py
        • ...
    • Klokka
      • forms.py:
      • game.py: Klokka (subclass of NumGame)
        • Subclass contains filenames of the question/answer FSTs, and depends on settings.FST_DIRECTORY
      • Subclass of NumGame, results in need to only override NumGame.get_db_info and NumGame.create_form
    • Dato
      • forms.py
        • class DatoSettings(KlokkaSettings)
        • class DatoQuestion(KlokkaQuestion)
      • game.py
        • class Dato(Klokka)
        • only needs to override Klokka.get_db_info, simple class.
        • def get_db_info(self, db_info)
      • views.py
        • within class Numview(Gameview):
        • def num_date

      The files are the following:

      templates/LANG_oahpa_main.html
      	templates/LANG_oahpa.html
      		LANG_drill/templates/game.html
      			LANG_drill/templates/num.html
      				LANG_drill/templates/num_ord.html
      				LANG_drill/templates/clock.html
      				LANG_drill/templates/dato.html
      

      Documentation of the content production

      Documentation on the interplay between fsts and the django code

      Generating questions

      1. generate a question and the correct answer(s) (create_form)
        1. - hours = random(0..23)
        2. - minutes =
          1. random(0, 30) (level=easy)
          2. random(0, 15, 30, 45) (level=medium)
          3. random(0 .. 59) (level=hard)
        3. - "hours: minutes" are piped to lookup transcriptor-clock2text-desc.xfst > norm_list
        4. - create a KlokkaQuestion object where
          1. question = hours: minutes
          2. acceptable answers = norm_list
          3. correct answer(s) = norm_list (in the new setup this should be the output of the facit-gen-fst which generates exactly one variant)

      Analysing answers

      1. analyse the answer (check_answer)
        1. - user's answer is piped to lookup transcriptor-text2clock-desc.xfst > num_list
        2. - if there is a (textual time, numeric time) pair in num_list such that user's answer = textual time and question = numeric time then accept the user's answer, otherwise reject

      Note that the facit-gen-fst is still not made.

      The different types of Numra

      There are four different subprograms, each with its transducers in the opt catalogue for the language in question. They are called in the LANG_drill/game.py file.

      The transducers

      Numra cardinal and Numra ordinal

      class NumGame(Game):

      generate_fst = 'transcriptor-numbers-digit2text.filtered.lookup.xfst'
      answers_fst = 'transcriptor-numbers-text2digits.filtered.lookup.xfst' 
      

      Numra clock

      class Klokka(NumGame):

      generate_fst = 'transcriptor-clock-digit2text.filtered.lookup.xfst'
      answers_fst = 'transcriptor-clock-text2digit.filtered.lookup.xfst'        
      

      Numra date

      class Dato(Klokka):

      generate_fst = 'transcriptor-date-digit2text.filtered.lookup.xfst'
      answers_fst = 'transcriptor-date-text2digit.filtered.lookup.xfst'
      

      Numra money

      class Dato(Money) -- perhaps? -- check the class name:

      generate_fst = 'transcriptor-money-digit2text.filtered.lookup.xfst'
      answers_fst = 'transcriptor-money-text2digit.filtered.lookup.xfst'
      

      The source files

      The source files are in $LANG/src/transcriptions/*.lexc.

      Documenting +Use/NG and +Use/NA

      • +Use/NG = not-generate, for ped generation isme-ped.fst
      • +Use/NA = not-analyse, for restricting analyses needed for MT generation not to pop up elsewhere

      Thus, ignore the +Use/NA tag for Oahpa/Numra purposes. Use +Use/NG for forms you want to accept, but not present.

      The +Use/NG (and +Use/NA) are handled in the file LANG/am-shared/src-transcriptions-dir-include.am, called by the Makefile.am in the transcriptions folder.