Presently we have three types of morphology testing:
These will briefly be presented here, with instructions on how to adapt or
Included from the und
template there is a simple shell script to test lemma src/morphology/stems/
dir), and try to generate the lemma.
In practice it is a bit more complicated, and the script may also need some
The adaption is basically that one needs to check that the tag string used for TESTS
test/src/morphology/Makefile.am
).
Complicating factors might be that some nouns do not inflect in singular (the
The template only gives noun lemma generation, but it is easy to use that script
Note that this setup does not work for languages with gender systems, dividing
The most widely used morphological tests are the Yaml tests. The data format
Config:
  hfst:
    Gen: ../../../src/generator-gt-norm.hfst
    Morph: ../../../src/analyser-gt-norm.hfst
  xerox:
    Gen: ../../../src/generator-gt-norm.xfst
    Morph: ../../../src/analyser-gt-norm.xfst
    App: lookup

Tests:
  Noun - atim - ok : # -m animate noun
    atim+N+AN+Sg: atim # this is a comment
    atim+N+AN+Pl: atimwak # test
    atim+N+AN+Loc: atimohk # really rare form
The yaml syntax is simple, but relies on indenting: two spaces for each level of
The header is started by the keyword Config
, and lists fst's to be used for test/src/morphology/
.
The test data is similarly started by the keyword Tests
, followed by a line Noun - atim - ok
in the example above). analysis string
followed by colon, followed by wordform string
. If there is more than one possible word form, they are all
ненэцьʼ+N+Sg+Loc: [ненэцяӈгана, ненэцяӈгна]
Remember to always indent properly!
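Since indentation errors are easy to miss, a small check can help before running the tests. The following is a minimal sketch, not part of the infrastructure: the function name `indentation_problems` is hypothetical, and it assumes the two-spaces-per-level convention described above, flagging tabs and indents that are not a multiple of two.

```python
def indentation_problems(text, step=2):
    """Return (line_number, message) pairs for suspicious indentation.

    Assumption: every nesting level uses exactly `step` spaces.
    """
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines carry no indentation information
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            problems.append((number, "tab in indentation"))
        elif len(indent) % step != 0:
            problems.append(
                (number, f"indent of {len(indent)} is not a multiple of {step}")
            )
    return problems


sample = (
    "Tests:\n"
    "  Noun - atim - ok :\n"
    "    atim+N+AN+Sg: atim\n"
    "   atim+N+AN+Pl: atimwak\n"  # three spaces: flagged
)
print(indentation_problems(sample))
```

The check is deliberately shallow: it does not parse Yaml, it only looks at leading whitespace.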
Sometimes it can be valuable to specify negative tests. Usually they should
To specify a negative test, add a tilde in front of the word form in the Yaml
gierehtse+N+Sg+Acc: [gierehtsem, ~gieriehtsem]
Now the Yaml test will only pass if the last word form given is NOT generated,
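The pass condition for a test line with a negative form can be sketched as follows. This is an illustration under stated assumptions, not the test runner's actual code: the function names are hypothetical, and the sketch only checks that every plain form is generated and no tilde-marked form is. How the real runner treats additional, unlisted forms is not covered here.

```python
def split_expected(value):
    """Split a Yaml test value into (required, forbidden) word form sets.

    `value` is either a single word form or a list of word forms;
    a leading tilde marks a form that must NOT be generated.
    """
    forms = value if isinstance(value, list) else [value]
    required = {form for form in forms if not form.startswith("~")}
    forbidden = {form[1:] for form in forms if form.startswith("~")}
    return required, forbidden


def generation_passes(value, generated):
    """True if all required forms are generated and no forbidden form is."""
    required, forbidden = split_expected(value)
    generated = set(generated)
    return required <= generated and not (forbidden & generated)


expected = ["gierehtsem", "~gieriehtsem"]
print(generation_passes(expected, ["gierehtsem"]))
print(generation_passes(expected, ["gierehtsem", "gieriehtsem"]))
```

The first call passes because only the legal form is generated; the second fails because the tilde-marked form shows up in the output.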
The filenames for the yaml tests are built up with the following components:
.ana
or .gen
specifier .yaml
The underscore is the separator between the "free" part and the fst specifier.
By specifying .ana or .gen before the .yaml suffix, only
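The naming scheme can be illustrated with a small parser. This is a sketch under assumptions: the function name and the example filename `N-even_gt-norm.gen.yaml` are hypothetical, the fst specifier is assumed to contain no underscore (so the last underscore is taken as the separator), and `.ana` / `.gen` are assumed to restrict the test to analysis or generation respectively.

```python
def parse_yaml_test_name(filename):
    """Split a Yaml test filename into free part, fst specifier and direction."""
    if not filename.endswith(".yaml"):
        raise ValueError(f"not a yaml test file: {filename}")
    stem = filename[: -len(".yaml")]

    # An optional .ana or .gen just before .yaml limits the test direction.
    direction = "both"
    for suffix, name in ((".ana", "analysis"), (".gen", "generation")):
        if stem.endswith(suffix):
            stem = stem[: -len(suffix)]
            direction = name
            break

    # The underscore separates the free part from the fst specifier.
    free_part, separator, fst = stem.rpartition("_")
    if not separator:
        raise ValueError(f"no underscore separator in: {filename}")
    return {"free": free_part, "fst": fst, "direction": direction}


print(parse_yaml_test_name("N-even_gt-norm.gen.yaml"))
```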
It is also possible, and often a very good idea, to add test cases directly to
!!€gt-norm: adjectives
!!€ isvelihks: isvelihks+A+Attr
!!€ isveligs: isvelihks+A+Attr
!!€ isvelihks: isvelihks+A+Sg+Nom
!!€ isveligs: isvelihks+A+Sg+Nom
The first line specifies which transducer to run the test data against, followed !!€gt-norm:
is gt-norm
with another fst specifier if you want
The rest of the lines specify the test data, one line per word form, in two
Positive tests are specified with the string !!€
at the very beginning !!$
at
! Test data:
!!€gt-norm: gierehtse # Odd-syllable test
!!€ gierehtse: gierehtse+N+Sg+Nom
!!€ gierehtsen: gierehtse+N+Sg+Gen
!!€ gieriehtsasse: gierehtse+N+Sg+Ill
!!€ gierehtsem: gierehtse+N+Sg+Acc
!!$ gieriehtsem: gierehtse+N+Sg+Acc ! Block diphthongues in odd syll
Note the last line, where we explicitly check that the illegal word form gieriehtsem is never generated or accepted.
Note that there must be a space between !!€ or !!$ and the following
It is ok to have LexC style comments after the second column, as shown in the
NB! Possible pitfall: due to the way the parsed test data is stored internally
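To make the two-column layout concrete, here is a rough sketch of how such test lines could be picked out of a LexC source file. It is an assumption-laden illustration, not the extraction code the infrastructure actually uses: the regular expression and the function name are hypothetical, and anything after the analysis column is simply ignored as a comment.

```python
import re

# A data line starts with !!€ (positive) or !!$ (negative), then at least
# one space, the word form, a colon, and the analysis string.
DATA_LINE = re.compile(r"^!!(?P<mark>[€$]) +(?P<form>[^:]+?):\s*(?P<analysis>\S+)")


def parse_test_lines(lines):
    """Return (word_form, analysis, is_positive) for every test data line."""
    tests = []
    for line in lines:
        match = DATA_LINE.match(line)
        if match:  # header lines (no space after !!€) and other text are skipped
            tests.append((match["form"], match["analysis"], match["mark"] == "€"))
    return tests


source = [
    "! Test data:",
    "!!€gt-norm: gierehtse # Odd-syllable test",
    "!!€ gierehtsem: gierehtse+N+Sg+Acc",
    "!!$ gieriehtsem: gierehtse+N+Sg+Acc ! Block diphthongues in odd syll",
]
for test in parse_test_lines(source):
    print(test)
```

The results are kept in a plain list here, so repeated word forms or analyses are all retained; that choice belongs to this sketch and says nothing about the real implementation.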
In some cases you may want to run the tests in only one direction: only analysis
!!€dict-gt-norm.gen: # Even-syllable test, generation only
!!€ raattâđ: raattâđ+V+Inf
The dict-gt-norm
fst is only used for generation (the dictionary analyser.gen
to the fst name. If you need to run certain tests only for the analyser, .ana
to the fst name just before the colon.
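The header convention can be sketched in the same spirit. The function name and regular expression below are hypothetical, and the sketch assumes the fst name itself contains no dot, so that a trailing `.ana` or `.gen` before the colon can be read as the direction marker.

```python
import re

# Header: !!€ immediately followed by the fst name, an optional .ana/.gen
# direction marker, and a colon.
HEADER = re.compile(r"^!!€(?P<fst>[^\s:.]+)(?:\.(?P<dir>ana|gen))?:")

DIRECTIONS = {"ana": "analysis", "gen": "generation", None: "both"}


def parse_header(line):
    """Return (fst_name, direction) for a test header line, else None."""
    match = HEADER.match(line)
    if match is None:
        return None
    return match["fst"], DIRECTIONS[match["dir"]]


print(parse_header("!!€dict-gt-norm.gen: # Even-syllable test, generation only"))
print(parse_header("!!€gt-norm: adjectives"))
print(parse_header("!!€ raattâđ: raattâđ+V+Inf"))
```

The last call returns `None` because a space follows `!!€`, which marks a data line rather than a header.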