Ideas For Newinfra Testing
Test ideas
We want to test our transducers betters
Existing tests
- Paradgim testing against predefined answers: yaml tests
- Tests written in the lexc and twolc code
- Testing whether we generate the lemma or not
- Tests using the lemma list as gold standard (do we generate the lemma)
Ideas for new tests
- Test for Multichar Symbols on the lower side
- Test for phonotactically illegal strings
Elaborating the test ideas
Test for Multichar Symbols on the lower side
Now and then Multichar Symbols slip through twolc and give "words" like
How to test for this:
- Read the set of multichar symbols from root.lexc
- Make a transducer multichar.fst of them
- Compose LANG.fst .o. multichar.fst in xfst
- list the result (should be empty)
This test one should be able to set up language-independently.
Test for phonotactically illegal strings
Example, from fkv (this must be adjusted to a script):
- Make a regex accepting strings in Vowel + Vowel + e:
regex [ ?* [a|e|i|o|u|ä|ö] [a|e|i|o|u|ä|ö] e ] ;
- Compose it with the main fst:
xfst -e "regex @\"src/analyser-gt-desc.xfst\" .o. @\"VVe.fst\" ; "
- print the result with xfst
xfst[1]: print lower-words > lower-words.txt