How to evaluate the output of the FST, as a part of the developing work

Here are some tips for the linguist, all commands are based on working in the crk-catalogue

Make an alias to the crk-catalog

Add this to your .bashrc or .profile:

alias crk="pushd ~/main/langs/crk" 

Open a new terminal window: In any catalogue, write 'crk' ENTER.

Test that all noun lemmas are possible to generate as base form

sh test/data/

Look at fst-output

A certain form for all lemmas

The example is for animate nouns, all the continuation lexicons for these nouns start with AN

grep ' AN' src/fst/stems/nouns.lexc |\
egrep -v '^(\!|LEX)' | tr ":" " " | cut -d " " -f1 |\
sed 's/$/+N+AN+Loc/' | dcrk | less

The paradigm for a certain lemma

The example is for an inanimate noun, but one can use any of the paradigms in the test/data-catalogue

cat test/data/NI-par.txt | sed 's/^/ôtênaw/' | dcrk | less

Run only one yaml-test

Remove all yamltests (check in your local modifications first!):

rm test/src/gt-norm-yamls/*

Get the yaml-file you want to test, e.g.:

svn up test/src/gt-norm-yamls/V-mato_gt-norm.yaml
make check

Compare the lingvistic output of all yaml-tests for a certain PoS

Remove all yamltests (check in your local modifications first!) - the example is for verbs:

rm test/src/gt-norm-yamls/*

Get the yaml-file you want to test, e.g.:

svn up test/src/gt-norm-yamls/U-all_gt-norm.yaml

make check

Make/update all yaml-tests in one for a certain PoS (and a certain pattern?)

This example is adding all verbs into one file:

head -11 test/src/gt-norm-yamls/V-AI-matow_gt-norm.yaml > test/src/gt-norm-yamls/U-all_gt-norm.yaml
tail +11 test/src/gt-norm-yamls/V* | grep -v "==" >> test/src/gt-norm-yamls/U-all_gt-norm.yaml

This example is adding all nouns with final -y into one file:

head -11 test/src/gt-norm-yamls/N-AN-amisk_gt-norm.yaml > test/src/gt-norm-yamls/A-Ny-all_gt-norm.yaml 
tail +11 test/src/gt-norm-yamls/N*y_gt-norm.yaml | grep -v "==" >>  test/src/gt-norm-yamls/A-Ny-all_gt-norm.yaml

Make a new yaml-file

The example is for the inanimate noun ôtênaw. Use an already functioning yaml-file as a starting point (here N-AN-amiskw_gt-norm.yaml). You still have to do a little editing afterwords, like correcting the docu about the lemma, and making it more readable by adding empty lines. And you must of course correct the output.

head -12 test/src/gt-norm-yamls/N-AN-amisk_gt-norm.yaml\
> test/src/gt-norm-yamls/N-IN-otenaw_gt-norm.yaml

cat test/data/NI-par.txt | sed 's/^/ôtênaw/' | dcrk |\
tr '\t' ':' | sed 's/:/: /' | grep -v '^$' |\
sed 's/^/     /' >> test/src/gt-norm-yamls/N-IN-otenaw_gt-norm.yaml

Comment: The last sed-command should give 5 whitespaces

Documentation on unix-commands