Test diary
Test results for the morphology and lexicon files
This document records the testing of the parser and the disambiguator. Background information and the test plan are found in the test plan document. What is found here is an overview of what has been tested: vocabulary testing, testing of the disambiguator, and testing of the morphological analysis.
Test results for morphology and lexicon
Vocabulary testing
The following table records recall for word forms in various texts. Here we measure the coverage of the vocabulary by recording all word forms that are not recognised.
---------------------------------------------------
zcorp/gt/sme/facta/AGOR-15.06.05_2.doc.xml
Test 9   Wftot   Wf-tkn  %-recall  Tytot  Wf-typ  %-recall
051204   40030   38880   96.9 %    6974   6221    89.2 %
---------------------------------------------------
minaigiweb-050824.txt
Test 8   Wftot   Wf-tkn  %-recall  Tytot  Wf-typ  %-recall
050916   276798  255144  92.2 %    28480  22176   77.9 %   Very many formatting errors.
---------------------------------------------------
callinravvagat 2000.xml
Test 7   Wftot   Wf-tkn  %-recall  Tytot  Wf-typ  %-recall
050518   26948   25168   93.4 %    6686   5435    81.3 %   w/o caps on
050518   26948   25208   93.5 %    6686   5455    81.6 %   w caps on
---------------------------------------------------
nisson_ovddasteapmi.txt
Test 6   Wftot   Wf-tkn  %-recall  Tytot  Wf-typ  %-recall
040903   38360   35704   93.0 %    20660  19102   92.4 %
---------------------------------------------------
hjh-nod1iid.txt
Test 5   Wftot   Wf-tkn  %-recall  Tytot  Wf-typ  %-recall
040903   1580    1532    96.7 %    683    636     93.1 %
---------------------------------------------------
sd-divas-2002-{1,2}.txt
Test 4   Wftot   Wf-tkn  %-recall  Tytot  Wf-typ  %-recall
041005   32835   31834   96.9 %    6664   6054    90.8 %
040913   32883   31255   95.0 %    6759   5856    86.6 %
---------------------------------------------------
sd-divas-2001-1.txt
Test 4   Wftot   Wf-tkn  %-recall  Tytot  Wf-typ  %-recall
040903   60522   58549   96.7 %    8610   7610    88.4 %
040329   62459   60159   95.3 %    8496   7406    87.2 %
---------------------------------------------------
Collection (test closed)
Test 2   Wftot   Wf-tkn  %-recall  Tytot  Wf-typ  %-recall
         225355                    32467
020815           203080  90.1 %           22721   70.0 %
020918           204315  90.7 %           22956   70.7 %
030210   227062  214845  94.6 %    31474  24398   77.5 %
---------------------------------------------------
New Testament (test closed)
Test 1   Wftot   Wf-tkn  %-recall  Tytot  Wf-typ  %-recall
         139681                    14888
011110           36471   26.1 %           4983    33.5 %
011116           36980   26.5 %           5050    33.9 %
011214           37736   27.0 %           5177    34.8 %
011218           40741   29.2 %           5955    40.0 %   (closed classes added)
020129           126765  90.6 %           11676   78.4 %   (proper names added)
020205           128702  92.1 %           12340   82.9 %
020206           129857  92.9 %           12328   82.8 %   (nom_nom compound)
020207           131846  94.4 %           12500   84.0 %
020212           132394  94.8 %           12621   84.8 %
020213           132878  95.1 %           12652   85.0 %
020217           132993  95.2 %           12674   85.1 %
020306           133791  95.8 %           12850   86.4 %
020307           133821  95.8 %           12878   86.5 %
020318           134042  95.9 %           12914   86.7 %
020321           135446  97.0 %           13292   89.3 %
020323           136120  97.5 %           13373   89.8 %
020404           136621  97.8 %           13524   90.8 %
020410           136974  98.1 %           13609   91.4 %
020417           137435  98.4 %           13762   92.4 %
020418           137977  98.8 %           13875   93.2 %
020423           138101  98.9 %           13964   93.8 %
021104           138254  99.0 %           14003   94.1 %
040216   166194  165330  99.5 %    14916  14298   95.9 %
050726   166191  165902  99.8 %    14920  14666   98.3 %
---------------------------------------------------
Explaining the table
A lower token than type percentage would indicate that the parser misses common words more often than rare ones. A lower type than token percentage (which is the case here) indicates that the parser covers the core vocabulary well, but has poorer coverage of rare word forms. In Test 9, for example, token recall (96.9 %) is clearly higher than type recall (89.2 %), i.e. the unrecognised word forms are mostly infrequent ones.
Each text is given a separate section in the table, ordered chronologically, with the oldest test case (Test 1) at the bottom. The first line of each section gives the name of the file or text (note: the files of test cases 1 and 2 have changed so much that these two test cases are closed). Each of the following lines represents a test run. The first column gives the test date (in the format yymmdd), the second (Wftot) the total number of word forms in the file in question, the third (Wf-tkn) the number of recognised word-form tokens, and the fourth the percentage of recognised tokens relative to the total. The next three columns do the same for word-form types (cf. below for the commands used to calculate the numbers).
-------------------------------------------------------------------------
Wftot (total number of word-form tokens):
  cat filename | preprocess --abbr=bin/abbr.txt | wc -l

Non_recognised_wf (number of non-analysed word-form tokens):
  cat filename | preprocess --abbr=bin/abbr.txt |
    lookup -flags mbTT -utf8 bin/sme.fst | grep '\?' | grep -v CLB | wc -l

Wf-tkn   = Wftot - Non_recognised_wf
%-recall = Wf-tkn * 100 / Wftot
-------------------------------------------------------------------------
Tytot (total number of word-form types):
  cat filename | preprocess --abbr=bin/abbr.txt | sort | uniq | wc -l

Non_recognised_wt (number of non-analysed word-form types):
  cat filename | preprocess --abbr=bin/abbr.txt | sort | uniq |
    lookup -flags mbTT -utf8 bin/sme.fst | grep '\?' | grep -v CLB | wc -l

Wf-typ (number of recognised word-form types):
  Wf-typ   = Tytot - Non_recognised_wt
  %-recall = Wf-typ * 100 / Tytot

If the text is taken from our new /usr/local/share/sme/gt corpus, the
"cat filename" part should be replaced with

  catxml --title --input /usr/local/share/sme/gt/

followed by the catalogue name and the file name.
-------------------------------------------------------------------------
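The pipelines above can be wrapped in a single script that prints both token and type recall. The following is a minimal sketch, assuming the same bin/abbr.txt and bin/sme.fst paths as above; the script name recall.sh is hypothetical:

  #!/bin/sh
  # recall.sh -- sketch of the coverage measurement described above.
  # Usage: sh recall.sh filename
  FILE="$1"

  # Token counts (cf. Wftot and Non_recognised_wf above)
  WFTOT=$(cat "$FILE" | preprocess --abbr=bin/abbr.txt | wc -l)
  MISS=$(cat "$FILE" | preprocess --abbr=bin/abbr.txt |
         lookup -flags mbTT -utf8 bin/sme.fst | grep '\?' | grep -v CLB | wc -l)

  # Type counts (cf. Tytot and Non_recognised_wt above)
  TYTOT=$(cat "$FILE" | preprocess --abbr=bin/abbr.txt | sort | uniq | wc -l)
  TMISS=$(cat "$FILE" | preprocess --abbr=bin/abbr.txt | sort | uniq |
          lookup -flags mbTT -utf8 bin/sme.fst | grep '\?' | grep -v CLB | wc -l)

  # Wf-tkn = Wftot - Non_recognised_wf; %-recall = Wf-tkn * 100 / Wftot
  echo "$WFTOT $MISS $TYTOT $TMISS" | awk '{
      printf "Wftot %d  Wf-tkn %d  token recall %.1f %%\n", $1, $1-$2, ($1-$2)*100/$1
      printf "Tytot %d  Wf-typ %d  type  recall %.1f %%\n", $3, $3-$4, ($3-$4)*100/$3
  }'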
Grammatical testing
February 2005: Here we will fill in an overview of which grammatical paradigms and which sublexica we have tested.
Part of speech testing
Adjectives
May 05: The adjectives have been tested, both morphologically and lexically.
Nouns
May 05: Nominal morphology has been tested.
Verbs
May 05: The verbs are currently being gone through.
Testing the disambiguator
Test sentences
Here we test accuracy, precision and recall.
Accuracy

Here we list tests that are not manually checked:

Date    Tokens  Parses  Ambiguity  Type
051204  40753

Syntax

Date    Tokens  Parses  CorrTag  Ambiguity  Precision  Recall  Text             Type
050429  9219    9736    9092     1.056      0.93       0.99    Nickel           Grammar
050512  1972    2089    1928     1.059      0.92       0.98    Luossa           Traditional
050719  9179    14749   9044     1.607      0.61       0.98    Nickel           Grammar
051025  8770    11498            1.311                         Hálddášanáššiid  Admin
051027  2578    2686    2498     1.042      0.93       0.97    TaleE036.txt     Admin
070207  2059    2106    1973     1.023      0.94       0.96    MinÁigi070202    News

Morphology

Date    Tokens  Parses  CorrTag  Ambiguity  Precision  Recall  Text             Type
050429  9219    9736    9152     1.056      0.94       0.99    Nickel           Grammar
050512  1972    2089    1955     1.059      0.94       0.99    Luossa           Traditional
050719  9179    14749   9100     1.607      0.62       0.99    Nickel           Grammar
051025  8770    11498            1.311                         Hálddášanáššiid  Admin
051027  2578    2686    2519     1.042      0.94       0.98    TaleE036.txt     Admin
070207  2059    2106    1999     1.023      0.95       0.97    MinÁigi070202    News

Where:

Tokens  = the number of tokens in the text:
          cat file.txt | preprocess --abbr=bin/abbr.txt | wc -l
          (or "catxml -i" instead of "cat"; the catxml command can only be
          used on victorio, on the files in /usr/local/share/corp/)
Parses  = the number of parses given (the line count of the following
          command, minus the number of tokens):
          cat file.txt | preprocess --abbr=bin/abbr.txt |
            lookup -flags mbTT -utf8 bin/sme.fst | lookup2cg |
            vislcg --grammar=src/sme-dis.rle | wc -l
CorrTag = the number of tokens that did not have their correct tag removed
          (this number must be arrived at manually: Tokens - erroneous_analyses)

Ambiguity = #Parses / #Tokens
Precision = #CorrTag / #Parses
Recall    = #CorrTag / #Tokens

Syntax     = tested wrt. the syntactic (@XXX) tags
Morphology = tested wrt. the morphological tags (all except the @XXX ones),
             but run with the MAPPING section activated (standard mode, that is)
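The three derived figures follow directly from the raw counts, for instance with awk. This is a minimal sketch; the example numbers are taken from the 050429 syntax run in the table above:

  # Ambiguity, precision and recall from Tokens, Parses and CorrTag
  # (here: 050429, Nickel, syntax: 9219 tokens, 9736 parses, 9092 CorrTag)
  echo "9219 9736 9092" | awk '{
      printf "Ambiguity = %.3f\n", $2/$1   # 9736/9219 = 1.056
      printf "Precision = %.2f\n", $3/$2   # 9092/9736 = 0.93
      printf "Recall    = %.2f\n", $3/$1   # 9092/9219 = 0.99
  }'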
Last modified $Date$, by $Author$