Documentation on North Saami
Source file documentation
Using the analysers
- In the terminal: analyse words by typing usme, and generate word forms with dsme
- Generation of: paradigms / text
- For more info, see How to use the morphological parsers
Projects involving North Saami
Tags used for analysis
Discussions on improving our linguistic analysis
Morphophonology, morphology and syntax
- Documentation of the twol-sme.txt rule file
- Documentation of the lexicon files
- The use of flag diacritics
- Documentation of the disambiguation file (partly obsolete)
- Syntax regression testing: run sh test/src/syntax/disambiguation_developertest.sh (you may need to adjust the path following $GTBIG; the files are in $GTBIG/gt/sme/corp)
- See also the general disambiguation page.
Pre- and postprocessing
- Documentation of the preprocessing of running text
- The perl-based preprocess script, our current preprocessor
- For reference: documentation of the old xfst-based preprocessor, tok.txt
- Documentation of inituppercase.regex (initial capitalisation) and allcaps.xfst (words written in all caps). Note: the latter is presently not in use.
- Translating from Xerox-style to vislcg3-style output is done with the script lookup2cg
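
To illustrate the kind of conversion lookup2cg performs, here is a minimal Python sketch. It is not the lookup2cg script itself. It assumes the usual Xerox lookup layout (one analysis per line as wordform, tab, lemma+Tag+Tag, with blank lines between words, and unknown words marked with a final +? field) and the usual Constraint Grammar cohort layout (the word form in "<...>" followed by tab-indented readings). The function name lookup_to_cg is hypothetical, and compounds, derivations and other special cases that the real script may handle are ignored here.

```python
def lookup_to_cg(lookup_output: str) -> str:
    """Convert Xerox-style lookup lines to CG-style cohorts (illustrative sketch)."""
    cohorts = []      # list of (wordform, [reading, ...]) in input order
    start_new = True  # True at the start and after every blank line

    for line in lookup_output.splitlines():
        if not line.strip():
            start_new = True  # a blank line ends the current word's analyses
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip lines that do not look like lookup output

        wordform, analysis = fields[0], fields[1]
        if fields[-1].strip() == "+?":
            # Assumed convention for unknown words: keep the form, tag it "?"
            reading = f'"{wordform}" ?'
        else:
            lemma, *tags = analysis.split("+")
            reading = " ".join([f'"{lemma}"', *tags])

        if start_new or cohorts[-1][0] != wordform:
            cohorts.append((wordform, []))
            start_new = False
        cohorts[-1][1].append(reading)

    out = []
    for wordform, readings in cohorts:
        out.append(f'"<{wordform}>"')
        out.extend("\t" + reading for reading in readings)
    return "\n".join(out)


if __name__ == "__main__":
    sample = (
        "viessu\tviessu+N+Sg+Nom\n"
        "\n"
        "boahtit\tboahtit+V+IV+Inf\n"
        "boahtit\tboahtit+V+IV+Ind+Prs+Pl1\n"
    )
    print(lookup_to_cg(sample))
```

Each word form becomes one cohort, and each analysis line becomes one reading with the plus-separated tags turned into space-separated tags, which is the shape the disambiguator expects as input.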
Normativity issues
Speller optimisations
There is a separate page on speller optimisations for SME.
Obsolete test reports, for reference
- A test plan for sme (obsolete)
- A test diary for sme (obsolete)
- Bug report sheet from the days before we got a bug report system (obsolete)
- Our earlier treatment of foreign words (obsolete)