Plains Cree Disambiguation
This file gives an overview of some still ad hoc solutions for disambiguation.
Prerequisites:
- vislcg3 installed
- A text corpus
How to analyse
Plains Cree differs from the other languages in not having an adjusted
cat misc/CORR_Dog_Biscuits.txt |preprocess|lookup src/analyser-gt-desc.xfst | lookup2cg | vislcg3 -g src/syntax/disambiguation.cg3
The string lookup src/analyser-gt-desc.xfst may be expressed as the alias ucrk.
Dog Biscuits does not use the ' symbol as a letter, so we may use preprocess.
cat misc/7C_Mary_Wells.txt |sed 's/\([.,:;‘’"]\)/ \1 /g;'|tr '[ ]' '\n'|\
grep -v '~$'|grep -v '^$'|lookup src/analyser-gt-desc.xfst | lookup2cg | vislcg3 -g src/syntax/disambiguation.cg3
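The sed/tr tokenisation step used for the Mary Wells text can be tried on its own before sending anything to the analyser. The following is a minimal sketch, not the project's tokeniser: the function name tokenise and the sample sentence are invented for illustration, and only ASCII punctuation is handled (the typographic quotes in the command above would additionally need a UTF-8 locale).

```shell
# Hypothetical helper: put spaces around punctuation, emit one token per line,
# and drop tokens ending in "~" as well as empty lines.
# Assumption: ASCII punctuation only.
tokenise() {
  sed 's/\([.,:;"]\)/ \1 /g' | tr ' ' '\n' | grep -v '~$' | grep -v '^$'
}

# Invented sample sentence, not taken from the corpus.
printf '%s\n' 'Mary said: "wait, dog."' | tokenise
```

The output of such a function could then be piped into lookup, lookup2cg and vislcg3 exactly as in the commands above.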
Missing list
In order to make good analyses, we need the words of the text in the analyser,
cat misc/CORR_Dog_Biscuits.txt |preprocess|lookup src/analyser-gt-desc.xfst |grep '?'|sort|uniq -c|sort -nr|less
cat misc/7C_Mary_Wells.txt |sed 's/\([.,:;‘’"]\)/ \1 /g;'|tr '[ ]' '\n'|\
grep -v '~$'|grep -v '^$'|lookup src/analyser-gt-desc.xfst |grep '?'|sort|uniq -c|sort -nr|less
cat misc/PCT.txt|ucrk|grep '?'|sort|uniq -c|sort -nr|less
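The counting part of these pipelines can be tested without the analyser. The sketch below assumes that lookup reports an unknown word as the input form, a tab, the input form again, a tab, and "+?"; that format, the function names and the placeholder word forms are assumptions made for illustration only. Unlike the plain grep '?' above, it anchors on "+?" at the end of the line, so a question mark occurring as a token is not counted as a missing word.

```shell
# Hypothetical stand-in for real lookup output: "input<TAB>analysis", with
# unknown words assumed to be reported as "input<TAB>input<TAB>+?".
# The word forms are invented placeholders, not Plains Cree data.
sample_lookup_output() {
  printf 'knownword\tknownword+N\n\n'
  printf 'xyzzy\txyzzy\t+?\n\n'
  printf 'plugh\tplugh\t+?\n\n'
  printf 'xyzzy\txyzzy\t+?\n\n'
}

# Keep only the unknown-word lines, reduce them to the word form,
# and count how often each form occurs, most frequent first.
missing_list() {
  grep '+?$' | cut -f1 | sort | uniq -c | sort -nr
}

sample_lookup_output | missing_list
```

With real data, missing_list would take the place of the grep '?'|sort|uniq -c|sort -nr part of the pipeline.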
Strategies for disambiguation
Look at common ambiguity patterns in some texts.
- Grammar ambiguity
- Word ambiguity
To create similar statistics, use the sum-cg.pl script (write sum-cg.pl --help
cat misc/CORR_Dog_Biscuits.txt |preprocess|lookup src/analyser-gt-desc.xfst | lookup2cg > xxdogbiscuits.multi
sum-cg.pl --grammar xxdogbiscuits.multi | less
You may of course also take the disambiguated text as input, and use the sum-cg
vislcg3 rules
Operators:
- DELIMITERS :
- This will work as an on-the-fly sentence (disambiguation window) delimiter.
- LIST KIN = "daughter" "mother" "father" "aunt" "uncle" ;
- LIST WORD = N ADJ V ADV NUM ;
- LIST NP-MOD = ADJ ADV ;
- LIST @Kinship = @Kinship ;
- SET NOT-NP-MOD = WORD - NP-MOD ;
- SELECT INFM IF (1 INF) ;
- Singles out a reading from the cohort that matches the target, and if all contextual tests are satisfied it will delete all other readings except the matched one.
- REMOVE INFM IF (NOT 1 INF) ;
- Singles out a reading from the cohort that matches the target, and if all contextual tests are satisfied it will delete the matched reading.
- SELECT N IF (*-1 ART BARRIER NOT-NP-MOD) ;
- SELECT N IF (*-1 ART CBARRIER V) ;
- REMOVE INF (NEGATE -1 N LINK -1 INF LINK -1 INFM ) ;
- MAP @Kinship TARGET KIN + N ;
- ADD
- Appends tags to matching readings. Does not block further tags from being added.
- MAP @3SERR TARGET (-3S)(-1 (S NOM) OR (3S NOM)) ;
- Appends tags to matching readings, and blocks other MAP, ADD
- SUBSTITUTE (N S NOM) (N S Nom) N ;
- Replaces the tags in the first list with the tags in the second list.
Careful mode:
- 1C NOM
- matches only if NOM is the only reading left in the cohort one position to the right
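To see how a single rule behaves, it can help to run it on a few hand-written cohorts instead of a whole text. The following is a sketch under stated assumptions: the grammar, the file names toy.cg3 and toy.txt, and the English toy cohorts with the tags ART, N, V and CLB are all invented for illustration and are not part of src/syntax/disambiguation.cg3. The input is written by hand in the CG stream format that lookup2cg is assumed to produce. Only the -g option already used above is passed to vislcg3.

```shell
# Hypothetical scratch directory; nothing here touches the real grammar.
tmpdir=$(mktemp -d)

# A toy grammar with one SELECT rule: choose the N reading after an ART.
cat > "$tmpdir/toy.cg3" <<'EOF'
DELIMITERS = "<.>" "<!>" "<?>" ;
LIST ART = ART ;
LIST N = N ;
SECTION
SELECT N IF (-1 ART) ;
EOF

# Three invented cohorts; the second one is ambiguous between N and V.
printf '"<the>"\n\t"the" ART\n"<walk>"\n\t"walk" N\n\t"walk" V\n"<.>"\n\t"." CLB\n' > "$tmpdir/toy.txt"

# Run the rule only if vislcg3 is installed (see Prerequisites).
if command -v vislcg3 >/dev/null 2>&1; then
  vislcg3 -g "$tmpdir/toy.cg3" < "$tmpdir/toy.txt"
else
  echo "vislcg3 not found; grammar written to $tmpdir/toy.cg3" >&2
fi
```

Replacing the SELECT line with one of the rules listed above, and adjusting the toy cohorts to match, gives a quick way to check a rule before adding it to the real grammar.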
Apply function tags
Make a set for the function tag, and make one or more rules:
- LIST @SUBJ = @SUBJ ;
- MAP @SUBJ TARGET NOM IF (1 VFIN) ;
- LIST @SUBJ> = @SUBJ> ;
- MAP @SUBJ> TARGET NOM IF (1 VFIN) ;
- with arrow towards the finite verb
Aliases
Put these in your .profile or .bashrc file
alias crkdep="sent-proc.sh -l crk -s dep"
alias crkdept="sent-proc.sh -l crk -s dep -t"
alias crkdis="sent-proc.sh -l crk -s dis"
alias crkdist="sent-proc.sh -l crk -s dis -t"