Meeting_2007-09-17
Contents:
- Meeting setup
- Agenda
- 1. Opening, agenda review, participants
- 2. Updated task status since last meeting
- 3. Documentation
- 4. Corpus gathering
- 5. Corpus infrastructure
- 6. Infrastructure
- 7. Linguistics
- 8. Name lexicon infrastructure
- 9. Spellers
- 10. Other
- 11. Next meeting, closing
- Appendix - task lists for the next week
Meeting setup
- Date: 17.9.2007
- Time: 09.30 Norw. time
- Place: Internet
- Tools: SubEthaEdit, iChat/Skype
Agenda
- Opening, agenda review
- Reviewing the task list from last week
- Documentation - divvun.no
- Corpus gathering
- Corpus infrastructure
- Infrastructure
- Linguistics
- Name lexicon infrastructure
- Spellers
- Other issues
- Summary, task lists
- Closing
1. Opening, agenda review, participants
Opened at 09:37.
Present: Børre, Ilona, Sjur, Thomas, Tomi
Absent: Per-Eric, Trond
Agenda accepted as is.
2. Updated task status since last meeting
Børre
- move Steinar's error markup in the xml files to (a copy of) the original
- working
- add semi-automatic updates of fixed and open issues to README files
- not done
- fix bugs!
Ilona
- lexicalise missing words
- Well, an endless work. Started looking at a missing-list made from all Aššu
- add sme names from FIN
- Done
- make smn propernoun-list
- Done.
- There are still sms names that should perhaps be added somewhere.
- Not done, yet. - They should be added in the same way as smn names (I
Maaren
- lexicalise actio compounds
Per-Eric
- expand the smj typos list
- add missing smj words
- lexicalise words from the Olavi missing list
Saara
- add new XSL/XML headers for proofing test docs
- not done
- Set up ways of adding meta-information for proofing correct corpus docs
- not done
- add correct type differentiation to XSL processing - bug 504
- not done
- other:
- fixed/extended speller test result processing to cope with the regression
Sjur
- document the AppleScript testing tool
- not done
- document the testing procedures
- not done
- publish corpus contracts and project infra as open-source on NoDaLi-sta
- not done
- fix stuorra-oslolaš lower case o
- not done
- ä/æ in smj speller
- not done
- work on the XML name editor/risten.no integration
- not done
- plan the rest of the project period
- not finished
- fix bugs!
- constantly reviewing the list
- other:
- collected data for regression testing
- added a fourth (and last) type of testing - regression testing
- some streamlining of the Makefile re speller testing - more to be done
Thomas
- fix stuorra-oslolaš lower case o
- not done
- ä/æ in smj speller
- not done
- fix bugs!
- worked with some
Tomi
- make PLX conversion test sample; add conversion testing to the make file
- not done
- add Hunspell data generation/conversion
- not done
- fix bug 484
- not fixed
- add correct type differentiation to ccat - bug 505
- not done
- fix bugs!
- fixed other bugs
Trond
- update the smj proper noun lexicon, and refine the morphological
- fix stuorra-oslolaš lower case o
- add sma texts to the corpus repository
- ä/æ in smj speller
- fix bugs!
3. Documentation
We want to automatise as much as possible when releasing new public betas.
TODO:
- add semi-automatic updates of fixed and open issues to README files
- not yet
4. Corpus gathering
Nothing new.
TODO:
- add sma Bible texts to the corpus repository (Trond)
- add correct type differentiation to XSL processing - bug 504 (Saara)
- add correct type differentiation to ccat - bug 505 (Tomi)
5. Corpus infrastructure
Nothing.
6. Infrastructure
Nothing.
7. Linguistics
North Sámi
TODO:
- lexicalise actio compounds. Example: vuolggasadji vs. vuolginsadji
- fix stuorra-oslolaš lower case o (Sjur, Thomas, Trond)
- add to Bugzilla (Sjur)
- add the sme place names from Finland (Ilona)
- done! Well, there are some names that Ilona couldn't do anything about.
Lule Sámi
smj propernoun bug issue:
- convert from common base (which means sme base)
- Words not convertible should be added to separate smj lexicon, and words that
- send to smj morphology
The original todo was to correct the smj morphology.
- conversion errors
- words that should not have been converted
- missing smj-unique names
- errors in the morphology
Testing procedures:
- analyse baseforms (as for sme)
- generate a couple of caseforms from the baseforms, and inspect result
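The second step, generating a few case forms per base form, comes down to building analysis strings and sending them through the generating transducer. The string-building part could look roughly like the Python sketch below. The tag strings and the function name are illustrative assumptions, not taken from the smj source files, and the call to the transducer itself is left out.

```python
# Hypothetical tag strings for a few case forms. The real tag names
# must be taken from the smj lexicon and morphology sources.
CASE_TAGS = ["+N+Prop+Sg+Gen", "+N+Prop+Sg+Ill", "+N+Prop+Sg+Ine"]


def generation_input(baseforms, case_tags=CASE_TAGS):
    """Build one analysis string per base form and case tag."""
    return [base + tags for base in baseforms for tags in case_tags]


if __name__ == "__main__":
    # One line per string, ready to be piped to the generator.
    for line in generation_input(["Várjjat", "Fatjatj"]):
        print(line)
```

The generated forms still have to be inspected by a linguist; the sketch only saves typing the input.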
Suggestion:
TODO:
- refine smj proper noun lexica, cf. the propernoun-smj-lex.txt
- ä/æ in speller, see bug report #411 (Tomi, Sjur)
- lexicalise words from the Olavi missing list, but check against the pdf
8. Name lexicon infrastructure
This sub-project needs to get up and running soon. Mainly Sjur's task.
Decisions made in Tromsø can be found in this meeting memo.
TODO:
- fix bugs in lexc2xml; add comments to the log element (Saara)
- finish first version of the editing (Sjur)
- test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)
- make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as
- convert propernoun-($lang)-lex.txt to a derived file from common xml files
- implement data synchronisation between risten.no and
- start to use the xml file as source file
- clean terms-sme.xml such that all names have the correct tag for their use
- merge placenames which are erroneously in different entries: e.g. Helsinki,
- publish the name lexicon on risten.no (Sjur)
- add missing parallel names for placenames (linguists)
- add informative links between first names like Niillas and Nils
9. Spellers
OOo spellers
Børre will try to help out on this, as there are quite a few existing bugs
TODO:
- add Hunspell data conversion (Tomi, Børre)
Testing
Spelling Error Markup
TODO:
- Set up ways of adding meta-information (source info, used in testing or not,
- move Steinar's error markup in the xml files to (a copy of) the original
Automated testing
We need one more baseform test, one in which all baseforms are run through the
cat gt/doc/proof/spelling/testing/selftest-pl-sme-20070909.xml \
  | grep '<original>' | cut -d">" -f2 | cut -d"<" -f1 \
  | lookup -flags mbTT -utf8 gt/sme/bin/sme.fst \
  | grep '\?' | cut -f1 | wc -l
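The first half of this pipeline, pulling the word forms out of the `<original>` elements, can also be sketched in Python. This is only an illustration, not project tooling. It assumes, as the grep/cut version does, that each `<original>` element sits on a single line. The function name is hypothetical.

```python
import re

# Assumption: every <original>...</original> element fits on one line
# and contains plain text only (no nested markup).
ORIGINAL_RE = re.compile(r"<original[^>]*>([^<]*)</original>")


def extract_originals(xml_text):
    """Return the text content of every <original> element, in order."""
    return ORIGINAL_RE.findall(xml_text)


if __name__ == "__main__":
    sample = (
        "<word>\n"
        "<original>Várjjat</original>\n"
        "<expected>Várjjat</expected>\n"
        "</word>\n"
    )
    # One form per line, which is the input format lookup reads.
    print("\n".join(extract_originals(sample)))
```

The output would then be fed to `lookup` in the same way as in the shell command.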
Lex test of the lule-specific words. Three were not recognised:
Sálatduottar Sálatduottar +?
Várjjat Várjjat +?
Fatjatj Fatjatj +?
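Unrecognised forms like these can be picked out of the lookup output mechanically. Below is a minimal Python sketch, assuming the usual tab-separated layout (input form, analysis, and `+?` as the last field when there is no analysis). The function name is hypothetical.

```python
def unrecognised_forms(lookup_output):
    """Return the input forms that lookup marked as unknown ('+?')."""
    unknown = []
    for line in lookup_output.splitlines():
        fields = line.split("\t")
        # Assumed layout: input form, analysis, and '+?' as the last
        # field when the transducer has no analysis for the input.
        if len(fields) >= 2 and fields[-1].strip() == "+?":
            unknown.append(fields[0])
    return unknown


if __name__ == "__main__":
    sample = (
        "Sálatduottar\tSálatduottar\t+?\n"
        "\n"
        "Várjjat\tVárjjat\t+?\n"
    )
    for form in unrecognised_forms(sample):
        print(form)
```

Taking the length of the returned list gives the same count as the `grep '\?' | cut -f1 | wc -l` tail of the shell pipeline.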
Abbreviations are currently printed twice if they should be followed by a full
TODO:
- document the AppleScript testing tool (Sjur)
- document the testing procedures (Sjur)
- add baseform transducer test (Sjur)
Lexicon conversion to the PLX format
Several smj bugs discovered during last week, and added to Bugzilla: 495, 503,
Clitics:
+Clt+ge:#ge ENDLEX ;
+Clt+ge:#k ENDLEX ;
+Clt+gen:#gen ENDLEX ;
+Clt+ga:#ga ENDLEX ;
The first two are variants of the same clitic, where the variation is governed
The two other ones are separate clitics, and do not vary according to any rules.
TODO:
- fix PLX-related bugs (Tomi)
- find a solution for smj clitics (Tomi)
New public beta
Delayed till the majority of the present bugs are fixed. - We will evaluate the
10. Other
Corpus contracts
Delayed till after final release.
TODO:
- publish corpus contracts and project infra as open-source on NoDaLi-sta
Bug fixing
When fixing bugs, record the version number containing the fix in the Bugzilla
62 open Divvun/Disamb bugs (32 of these 56 are speller-related bugs,
Project meeting
We'll meet in September, 24-28, in Tromsø to work on the hardest remaining
Hotel rooms: Ilona sun-wed, Tomi sun-fri.
TODO:
- reserve meeting room (Thomas)
- reserve lunch mon-fri, invoice to SD (Børre)
- book hotel rooms (Sjur)
11. Next meeting, closing
The next meeting is 1.10.2007, 09:30 Norwegian time.
The meeting was closed at 10:49.
Appendix - task lists for the next week
Børre
- move Steinar's error markup in the xml files to (a copy of) the original
- add semi-automatic updates of fixed and open issues to README files
- order lunch mon-fri for the next gathering in Tromsø, invoice to SD
- help Tomi with adding Hunspell data generation/conversion
- fix bugs!
Ilona
- lexicalise missing words
- make sms propernoun-list
- Change NIILLAS-names to ANAR or DUORTNUS.
Maaren
- lexicalise actio compounds
Per-Eric
- expand the smj typos list
- add missing smj words
- lexicalise words from the Olavi missing list
- finish with the compounding tags to adjectives
Saara
- add new XSL/XML headers for proofing test docs
- Set up ways of adding meta-information for proofing correct corpus docs
- add correct type differentiation to XSL processing - bug 504
Sjur
- document the AppleScript testing tool
- document the testing procedures
- add baseform transducer test
- fix stuorra-oslolaš lower case o - add it to Bugzilla
- ä/æ in smj speller
- work on the XML name editor/risten.no integration
- plan the rest of the project period
- book hotel rooms for the next gathering in Tromsø
- fix bugs!
Thomas
- fix stuorra-oslolaš lower case o
- ä/æ in smj speller
- reserve meeting room for the next gathering in Tromsø
- fix bugs!
Tomi
- make PLX conversion test sample; add conversion testing to the make file
- add Hunspell data generation/conversion
- fix PLX conversion bugs
- add correct type differentiation to ccat - bug 505
- find a solution for smj clitics
- fix bugs!
Trond
- update the smj proper noun lexicon, and refine the morphological
- fix stuorra-oslolaš lower case o
- add sma texts to the corpus repository
- ä/æ in smj speller
- fix bugs!