Meeting_2007-10-01
Contents:
- Meeting setup
- Agenda
- 1. Opening, agenda review, participants
- 2. Updated task status since last meeting
- 3. Documentation
- 4. Corpus gathering
- 5. Corpus infrastructure
- 6. Infrastructure
- 7. Linguistics
- 8. Name lexicon infrastructure
- 9. Spellers
- 10. Other
- 11. Next meeting, closing
- Appendix - task lists for the next week
Meeting setup
- Date: 1.10.2007 
- Time: 09.30 Norw. time 
- Place: Internet 
- Tools: SubEthaEdit, iChat/Skype
Agenda
- Opening, agenda review 
- Reviewing the task list from last week 
- Documentation - divvun.no 
- Corpus gathering 
- Corpus infrastructure 
- Infrastructure 
- Linguistics 
- name lexicon infrastructure 
- Spellers 
- Other issues 
- Summary, task lists 
- Closing
1. Opening, agenda review, participants
Opened at 09:37.
Present: Børre, Per-Eric, Risten, Sjur, Thomas, Tomi, Trond
Absent: none
Agenda accepted as is.
2. Updated task status since last meeting
Børre
- move Steinar's error markup in the xml files to (a copy of) the original - not done
- add semi-automatic updates of fixed and open issues to README files - not done
- order lunch Mon-Fri for the next gathering in Tromsø, invoice to SD - done
- help Tomi with adding Hunspell data generation/conversion - nouns and adjectives work halfway
- fix bugs!
Ilona
- lexicalise missing words - done
- make sms propernoun-list
- Change NIILLAS-names to ANAR or DUORTNUS.
Maaren
- lexicalise actio compounds
Per-Eric
- expand the smj typos list - worked on, still in progress
- add missing smj words - worked on, still in progress
- lexicalise words from the Olavi missing list - worked on, still in progress
- finish the compounding tags for adjectives - done
Saara
- add new XSL/XML headers for proofing test docs - not done
- Set up ways of adding meta-information for proofing correct corpus docs - still not done
- add correct type differentiation to XSL processing - bug 504 - done
Sjur
- document the AppleScript testing tool - not finished
- document the testing procedures - the procedures are still changing
- add baseform transducer test - done
- fix stuorra-oslolaš lower case o - add it to Bugzilla - added to Bugzilla
- ä/æ in smj speller - done
- work on the XML name editor/risten.no integration - not done
- plan the rest of the project period - roughly done
- book hotel rooms for the next gathering in Tromsø - done
- fix bugs! - done a lot in Tromsø
Thomas
- fix stuorra-oslolaš lower case o - not up to me any more
- ä/æ in smj speller - fixed
- reserve meeting room for the next gathering in Tromsø - done
- fix bugs! - worked
Tomi
- make PLX conversion test sample; add conversion testing to the make file 
- add Hunspell data generation/conversion - helped Børre with this one
- fix PLX conversion bugs - fixed
- add correct type differentiation to ccat - bug 505 - added
- find a solution for smj clitics - found one
- fix bugs! - fixed
Trond
- update the smj proper noun lexicon, and refine the morphological - discussed during meeting, not done
- fix stuorra-oslolaš lower case o - hanging
- add sma texts to the corpus repository - not done
- ä/æ in smj speller - This one was fixed? - Yes
- fix bugs! - fixed very many bugs indeed during meeting
3. Documentation
TODO: 
- add semi-automatic updates of fixed and open issues to README files 
4. Corpus gathering
TODO: 
- add sma Bible texts to the corpus repository (Trond)
- add correct type differentiation to XSL processing - bug 504 (Saara) - done
- add correct type differentiation to ccat - bug 505 (Tomi) - done; both bug 504 and bug 505 need testing
- test correct-type markup with latest enhancements (Sjur)
5. Corpus infrastructure
Nothing.
6. Infrastructure
Speller testing is still fluctuating a bit.
7. Linguistics
North Sámi
Čorru > čorut
*Oslolaš with hyphen required, is printed now, but shouldn't
oslolaš - is done correctly now
TODO: 
- lexicalise actio compounds. Example: vuolggasadji vs. vuolginsadji
- fix stuorra-oslolaš lower case o (Tomi) - add to Bugzilla (Sjur) - done
Lule Sámi
We have the same oslolaš derivation in smj too, but with another derivation.
Tjårro > tjårok
*Oslolaš with hyphen required, is printed now, but shouldn't
oslolaš - is done correctly now
smj propernoun bug issue: 
- convert from common base (which means sme base) - words not convertible should be added to a separate smj lexicon, and words that
- send to smj morphology
The original todo was to correct the smj morphology. 
- conversion errors 
- words that should not have been converted
- missing smj-unique names 
- errors in the morphology
Testing procedures: 
- analyse baseforms (as for sme)
- generate a couple of caseforms from the baseforms, and inspect result
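The two testing steps above could be sketched roughly as follows. This is only an illustration, not the project's actual test setup: the function names, the tag strings and the assumed analyser output format (tab-separated, with `+?` in the last column for an unrecognised word) are all assumptions and would have to be adapted to the real smj tools and tag set.

```python
from itertools import product

# Hypothetical case tags; the real tag set is whatever the smj morphology defines.
CASE_TAGS = ["+N+Prop+Sg+Nom", "+N+Prop+Sg+Gen", "+N+Prop+Sg+Ill"]


def generation_inputs(baseforms, tags=CASE_TAGS):
    """Return one 'baseform+tags' string per (baseform, tag) pair, in input order.

    The strings are meant to be fed to a generator transducer, whose output
    is then inspected by hand.
    """
    cleaned = [b.strip() for b in baseforms if b.strip()]
    return [base + tag for base, tag in product(cleaned, tags)]


def missing_analyses(analyser_output):
    """Return the input words for which the analyser gave no analysis.

    Assumes tab-separated lines whose last column is '+?' on failure.
    """
    failed = []
    for line in analyser_output.splitlines():
        parts = line.split("\t")
        if len(parts) >= 2 and parts[-1].strip() == "+?":
            failed.append(parts[0])
    return failed
```

The first function only prepares the input for generation; the generated caseforms still need to be checked by a linguist, as described in the list above.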
Suggestion: 
TODO: 
- refine smj proper noun lexica, cf. the propernoun-smj-lex.txt
- ä/æ in speller, see bug report #411 (Tomi, Sjur) - done
- lexicalise words from the Olavi missing list, but check against the pdf
- add oslolaš type derivation test cases to the regression file
- sme->smj lexicon conversion to build bilingual lexicon resources, and
8. Name lexicon infrastructure
This sub-project needs to get up and running soon. Mainly Sjur's task.
Decisions made in Tromsø can be found in this meeting memo.
TODO: 
- fix bugs in lexc2xml; add comments to the log element (Saara) 
- finish first version of the editing (Sjur) 
- test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)
- make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as
- convert propernoun-($lang)-lex.txt to a derived file from common xml files
- implement data synchronisation between risten.no and
- start to use the xml file as source file
- clean terms-sme.xml such that all names have the correct tag for their use
- merge placenames which are erroneously in different entries: e.g. Helsinki,
- publish the name lexicon on risten.no (Sjur)
- add missing parallel names for placenames (linguists) 
- add informative links between first names like Niillas and Nils 
9. Spellers
OOo spellers
We have a first working demo!
TODO: 
- Hunspell lexicon conversion (Tomi, Børre) - working on it
 
Testing
Spelling Error Markup
TODO: 
- Set up ways of adding meta-information (source info, used in testing or not,
- move Steinar's error markup in the xml files to (a copy of) the original
Automated testing
The infrastructure is still fluctuating a bit.
TODO: 
- document the AppleScript testing tool (Sjur) 
- document the testing procedures (Sjur) - started
- add baseform transducer test (Sjur) - done
Lexicon conversion to the PLX format
All bugs fixed.
The latest changes to the Makefile should be reviewed after the lexicon bug we 
TODO: 
- fix PLX-related bugs (Tomi) - done, except for oslolaš
- find a solution for smj clitics (Tomi) - done
- test whether we can revert Makefile changes, and if positive, revert them (Tomi)
New public beta
We are ready to deliver one now. What needs to be done:
TODO: 
- collect/build an e-mail notify list; we make it simple, a text document with
- update list of fixed and known issues - Tuesday afternoon (Sjur, Risten)
- update translations of README-files - Thursday afternoon
- update installation packages (Sjur)
- announce the beta (Sjur)
10. Other
Corpus contracts
Delayed till after final release.
TODO: 
- publish corpus contracts and project infra as open-source on NoDaLi-sta 
Bug fixing
When fixing bugs, record the version number containing the fix in the Bugzilla 
62 open Divvun/Disamb bugs (32 of these 56 are speller-related bugs, 
11. Next meeting, closing
The next meeting is 8.10.2007, 09:30 Norwegian time.
The meeting was closed at 10:26.
Appendix - task lists for the next week
Børre
- move Steinar's error markup in the xml files to (a copy of) the original
- Hunspell lexicon conversion 
- collect/build an e-mail notify list 
- fix bugs!
Ilona
- lexicalise missing words 
- make  sms propernoun-list 
- Change NIILLAS-names to ANAR or DUORTNUS.
Maaren
- lexicalise actio compounds
Per-Eric
- expand the smj typos list 
- add missing smj words 
- lexicalise words from the Olavi missing list 
- fix bugs!
Risten
- fixed and open issues to README files 
- update translations of README-files - Thursday afternoon
Saara
- add new XSL/XML headers for proofing test docs 
- Set up ways of adding meta-information for proofing correct corpus docs 
Sjur
- document the AppleScript testing tool 
- document the testing procedures 
- work on the XML name editor/risten.no integration 
- fixed and open issues to README files 
- test correct-type markup with latest enhancements 
- collect/build an e-mail notify list 
- update translations of README-files - Thursday afternoon 
- update installation packages 
- announce the beta 
- fix bugs!
Thomas
- explain compound-tags to Tomi 
- add oslolaš type derivation test cases to smj regression file
- sme->smj lexicon conversion to build bilingual lexicon resources
- update translations of README-files - Thursday afternoon 
- fix bugs!
Tomi
- make PLX conversion test sample; add conversion testing to the make file 
- Hunspell lexicon conversion 
- fix stuorra-oslolaš lower case o
- sme->smj lexicon conversion to build bilingual lexicon resources
- test whether we can revert Makefile changes, and if positive, revert them 
- fix bugs!
Trond
- update the smj proper noun lexicon, and refine the morphological
- fix stuorra-oslolaš lower case o
- add sma texts to the corpus repository
- sme->smj lexicon conversion to build bilingual lexicon resources
- update translations of README-files - Thursday afternoon
- fix bugs!

