Meeting_2007-10-01
Contents:
- Meeting setup
- Agenda
- 1. Opening, agenda review, participants
- 2. Updated task status since last meeting
- 3. Documentation
- 4. Corpus gathering
- 5. Corpus infrastructure
- 6. Infrastructure
- 7. Linguistics
- 8. Name lexicon infrastructure
- 9. Spellers
- 10. Other
- 11. Next meeting, closing
- Appendix - task lists for the next week
Meeting setup
- Date: 1.10.2007
- Time: 09.30 Norw. time
- Place: Internet
- Tools: SubEthaEdit, iChat/Skype
Agenda
- Opening, agenda review
- Reviewing the task list from last week
- Documentation - divvun.no
- Corpus gathering
- Corpus infrastructure
- Infrastructure
- Linguistics
- name lexicon infrastructure
- Spellers
- Other issues
- Summary, task lists
- Closing
1. Opening, agenda review, participants
Opened at 09:37.
Present: Børre, Per-Eric, Risten, Sjur, Thomas, Tomi, Trond
Absent: none
Agenda accepted as is.
2. Updated task status since last meeting
Børre
- move Steinar's error markup in the xml files to (a copy of) the original
- not done
- add semi-automatic updates of fixed and open issues to README files
- not done
- order lunch mon-fri for the next gathering in Tromsø, invoice to SD
- done
- help Tomi with adding Hunspell data generation/conversion
- nouns and adjectives work halfway
- fix bugs!
Ilona
- lexicalise missing words
- done
- make sms propernoun-list
- Change NIILLAS-names to ANAR or DUORTNUS.
Maaren
- lexicalise actio compounds
Per-Eric
- expand the smj typos list
- Worked and still working
- add missing smj words
- Worked and still working
- lexicalise words from the Olavi missing list
- Worked and still working
- finish with the compounding tags to adjectives
- done
Saara
- add new XSL/XML headers for proofing test docs
- not done
- Set up ways of adding meta-information for proofing correct corpus docs
- still not done
- add correct type differentiation to XSL processing - bug 504
- done
Sjur
- document the AppleScript testing tool
- not finished
- document the testing procedures
- the procedures are still changing
- add baseform transducer test
- done
- fix stuorra-oslolaš lower case o - add it to Bugzilla
- added to Bugzilla
- ä/æ in smj speller
- done
- work on the XML name editor/risten.no integration
- not done
- plan the rest of the project period
- roughly done
- book hotel rooms for the next gathering in Tromsø
- done
- fix bugs!
- done a lot in Tromsø
Thomas
- fix stuorra-oslolaš lower case o
- not up to me any more
- ä/æ in smj speller
- fixed
- reserve meeting room for the next gathering in Tromsø
- done
- fix bugs!
- worked
Tomi
- make PLX conversion test sample; add conversion testing to the make file
- add Hunspell data generation/conversion
- Helped Børre with this one
- fix PLX conversion bugs
- fixed
- add correct type differentiation to ccat - bug 505
- added
- find a solution for smj clitics
- found one
- fix bugs!
- fixed
Trond
- update the smj proper noun lexicon, and refine the morphological
- Discussed during meeting, not done.
- fix stuorra-oslolaš lower case o
- Hanging.
- add sma texts to the corpus repository
- not done
- ä/æ in smj speller
- This one was fixed? - Yes
- fix bugs!
- Fixed very many bugs indeed during meeting.
3. Documentation
TODO:
- add semi-automatic updates of fixed and open issues to README files
4. Corpus gathering
TODO:
- add sma Bible texts to the corpus repository (Trond)
- add correct type differentiation to XSL processing - bug 504 (Saara)
- done
- add correct type differentiation to ccat - bug 505 (Tomi)
- done
- both need testing
- test correct-type markup with latest enhancements (Sjur)
5. Corpus infrastructure
Nothing.
6. Infrastructure
Speller testing is still fluctuating a bit.
7. Linguistics
North Sámi
Čorru > čorut
*Oslolaš (with hyphen required) is printed now, but shouldn't be
oslolaš is done correctly now
TODO:
- lexicalise actio compounds. Example: vuolggasadji vs. vuolginsadji
- fix stuorra-oslolaš lower case o (Tomi)
- add to Bugzilla (Sjur)
- done
Lule Sámi
The oslolaš type of derivation is found in smj too, but it is formed differently.
Tjårro > tjårok
*Oslolaš (with hyphen required) is printed now, but shouldn't be
oslolaš is done correctly now
smj propernoun bug issue:
- convert from common base (which means sme base)
- Words not convertable should be added to separate smj lexicon, and words that
- send to smj morphology
The original todo was to correct the smj morphology.
- conversion errors
- words that should not have been converted
- missing smj-unique names
- errors in the morphology
Testing procedures:
- analyse baseforms (as for sme)
- generate a couple of caseforms from the baseforms, and inspect result
Suggestion:
TODO:
- refine smj proper noun lexica, cf. the propernoun-smj-lex.txt
- ä/æ in speller, see bug report #411 (Tomi, Sjur)
- done
- lexicalise words from the Olavi missing list, but check against the pdf
- add oslolaš type derivation test cases to the regression file
- sme->smj lexicon conversion to build bilingual lexicon resources, and
8. Name lexicon infrastructure
This sub-project needs to get up and running soon. Mainly Sjur's task.
Decisions made in Tromsø can be found in this meeting memo.
TODO:
- fix bugs in lexc2xml; add comments to the log element (Saara)
- finish first version of the editing (Sjur)
- test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)
- make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as
- convert propernoun-($lang)-lex.txt to a derived file from common xml files
- implement data synchronisation between risten.no and
- start to use the xml file as source file
- clean terms-sme.xml such that all names have the correct tag for their use
- merge placenames which are erroneously in different entries: e.g. Helsinki,
- publish the name lexicon on risten.no (Sjur)
- add missing parallel names for placenames (linguists)
- add informative links between first names like Niillas and Nils
9. Spellers
OOo spellers
We have a first working demo!
TODO:
- Hunspell lexicon conversion (Tomi, Børre)
- working on it
Testing
Spelling Error Markup
TODO:
- Set up ways of adding meta-information (source info, used in testing or not,
- move Steinar's error markup in the xml files to (a copy of) the original
Automated testing
The infrastructure is still fluctuating a bit.
TODO:
- document the AppleScript testing tool (Sjur)
- document the testing procedures (Sjur)
- started
- add baseform transducer test (Sjur)
- done
Lexicon conversion to the PLX format
All bugs fixed.
The latest changes to the Makefile should be reviewed after the lexicon bug we
TODO:
- fix PLX-related bugs (Tomi)
- done, except for oslolaš
- find a solution for smj clitics (Tomi)
- done
- test whether we can revert Makefile changes, and if positive, revert them (Tomi)
New public beta
We are ready to deliver one now. What needs to be done:
TODO:
- collect/build an e-mail notify list; we make it simple, a text document with
- update list of fixed and known issues - Tuesday afternoon (Sjur, Risten)
- update translations of README-files - Thursday afternoon
- update installation packages (Sjur)
- announce the beta (Sjur)
10. Other
Corpus contracts
Delayed till after final release.
TODO:
- publish corpus contracts and project infra as open-source on NoDaLi-sta
Bug fixing
When fixing bugs, record the version number containing the fix in the Bugzilla
62 open Divvun/Disamb bugs (32 of these 56 are speller-related bugs,
11. Next meeting, closing
The next meeting is 8.10.2007, 09:30 Norwegian time.
The meeting was closed at 10:26.
Appendix - task lists for the next week
Børre
- move Steinar's error markup in the xml files to (a copy of) the original
- Hunspell lexicon conversion
- collect/build an e-mail notify list
- fix bugs!
Ilona
- lexicalise missing words
- make sms propernoun-list
- Change NIILLAS-names to ANAR or DUORTNUS.
Maaren
- lexicalise actio compounds
Per-Eric
- expand the smj typos list
- add missing smj words
- lexicalise words from the Olavi missing list
- fix bugs!
Risten
- fixed and open issues to README files
- update translations of README-files - Thursday afternoon
Saara
- add new XSL/XML headers for proofing test docs
- Set up ways of adding meta-information for proofing correct corpus docs
Sjur
- document the AppleScript testing tool
- document the testing procedures
- work on the XML name editor/risten.no integration
- fixed and open issues to README files
- test correct-type markup with latest enhancements
- collect/build an e-mail notify list
- update translations of README-files - Thursday afternoon
- update installation packages
- announce the beta
- fix bugs!
Thomas
- explain compound-tags to Tomi
- add oslolaš type derivation test cases to smj regression file
- sme->smj lexicon conversion to build bilingual lexicon resources
- update translations of README-files - Thursday afternoon
- fix bugs!
Tomi
- make PLX conversion test sample; add conversion testing to the make file
- Hunspell lexicon conversion
- fix stuorra-oslolaš lower case o
- sme->smj lexicon conversion to build bilingual lexicon resources
- test whether we can revert Makefile changes, and if positive, revert them
- fix bugs!
Trond
- update the smj proper noun lexicon, and refine the morphological
- fix stuorra-oslolaš lower case o
- add sma texts to the corpus repository
- sme->smj lexicon conversion to build bilingual lexicon resources
- update translations of README-files - Thursday afternoon
- fix bugs!