Meeting_2007-11-19
Contents:
- Meeting setup
- Agenda
- 1. Opening, agenda review, participants
- 2. Updated task status since last meeting
- 3. Documentation
- 4. Corpus infrastructure
- 5. Infrastructure
- 6. Linguistics
- 7. Name lexicon infrastructure
- 8. Proofing tools
- 9. Other
- 10. Next meeting, closing
- Appendix - task lists for the next week
Meeting setup
- Date: 19.11.2007
- Time: 09.30 Norw. time
- Place: Internet
- Tools: SubEthaEdit, iChat/Skype
Agenda
- Opening, agenda review
- Reviewing the task list from last week
- Documentation - divvun.no
- Corpus gathering
- Corpus infrastructure
- Infrastructure
- Linguistics
- Name lexicon infrastructure
- Spellers
- Other issues
- Summary, task lists
- Closing
1. Opening, agenda review, participants
Opened at 10:15.
Present: Børre, Ilona, Per-Eric, Sjur, Thomas, Tomi
Absent: Risten, Trond
Agenda accepted as is.
2. Updated task status since last meeting
Børre
- move Steinar's error markup in the xml files to (a copy of) the original
- not done
- fix Unicode bug in Hunspell conversion java code
- don't know the reason for this one
- fix bug 550
- not done
- move Bugzilla to the G5.
- done
- fix Windows CD installation bug
- not done
- discuss more parallel texts
- nothing con
- fix bugs!
Ilona
- lexicalise smj missing words
- Done, at least most of it.
- other smj tasks, ask Thomas
Maaren
- lexicalise actio compounds
Per-Eric
- lexicalise words from the Olavi missing list
- Worked on it and still working. It will be ready this week; some strange words are left. We have to make a new missing list to see which words are left.
- fix bugs!
- Nothing to fix
Risten
- finish the design/text for the CD and the cover
- done
- set up CD-printing printer
- try to burn a CD at SD
- done
- get price and schedule for printed CD cover
Saara
- add new XSL/XML headers for proofing test docs
- Set up ways of adding meta-information for proofing correct corpus docs
- add nested error markup to xml conversion
- almost finished, just some testing left
- discuss more parallel texts
Sjur
- work on the XML name editor/risten.no integration
- nothing - this will have to wait till Divvun2
- set up risten.no on the G5 again
- really tried last week, to help Trond with some work, but failed (eXist
- test new and nested error markup
- Saara tried getting it in place, but is not finished with her work, thus
- get command line hyphenator for automated testing of the hyph-lexicons
- done
- add hyphenation testing
- first rough version done
- improve paradigm testing
- done
- fix bug 550
- not yet
- follow-up support for the Sámi languages in OpenOffice.org
- done, they will be included in the next OOo release - 2.4
- fix Windows CD installation bug
- not yet done
- fix circularity issue in nonrec transducers
- Tomi did this on his own - great!
- fix bugs!
Thomas
- sme->smj lexicon conversion to build bilingual lexicon resources
- not this week
- check for bad hyphenation
- not this week
- look at test cases still not behaving properly
- worked
- paradigm testing
- worked a lot
- fix bugs!
- worked
Tomi
- Hunspell lexicon conversion
- not done
- fix circularity issue in nonrec transducers
- fixed
- fix bugs!
- other
- installed Leopard
Trond
- sme->smj lexicon conversion to build bilingual lexicon resources
- fix hyphenation of derivations, inflections
- telephone meeting with Sjur and the Faroese group re Faroese speller
- fix circularity issue in nonrec transducers
- discuss more parallel texts
- fix bugs!
3. Documentation
4. Corpus infrastructure
Nothing.
5. Infrastructure
Bugzilla is up and running again.
TODO:
- add Jabber account in iChat (all)
6. Linguistics
North Sámi
TODO:
- fix hyphenation of derivations (Thomas, Tomi, Sjur, Trond)
- now tested, and much improved, but still needs improvements and further
- fix circularity issue (Sjur, Tomi, Trond)
- Tomi fixed it
Lule Sámi
TODO:
- lexicalise words from the Olavi missing list, but check against the pdf
- almost finished - only parts of the letter r still missing
- sme->smj lexicon conversion to build bilingual lexicon resources, and
- look at missing baseforms (Thomas)
- done
7. Name lexicon infrastructure
This sub-project needs to get up and running soon. Mainly Sjur's task.
Decisions made in Tromsø can be found in this meeting memo.
TODO:
- set up Tomcat and risten.no on the G5 again (Sjur, Børre)
- install risten.no
- really tried, but got problems
- fix bugs in lexc2xml; add comments to the log element (Saara)
- finish first version of the editing (Sjur)
- test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)
- make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as
- convert propernoun-($lang)-lex.txt to a derived file from common xml files
- implement data synchronisation between risten.no and
- start to use the xml file as source file
- clean terms-sme.xml such that all names have the correct tag for their use
- merge placenames which are erroneously in different entries: e.g. Helsinki,
- publish the name lexicon on risten.no (Sjur)
- add missing parallel names for placenames (linguists)
- add informative links between first names like Niillas and Nils
8. Proofing tools
Hunspell
TODO:
- Follow-up support for the Sámi languages in OpenOffice.org (Børre, Sjur)
- done, scheduled for version 2.4
- Hunspell lexicon conversion (Tomi, Børre)
- improved: nouns compile much better, adjectives also convert, but with
Testing
Spelling Error Markup
This will wait till after the release.
TODO:
- Set up ways of adding meta-information (source info, used in testing or not,
- move Steinar's error markup in the xml files to (a copy of) the original
- add nested error markup to xml conversion (Saara)
- test new and nested error markup (Sjur)
Automated testing
TODO:
- add hyphenation testing (Sjur)
- rough version added
- improve paradigm testing report (Sjur)
- done
MS Office
An important aspect of this testing is to document in the user guide anything
TODO:
- test the proofing tools with all MS Office applications (Børre, Thomas)
Lexicon conversion to the PLX format
PLX conversion update: we will soon get an updated speller engine with flags for
Open issues based on test results:
smj
sme
Guovdageaidnu-láđđi nom- Guovdageainnu-láđđi gen-
It should be Guovdageaidnu-láđđi OR Guovdageainnu láđđi. The first one is
Harstad-biila (nom) is ok, whereas gen. Harstada-biila is not, ie the
ovda- ovda- ovda+Cmpnd
TODO:
- look at test cases still not behaving properly (Thomas, Tomi)
InDesign tools
TODO:
- add hyphenation testing (Sjur)
- done but not finished
Hyphenators
We should look into the possibility of generating pattern-based hyphenation for
TODO:
- get command line hyphenator (Sjur)
- done
Release version
Schedule and tasks for the remaining weeks:
TODO:
- try to burn a CD at SD (Risten, Leif-Åge)
- done; it works exactly like one burned on the Mac
- finish text to go on the CD cover (Risten)
- done
- set up CD-printing printer (Risten)
- in the works
- fix Windows CD installation bug (Sjur, Børre)
- not yet done
- get price and schedule for printed CD cover (Risten)
- not received
- fix remaining bugs - golden master by next Monday (all)
- finalise InDesign hyphenator (Sjur, Børre, Thomas)
- testing
- documentation
- installation
- testing
- update usage and installation documentation (Børre, Thomas, Sjur)
- translate all new documentation (all)
- QA all documentation (all)
- do as much Hunspell as possible (Børre, Tomi)
Actual release
December 11-13, one of these days.
Hotel rooms received for all except Ilona, will be received for her as well.
There will be a release party in the afternoon.
9. Other
Corpus contracts
Delayed till after final release.
TODO:
- publish corpus contracts and project infra as open-source on NoDaLi-sta
Faroese
Speller for fao using our infrastructure and the knowledge we have.
TODO:
- set up a telephone meeting with them and Sjur (Trond)
Bug fixing
When fixing bugs, record the version number containing the fix in the Bugzilla
69 open Divvun/Disamb bugs (35 of these 56 are speller-related bugs,
Dictionaries
TODO:
- eXist on G5 - done
- risten.no on G5 - homepage done, but needs port tweaking/changing
- risten.no XQuery framework - done
- XInclude conversion script - done
- homepage on the documentation pages - draft done
Parallel corpora
TODO:
- discuss more parallel texts (Børre, Saara, Trond)
SD yearly personnel seminar
6.-7. December. Sjur has discussed it with Julia, and we won't go there.
Software updates
- Leopard, 10.5
- Ilona - Doesn't have a proper DVD yet.
- Trond
- Per-Eric
10. Next meeting, closing
The next meeting is 26.11.2007, 09:30 Norwegian time.
The meeting was closed at 11:34.
Appendix - task lists for the next week
Børre
- move Steinar's error markup in the xml files to (a copy of) the original
- fix bug 550
- fix Windows CD installation bug
- discuss more parallel texts
- finalise InDesign hyphenator
- update usage and installation documentation
- fix bugs!
Ilona
- lexicalise smj missing words
- other smj tasks, ask Thomas
- buy the DVD for Leopard
Maaren
- lexicalise actio compounds
Per-Eric
- lexicalise words from the Olavi missing list
- derivations tests
- fix bugs!
Risten
- set up CD-printing printer
- get price and schedule for printed CD cover
Saara
- add new XSL/XML headers for proofing test docs
- Set up ways of adding meta-information for proofing correct corpus docs
- add nested error markup to xml conversion
- discuss more parallel texts
Sjur
- work on the XML name editor/risten.no integration
- set up risten.no on the G5 again
- test new and nested error markup
- improve hyphenation testing
- fix bug 550
- fix Windows CD installation bug
- finalise InDesign hyphenator
- update usage and installation documentation
- fix bugs!
Thomas
- sme->smj lexicon conversion to build bilingual lexicon resources
- check for bad hyphenation
- look at test cases still not behaving properly
- paradigm testing
- test the proofing tools with all MS Office applications
- finalise InDesign hyphenator
- update usage and installation documentation
- fix bugs!
Tomi
- Hunspell lexicon conversion
- fix bugs!
Trond
- sme->smj lexicon conversion to build bilingual lexicon resources
- telephone meeting with Sjur and the Faroese group re Faroese speller
- discuss more parallel texts
- fix bugs!