Meeting_2007-10-29
Contents:
- Meeting setup
- Agenda
- 1. Opening, agenda review, participants
- 2. Updated task status since last meeting
- 3. Documentation
- 4. Corpus infrastructure
- 5. Infrastructure
- 6. Linguistics
- 7. Name lexicon infrastructure
- 8. Proofing tools
- 9. Other
- 10. Next meeting, closing
- Appendix - task lists for the next week
Meeting setup
- Date: 29.10.2007
- Time: 09.30 Norwegian time
- Place: Internet
- Tools: SubEthaEdit, iChat/Skype
Agenda
- Opening, agenda review
- Reviewing the task list from last week
- Documentation - divvun.no
- Corpus gathering
- Corpus infrastructure
- Infrastructure
- Linguistics
- Name lexicon infrastructure
- Spellers
- Other issues
- Summary, task lists
- Closing
1. Opening, agenda review, participants
Opened at 10:15.
Present: Børre, Per-Eric, Risten, Sjur, Thomas, Tomi, Trond
Absent: Ilona
Agenda accepted as is.
2. Updated task status since last meeting
Børre
- move Steinar's error markup in the xml files to (a copy of) the original
- begin adding support for the Sámi languages in OpenOffice.org
  - Began making locales, Sjur added a request for spelling
- fix Unicode bug in Hunspell conversion Java code
  - not done
- test closed POSes in Hunspell speller
  - did some testing
- buy InDesign CS3: one Mac upgrade, one Mac full, one Windows
  - done
- fix bug 550
  - not done
- fix bugs!
Ilona
- lexicalise smj missing words
- add smj proper nouns
- other smj tasks
Maaren
- lexicalise actio compounds
Per-Eric
- lexicalise words from the Olavi missing list
  - Worked and still working
- fix bugs!
  - fixed some
Risten
- finish the design/text for the CD and the cover
- set up CD-printing printer
Saara
- add new XSL/XML headers for proofing test docs
- Set up ways of adding meta-information for proofing correct corpus docs
- add nested error markup to xml conversion
Sjur
- document the AppleScript speller test output
  - done
- work on the XML name editor/risten.no integration
  - nothing
- set up risten.no on the G5 again
  - nope
- test new and nested error markup
  - waiting for Saara
- get command line hyphenator for automated testing of the hyph-lexicons
  - still not received
- add hyphenation testing
  - waiting for command line hyphenator
- add hunspell testing
  - looked at it, installed the first alpha in OpenOffice.org
- fix bug 550
  - I have some ideas, but nothing done yet
- fix bugs!
  - reported new ones
- other:
  - requested support for
  - tested installation from CD, on both Windows and Mac
Thomas
- sme->smj lexicon conversion to build bilingual lexicon resources
  - nothing this week
- add smj proper nouns
  - some added
- check for bad hyphenation
  - worked on this
- look at test cases still not behaving properly
  - worked on this
- fix bugs!
  - worked on this
Tomi
- Hunspell lexicon conversion
  - did some
- sme->smj lexicon conversion to build bilingual lexicon resources
  - not done
- fix Unicode bug in Hunspell conversion Java code
  - not done
- test closed POSes in Hunspell speller
  - tested
- fix bugs!
  - fixed
Trond
- sme->smj lexicon conversion to build bilingual lexicon resources
  - Worked hard on this issue. Continued the work based on joint work with
3. Documentation
4. Corpus infrastructure
Nothing.
5. Infrastructure
Nothing except bug 550 (see above).
6. Linguistics
North Sámi
No real issues at the moment.
Lule Sámi
Thomas has looked at the missing baseforms.
Trond: we are generating smj words from sme as part of the Univ.
TODO:
- lexicalise words from the Olavi missing list, but check against the pdf
- sme->smj lexicon conversion to build bilingual lexicon resources, and
- add proper nouns (Thomas, Ilona)
- look at missing baseforms (Thomas)
7. Name lexicon infrastructure
This sub-project needs to get up and running soon. Mainly Sjur's task.
Decisions made in Tromsø can be found in this meeting memo.
TODO:
- set up Tomcat and risten.no on the G5 again (Sjur, Børre)
  - install risten.no
- fix bugs in lexc2xml; add comments to the log element (Saara)
- finish first version of the editing (Sjur)
- test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)
- make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as
- convert propernoun-($lang)-lex.txt to a derived file from common xml files
- implement data synchronisation between risten.no and
- start to use the xml file as source file
- clean terms-sme.xml such that all names have the correct tag for their use
- merge placenames which are erroneously in different entries: e.g. Helsinki,
- publish the name lexicon on risten.no (Sjur)
- add missing parallel names for placenames (linguists)
- add informative links between first names like Niillas and Nils
8. Proofing tools
Hunspell
The initial alpha is working, and looks promising.
TODO:
- Begin adding support for the Sámi languages in OpenOffice.org (Børre)
  - Sjur started the process
- Hunspell lexicon conversion (Tomi, Børre)
- fix Unicode bug in Hunspell conversion Java code (Tomi, Børre)
  - it seems to work now on the G5
- test closed POSes (Tomi, Børre)
  - done some
- add hunspell testing to the make file (Sjur)
  - not yet done
- debug and fix remaining conversion issues (Børre, Tomi)
  - a lot of work to do here :)
Testing
Spelling Error Markup
TODO:
- Set up ways of adding meta-information (source info, used in testing or not,
- move Steinar's error markup in the xml files to (a copy of) the original
- add nested error markup to xml conversion (Saara)
- test new and nested error markup (Sjur)
Automated testing
TODO:
- document the AppleScript speller test output (Sjur)
  - done
- add hyphenation testing
  - waiting for testing tool
Lexicon conversion to the PLX format
Open issues based on test results:
smj
sme
TODO:
- look at test cases still not behaving properly (Thomas, Tomi)
InDesign tools
TODO:
- add hyphenation testing (Sjur)
- buy InDesign CS3: one Mac upgrade, one Mac full, one Windows (Børre)
  - ordered
Hyphenators
We should look into the possibility of generating pattern-based hyphenation for
TODO:
- get command line hyphenator (Sjur)
Release version
The CD cover etc. will be worked on by John-Marcus Kuhmunen, and will follow
Network printer for CD printing is ok.
TODO:
- write text to go on the CD cover (Risten)
- set up CD-printing printer (Risten)
Actual release
December 11-13, one of these days.
9. Other
Corpus contracts
Delayed till after final release.
TODO:
- publish corpus contracts and project infra as open-source on NoDaLi-sta
Faroese
Speller for fao using our infrastructure and the knowledge we have.
Bug fixing
When fixing bugs, record the version number containing the fix in the Bugzilla
69 open Divvun/Disamb bugs (35 of these 56 are speller-related bugs,
SD yearly personnel seminar
6.-7. December. Sjur will discuss it with Julia, but our view is that we
Software updates
- SubEthaEdit
- Leopard, 10.5
- Skype 2.6.x
10. Next meeting, closing
The next meeting is 5.11.2007, 09:30 Norwegian time.
The meeting was closed at 10:49.
Appendix - task lists for the next week
Børre
- move Steinar's error markup in the xml files to (a copy of) the original
- add support for the Sámi languages in OpenOffice.org
- fix Unicode bug in Hunspell conversion Java code
- fix bug 550
- fix bugs!
Ilona
- lexicalise smj missing words
- add smj proper nouns
- other smj tasks
Maaren
- lexicalise actio compounds
Per-Eric
- lexicalise words from the Olavi missing list
- fix bugs!
Risten
- finish the design/text for the CD and the cover
- set up CD-printing printer
Saara
- add new XSL/XML headers for proofing test docs
- Set up ways of adding meta-information for proofing correct corpus docs
- add nested error markup to xml conversion
Sjur
- work on the XML name editor/risten.no integration
- set up risten.no on the G5 again
- test new and nested error markup
- get command line hyphenator for automated testing of the hyph-lexicons
- add hyphenation testing
- add hunspell testing
- fix bug 550
- fix bugs!
Thomas
- sme->smj lexicon conversion to build bilingual lexicon resources
- add smj proper nouns
- check for bad hyphenation
- look at test cases still not behaving properly
- fix bugs!
Tomi
- Hunspell lexicon conversion
- sme->smj lexicon conversion to build bilingual lexicon resources
- fix Unicode bug in Hunspell conversion Java code
- fix bugs!
Trond
- sme->smj lexicon conversion to build bilingual lexicon resources
- fix bugs!