Meeting_2007-11-26
Contents:
- Meeting setup
- Agenda
- 1. Opening, agenda review, participants
- 2. Updated task status since last meeting
- 3. Documentation
- 4. Corpus infrastructure
- 5. Infrastructure
- 6. Linguistics
- 7. Name lexicon infrastructure
- 8. Proofing tools
- 9. Other
- 10. Next meeting, closing
- Appendix - task lists for the next week
Meeting setup
- Date: 26.11.2007
- Time: 09.30 Norw. time
- Place: Internet
- Tools: SubEthaEdit, iChat/Skype
Agenda
- Opening, agenda review
- Reviewing the task list from last week
- Documentation - divvun.no
- Corpus gathering
- Corpus infrastructure
- Infrastructure
- Linguistics
- Name lexicon infrastructure
- Spellers
- Other issues
- Summary, task lists
- Closing
1. Opening, agenda review, participants
Opened at 10:15.
Present: Børre, Ilona, Per-Eric, Sjur, Thomas, Tomi
Absent: Risten, Trond
Agenda accepted as is.
2. Updated task status since last meeting
Børre
- move Steinar's error markup in the xml files to (a copy of) the original
  - not done
- fix bug 550
  - not done
- fix Windows CD installation bug
  - not done
- discuss more parallel texts
  - not done
- finalise InDesign hyphenator
  - not done
- update usage and installation documentation
  - not done
- fix bugs!
  - not done
- Other:
  - Continued work on hunspell
  - Updated OS X on xserve, added RAM.
  - Made new logos
Ilona
- other smj tasks, ask Thomas
- Buy the DVD for Leopard
  - Bought, and downloaded Leopard disk image to the computer. Have to burn it to
- other tasks:
  - Done some testing in sme speller and reported it to Thomas
Maaren
- lexicalise actio compounds
Per-Eric
- lexicalise words from the Olavi missing list
  - Done
- derivations tests
  - Done some
- fix bugs!
  - Nothing done this week
Risten
- set up CD-printing printer
  - on its way - ordered
- get price and schedule for printed CD cover
  - done
Saara
- add new XSL/XML headers for proofing test docs
- Set up ways of adding meta-information for proofing correct corpus docs
- add nested error markup to xml conversion
- discuss more parallel texts
Sjur
- work on the XML name editor/risten.no integration
  - nothing
- set up risten.no on the G5 again
  - done, although Trond reports problems
- test new and nested error markup
  - later
- improve hyphenation testing
  - nothing improved
- fix bug 550
  - not done
- fix Windows CD installation bug
  - tried, but it's not working. Work-around should be documented
- finalise InDesign hyphenator
  - nothing done yet
- update usage and installation documentation
  - not yet
- fix bugs!
- other:
  - several improvements to the CD creation process
  - worked on the CD cover (proofreading many times)
  - Faroese speller telephone meeting
Thomas
- sme->smj lexicon conversion to build bilingual lexicon resources
  - nothing this week
- check for bad hyphenation
  - worked a little
- look at test cases still not behaving properly
  - worked a little
- paradigm testing
  - done some
- test the proofing tools with all MS Office applications
  - done
- finalise InDesign hyphenator
  - not done
- update usage and installation documentation
  - not done
- fix bugs!
  - worked
Tomi
- Hunspell lexicon conversion
  - not done
- fix bugs!
  - fixed bugs
Trond
- sme->smj lexicon conversion to build bilingual lexicon resources
  - Worked on this, refined, work is on schedule
- telephone meeting with Sjur and the Faroese group re Faroese speller
  - Done
- discuss more parallel texts
  - Worked on this.
- fix bugs!
3. Documentation
4. Corpus infrastructure
Nothing.
5. Infrastructure
TODO:
- add Jabber account in iChat (all)
6. Linguistics
North Sámi
Hyphenation is better, but still contains a lot of errors. Sjur will run the
TODO:
- test latest hyphenator (Sjur)
- analyse test results (Thomas, Sjur, Trond)
Lule Sámi
Trond and his team have found words to be added to the smj lexicon.
cat smesmj.txt | grep -v 'prop$' | cut -f2 | lookup -flags mbTT -utf8 ~/gt/smj/bin/smj.fst | grep '\?' | l
6581 words in the smesmj.txt lexicon. Disregarding the proper nouns, 1824 are
čála tjála n
giehtačála giehtatjála n
vuolláičála vuollájtjála n <= :-)
vinjučála vinjotjála n
johtučála jåhtotjála n
čuokkisčála tjuokkestjála n
bajildusčála bajeldustjála n
mála mála n
tjála čála n
giehtatjála giehtačála n <=== :-((
vuollájtjála vuolláičála n
tjuokkestjála čuokkisčála n
vinjotjála vinjučála n
jåhtotjála johtučála n
leapma liebma n
ja/dahje ja/dahje +?
gobba gobba +?
gaiba gaiba +?
struhcca struhcca +?
fáhcca fáhcca +?
suorbmafáhcca suorbmafáhcca +?
vahca vahca +?
ohca ohca +?
juhca juhca +?
It seems to be a mixup of smj and sme in the material. That has to be cleaned
We have to test hyphenation for Lule Sámi as well.
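
The filtering stages around `lookup` in the pipeline above can be tried in isolation. The following is a minimal sketch, assuming smesmj.txt is tab-separated with the sme word, the smj word and a tag (proper nouns ending in `prop`), and that `lookup` marks unknown words with `+?` as in the listing above. The helper names `smj_candidates` and `unknown_words`, the Niillas/Nils sample row and the `+N+Sg+Nom` analysis string are illustrative assumptions, not taken from the actual lexicon or tools.

```shell
#!/bin/sh
# Sketch only: helper names and sample rows are hypothetical.

# Stage before lookup: drop proper nouns (lines ending in "prop"),
# then keep the second tab-separated column (assumed to be the smj word).
smj_candidates() {
    grep -v 'prop$' | cut -f2
}

# Stage after lookup: keep only lines flagged "+?" (no analysis found)
# and print the word itself (first column).
unknown_words() {
    grep -F '+?' | cut -f1
}

# Made-up lexicon rows in the assumed sme<TAB>smj<TAB>tag layout.
printf 'čála\ttjála\tn\nmála\tmála\tn\nNiillas\tNils\tprop\n' | smj_candidates

# Made-up lookup output: one unknown word, one analysed word.
printf 'gobba\tgobba\t+?\nmála\tmála+N+Sg+Nom\n' | unknown_words
```

In the real pipeline, `lookup` with the smj.fst transducer sits between the two stages. The sketch uses `grep -F` so that `+?` is matched as a literal string rather than as a regular expression.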
TODO:
- lexicalise words from the Olavi missing list, but check against the pdf
  - done
- sme->smj lexicon conversion to build bilingual lexicon resources, and
- test hyphenation (Sjur, Thomas)
7. Name lexicon infrastructure
Sjur got risten.no up and running on the G5, but it worked only for him.
Decisions made in Tromsø can be found in this meeting memo.
TODO:
- set up Tomcat and risten.no on the G5 again (Sjur, Børre)
  - install risten.no
    - did it
- fix bugs in lexc2xml; add comments to the log element (Saara)
- finish first version of the editing (Sjur)
- test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)
- make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as
- convert propernoun-($lang)-lex.txt to a derived file from common xml files
- implement data synchronisation between risten.no and
- start to use the xml file as source file
- clean terms-sme.xml such that all names have the correct tag for their use
- merge placenames which are erroneously in different entries: e.g. Helsinki,
- publish the name lexicon on risten.no (Sjur)
- add missing parallel names for placenames (linguists)
- add informative links between first names like Niillas and Nils
8. Proofing tools
Hunspell
Continuously improving.
TODO:
- Hunspell lexicon conversion (Tomi, Børre)
Testing
Spelling Error Markup
This will wait till after the release.
TODO:
- Set up ways of adding meta-information (source info, used in testing or not,
- move Steinar's error markup in the xml files to (a copy of) the original
- add nested error markup to xml conversion (Saara)
- test new and nested error markup (Sjur)
Automated testing
TODO:
- improve hyphenation testing (Sjur)
  - not done yet
MS Office
An important aspect of this testing is to document in the user guide anything
TODO:
- test the proofing tools with all MS Office applications (Børre, Thomas)
  - Thomas has tested all Windows apps - they all work fine with our tools
Lexicon conversion to the PLX format
Open issues based on test results:
smj
sme
TODO:
- look at test cases still not behaving properly (Thomas, Tomi)
- check that the smj R lexicon is identical to sme (Thomas)
InDesign tools
TODO:
- improve hyphenation testing (Sjur)
Hyphenators
Testing!!!
Release version
Schedule and tasks for the remaining weeks:
TODO:
- set up CD-printing printer (Risten, Leif Åge)
- fix Windows CD installation bug (Sjur, Børre)
  - put on hold - work-around should be documented
- get price and schedule for printed CD cover (Risten)
  - done: 3980,- + 900,- + VAT for 1000 covers.
  - 8 days production time
  - 100 covers will be picked up in Tromsø (Børre)
  - Print 50 CDs, take them to Oslo (Risten, Julie)
  - Burn the CDs in Oslo (Sjur)
- fix remaining bugs - golden master by end of this Monday (all)
- finalise InDesign hyphenator (Sjur, Børre, Thomas)
  - testing = hyphenation testing
  - documentation
  - installation
- update usage and installation documentation (Børre, Thomas, Sjur)
- translate all new documentation (all)
- QA all documentation (all)
- do as much Hunspell as possible (Børre, Tomi)
Actual release
December 12 is the most likely date, before 12:00. Still to be confirmed.
There will be a release party in the afternoon.
9. Other
Corpus contracts
Delayed till after final release.
TODO:
- publish corpus contracts and project infra as open-source on NoDaLi-sta
Bug fixing
When fixing bugs, record the version number containing the fix in the Bugzilla
83 open Divvun/Disamb bugs (45 of these 83 are speller-related bugs,
Software updates
- Leopard, 10.5
  - Ilona - Will have it tomorrow or at latest on Wednesday. Probably needs help
  - Per-Eric - ready for updating tomorrow - Børre will help
10. Next meeting, closing
The next meeting is 03.12.2007, 09:30 Norwegian time.
The meeting was closed at 11:42.
Appendix - task lists for the next week
Børre
- move Steinar's error markup in the xml files to (a copy of) the original
- fix bug 550
- finalise InDesign hyphenator
- update usage and installation documentation
- fix bugs!
Ilona
- lexicalise smj missing words.
- Help Trond with the smj dictionary.
- Install Leopard
Maaren
- lexicalise actio compounds
Per-Eric
- check some unusual words from the Olavi missing list which are still not
- derivations tests
- Install Leopard
- fix bugs!
Risten
- set up CD-printing printer
- test printer
- finish the cd cover and cd design
Saara
- add new XSL/XML headers for proofing test docs
- Set up ways of adding meta-information for proofing correct corpus docs
- add nested error markup to xml conversion
- discuss more parallel texts
Sjur
- fix bug 550
- document Windows CD installation work-around
- finalise InDesign hyphenator
- update usage and installation documentation
- test latest hyphenator
- analyse hyphenation test results
- fix bugs!
Thomas
- sme->smj lexicon conversion to build bilingual lexicon resources
- test hyphenation
- analyse hyphenation test results
- look at test cases still not behaving properly
- paradigm testing
- finalise InDesign hyphenator
- update usage and installation documentation
- check that the smj R lexicon is identical to sme
- fix bugs!
Tomi
- Hunspell lexicon conversion
- fix bugs!
Trond
- sme->smj lexicon conversion to build bilingual lexicon resources
- test hyphenation
- analyse hyphenation test results
- fix bugs!