Meeting_2007-10-15
Contents:
- Meeting setup
- Agenda
- 1. Opening, agenda review, participants
- 2. Updated task status since last meeting
- 3. Documentation
- 4. Corpus gathering
- 5. Corpus infrastructure
- 6. Infrastructure
- 7. Linguistics
- 8. Name lexicon infrastructure
- 9. Proofing tools
- 10. Other
- 11. Next meeting, closing
- Appendix - task lists for the next week
Meeting setup
- Date: 15.10.2007
- Time: 09.30 Norw. time
- Place: Internet
- Tools: SubEthaEdit, iChat/Skype
Agenda
- Opening, agenda review
- Reviewing the task list from last week
- Documentation - divvun.no
- Corpus gathering
- Corpus infrastructure
- Infrastructure
- Linguistics
- name lexicon infrastructure
- Spellers
- Other issues
- Summary, task lists
- Closing
1. Opening, agenda review, participants
Opened at 09:47.
Present: Børre, Per-Eric, Sjur, Thomas, Tomi
Absent: Risten, Trond
Agenda accepted as is.
2. Updated task status since last meeting
Børre
- move Steinar's error markup in the xml files to (a copy of) the original
- not done
- Hunspell lexicon conversion
- made a test setup, to see which words and wordforms are recognized
- nouns, adjs and verbs are converted okay'ishly
- update Bugzilla to 3.0.x
- done
- begin adding support for the sami languages in OpenOffice.org
- not done
- add texts from Torkel Rasmussen
- done
- fix bugs!
- assigned a corpus bug to Saara
Ilona
- lexicalise missing words
- add smj proper nouns
- other smj tasks
Maaren
- lexicalise actio compounds
Per-Eric
- expand the smj typos list
- worked and still working
- add missing smj words
- worked and still working
- lexicalise words from the Olavi missing list
- worked and still working
- fix bugs!
- fixed some
Risten
- write text to go on the CD cover
- set up CD-printing printer
Saara
- add new XSL/XML headers for proofing test docs
- Set up ways of adding meta-information for proofing correct corpus docs
Sjur
- document the AppleScript testing tool
- not done
- document the testing procedures
- done
- work on the XML name editor/risten.no integration
- not done
- test correct-type markup with latest enhancements
- not done, but discussed markup nesting with Saara
- get command line hyphenator for automated testing of the hyph-lexicons
- offer received, signed, and we are waiting for the delivery - hopefully this
- collect list of problematic words for the hyphenator
- started
- make available InDesign hyphenator to Min Áigi/Davvi Girji
- done
- document the InDesign tools
- not done
- add hyphenation testing
- not done
- fix bugs!
- other tasks:
- improved Mac uninstaller for MS Office tools by making it into a standard
Thomas
- sme->smj lexicon conversion to build bilingual lexicon resources
- begun
- add smj proper nouns
- added
- fix bugs!
- worked
Tomi
- Hunspell lexicon conversion
- not done
- sme->smj lexicon conversion to build bilingual lexicon resources
- not done
- fix oslolaš bug in smj (Tomi)
- done
- fix bugs!
- fixed
Trond
- sme->smj lexicon conversion to build bilingual lexicon resources
- fix bugs!
3. Documentation
The new Bugzilla is good :)
TODO:
- update Bugzilla (Børre)
- done
4. Corpus gathering
Sjur discussed error markup nesting with Saara; she will add nesting
TODO:
- test correct-type markup with latest enhancements (Sjur)
- add texts from Torkel Rasmussen (Børre)
- done
5. Corpus infrastructure
Nothing.
6. Infrastructure
Speller testing is now fixed, and won't change.
7. Linguistics
North Sámi
No real issues at the moment.
Lule Sámi
Tomi has fixed the smj oslolaš issue.
We still have some missing baseforms aside from names. These should be checked.
TODO:
- lexicalise words from the Olavi missing list, but check against the pdf
- sme->smj lexicon conversion to build bilingual lexicon resources, and
- add proper nouns (Thomas, Ilona)
- look at missing baseforms (Thomas)
8. Name lexicon infrastructure
This sub-project needs to get up and running soon. Mainly Sjur's task.
Decisions made in Tromsø can be found in this meeting memo.
TODO:
- set up Tomcat and risten.no on the G5 again (Sjur, Børre)
- fix bugs in lexc2xml; add comments to the log element (Saara)
- finish first version of the editing (Sjur)
- test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)
- make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as
- convert propernoun-($lang)-lex.txt to a derived file from common xml files
- implement data synchronisation between risten.no and
- start to use the xml file as source file
- clean terms-sme.xml such that all names have the correct tag for their use
- merge placenames which are erroneously in different entries: e.g. Helsinki,
- publish the name lexicon on risten.no (Sjur)
- add missing parallel names for placenames (linguists)
- add informative links between first names like Niillas and Nils
9. Proofing tools
Hunspell
Sami languages are not supported in OpenOffice.org, until that is fixed we will
Børre has a UTF-8 issue with reading LexC files in the Java code.
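The conversion code itself is not reproduced in this memo, and the cause of the bug is not recorded here. A common source of this kind of problem is decoding a file with the platform default encoding instead of UTF-8. The following is only a minimal sketch of that idea, written in Python for brevity and not taken from the project's code; the function name read_lexc_lines and the sample file content are hypothetical.

```python
import os
import tempfile


def read_lexc_lines(path):
    """Read a text file line by line, decoding it explicitly as UTF-8.

    Assumption: the LexC source files are stored as UTF-8. Passing the
    encoding explicitly avoids falling back to the platform default.
    """
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


if __name__ == "__main__":
    # Hypothetical sample content, written to a temporary file as UTF-8 bytes.
    with tempfile.NamedTemporaryFile("wb", suffix=".lexc", delete=False) as tmp:
        tmp.write("oslolaš\n".encode("utf-8"))
    try:
        print(read_lexc_lines(tmp.name))
    finally:
        os.remove(tmp.name)
```

If the real cause turns out to be something else (for example a byte order mark or mixed encodings in the source files), this sketch does not apply as is.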
TODO:
- Begin adding support for the sami languages in OpenOffice.org (Børre)
- Hunspell lexicon conversion (Tomi, Børre)
- fix Unicode bug in Hunspell conversion Java code (Tomi, Børre)
- add closed POSes by adding a new output format to the present PLX conversion
- add build and compilation instructions (Børre)
- release an internal alpha for testing (Børre)
- add hunspell testing to the make file (Sjur)
- debug and fix remaining conversion issues (Børre, Tomi)
Testing
Spelling Error Markup
TODO:
- Set up ways of adding meta-information (source info, used in testing or not,
- move Steinar's error markup in the xml files to (a copy of) the original
- add nested error markup to xml conversion (Saara)
- test new and nested error markup (Sjur)
Automated testing
The infrastructure is now fixed for speller testing, hyphenation testing still
TODO:
- document the AppleScript testing tool (Sjur)
- document the testing procedures (Sjur)
- done
- add hyphenation testing
Lexicon conversion to the PLX format
Open issues based on test results: smj - 489, 506, vissa, 536;
TODO:
- fix oslolaš type bug in smj (Tomi)
- done
InDesign tools
The beta was sent to Min Áigi and Davvi Girji.
We now need to expand testing for InDesign. The first task is to install it on more computers.
TODO:
- make available InDesign hyphenator to Min Áigi/Davvi Girji (Sjur)
- document the InDesign tools (Sjur)
- add hyphenation testing (Sjur)
- buy InDesign CS3: one Mac upgrade, one Mac full, one Windows (Børre)
Hyphenators
There are some hyphenation errors we need to debug.
We should look into the possibility of generating pattern-based hyphenation for
To check for bad hyphenation: open a document in Word, set the correct language,
TODO:
- get command line hyphenator (Sjur)
- waiting for it
- collect list of problematic words for the hyphenator (Thomas, Per-Eric)
Beta release
TODO:
- add note and download link to front page (Tomi)
- translate beta 2 download note on front page (Thomas, Sjur)
Release version
The CD cover etc. will be worked on by John-Marcus Kuhmunen, and will follow the
TODO:
- write text to go on the CD cover (Risten)
- set up CD-printing printer (Risten)
10. Other
Corpus contracts
Delayed till after final release.
TODO:
- publish corpus contracts and project infra as open-source on NoDaLi-sta
Bug fixing
When fixing bugs, record the version number containing the fix in the Bugzilla
56 open Divvun/Disamb bugs (23 of these 56 are speller-related bugs,
11. Next meeting, closing
The next meeting is 22.10.2007, 09:30 Norwegian time.
The meeting was closed at 10:47.
Appendix - task lists for the next week
Børre
- move Steinar's error markup in the xml files to (a copy of) the original
- begin adding support for the sami languages in OpenOffice.org
- set up Tomcat and risten.no on the G5 again
- fix Unicode bug in Hunspell conversion Java code
- add closed POSes to Hunspell speller
- add build and compilation instructions for Hunspell
- release an internal Hunspell alpha for testing
- buy InDesign CS3: one Mac upgrade, one Mac full, one Windows
- fix bugs!
Ilona
- lexicalise missing words
- Will I have new missing lists to do?
- add smj proper nouns
- other smj tasks
Maaren
- lexicalise actio compounds
Per-Eric
- lexicalise words from the Olavi missing list
- collect list of problematic words for the hyphenator
- fix bugs!
Risten
- write text to go on the CD cover
- set up CD-printing printer
Saara
- add new XSL/XML headers for proofing test docs
- Set up ways of adding meta-information for proofing correct corpus docs
- add nested error markup to xml conversion
Sjur
- document the AppleScript testing tool
- work on the XML name editor/risten.no integration
- set up Tomcat and risten.no on the G5 again
- test new and nested error markup
- get command line hyphenator for automated testing of the hyph-lexicons
- document the InDesign tools
- add hyphenation testing
- add hunspell testing
- translate beta 2 download note on front page
- fix bugs!
Thomas
- sme->smj lexicon conversion to build bilingual lexicon resources
- add smj proper nouns
- look at missing baseforms, smj
- check for bad hyphenation
- collect list of problematic words for the hyphenator
- translate beta 2 download note on front page
- fix bugs!
Tomi
- Hunspell lexicon conversion
- sme->smj lexicon conversion to build bilingual lexicon resources
- add note and download link to front page
- fix Unicode bug in Hunspell conversion Java code
- add closed POSes to Hunspell speller
- fix bugs!
Trond
- sme->smj lexicon conversion to build bilingual lexicon resources
- fix bugs!