Meeting_2007-11-12
Contents:
- Meeting setup
- Agenda
- 1. Opening, agenda review, participants
- 2. Updated task status since last meeting
- 3. Documentation
- 4. Corpus infrastructure
- 5. Infrastructure
- 6. Linguistics
- 7. Name lexicon infrastructure
- 8. Proofing tools
- 9. Other
- 10. Next meeting, closing
- Appendix - task lists for the next week
Meeting setup
- Date: 12.11.2007
- Time: 09.30 Norw. time
- Place: Internet
- Tools: SubEthaEdit, iChat/Skype
Agenda
- Opening, agenda review
- Reviewing the task list from last week
- Documentation - divvun.no
- Corpus gathering
- Corpus infrastructure
- Infrastructure
- Linguistics
- Name lexicon infrastructure
- Spellers
- Other issues
- Summary, task lists
- Closing
1. Opening, agenda review, participants
Opened at 09:44.
Present: Børre, Ilona, Risten, Sjur, Thomas, Tomi, Trond
Absent: Per-Eric
Agenda accepted as is.
2. Updated task status since last meeting
Børre
- move Steinar's error markup in the xml files to (a copy of) the original
- not done
- follow-up support for the Sámi languages in OpenOffice.org
- not done
- fix Unicode bug in Hunspell conversion java code
- haven't found any solution
- fix bug 550
- not done
- install Leopard on the G5 and the Xserve
- Done
- move giellatekno.uit.no (including Bugzilla) and www.divvun.no to the G5.
- Bugzilla is not available yet; haven't been able to compile the necessary
- add paradigm testing
- not done
- fix Windows CD installation bug
- not done
- fix bugs!
Ilona
- lexicalise smj missing words
- In letter s
- add smj proper nouns
- done, but without me.
- other smj tasks, ask Thomas
- Not asked.
Maaren
- lexicalise actio compounds
Per-Eric
- lexicalise words from the Olavi missing list
- fix bugs!
Risten
- finish the design/text for the CD and the cover
- not finished
- set up CD-printing printer
- Leif Åge has ordered a new printer
- try to burn a CD at SD
- done
Saara
- add new XSL/XML headers for proofing test docs
- Set up ways of adding meta-information for proofing correct corpus docs
- add nested error markup to xml conversion
- add hunspell testing support
- done
Sjur
- work on the XML name editor/risten.no integration
- nope
- set up risten.no on the G5 again
- still no work
- test new and nested error markup
- not yet
- get command line hyphenator for automated testing of the hyph-lexicons
- received a first version; it contains the hyphenation bug, but it lets us set
- add hyphenation testing
- not yet
- add hunspell testing
- done
- add paradigm testing
- almost done
- fix bug 550
- no
- fix hyphenation of derivations, inflections
- done
- follow-up support for the Sámi languages in OpenOffice.org
- done - it looks promising at the moment, but the OOo deadline for getting
- fix Windows CD installation bug
- not yet
- fix bugs!
Thomas
- sme->smj lexicon conversion to build bilingual lexicon resources
- not this week
- add smj proper nouns
- added
- check for bad hyphenation
- not this week
- look at test cases still not behaving properly
- worked a lot
- fix hyphenation of derivations, inflections
- done
- sme->smj name conversion
- done
- fix bugs!
- worked
Tomi
- Hunspell lexicon conversion
- not done
- sme->smj lexicon conversion to build bilingual lexicon resources
- not done
- fix Unicode bug in Hunspell conversion java code
- done
- fix hyphenation of derivations, inflections
- done
- fix bugs!
- fixing
Trond
- sme->smj lexicon conversion to build bilingual lexicon resources
- Progress, progress
- fix hyphenation of derivations, inflections
- Meeting held, the linguistic side seems ok, but awaiting speller testing
- sme->smj name conversion
- Done.
- telephone meeting with Sjur and the Faroese group re Faroese speller
- Have still not been able to get hold of the two of them at the same time.
- fix bugs!
- Nothing done here.
3. Documentation
4. Corpus infrastructure
Nothing.
5. Infrastructure
http://giellatekno.uit.no and http://www.divvun.no are now hosted on the G5. Bugzilla is
ssh www.divvun.no = G5
ssh giellatekno.uit.no = G5
ssh 129.242.220.111 = xserve
Jabber support = iChat server:
- encrypted chats
- permanent chat rooms
- stored and forwarded messages
- archive of chats, indexed and searchable
Addresses of the format:
sjur@divvun.no
trond@giellatekno.uit.no
TODO:
- add Jabber account in iChat (all)
6. Linguistics
North Sámi
We solved(?) the hyphenation problem, but instead got a circularity problem.
TODO:
- fix hyphenation of derivations (Thomas, Tomi, Sjur, Trond)
- done, but not tested
- fix circularity issue (Sjur, Tomi, Trond)
Lule Sámi
Name conversion improvements last week.
TODO:
- lexicalise words from the Olavi missing list, but check against the pdf
- Ilona is working on the last letter in the alphabet, we're almost
- sme->smj lexicon conversion to build bilingual lexicon resources, and
- some additions coming in there as well, but not until we have checked every
- sme->smj name conversion (Trond, Thomas)
- done
- add proper nouns (Thomas, Ilona)
- nothing more to be done
- look at missing baseforms (Thomas)
- only two missing baseforms at the moment :)
7. Name lexicon infrastructure
This sub-project needs to get up and running soon. Mainly Sjur's task.
Decisions made in Tromsø can be found in this meeting memo.
TODO:
- set up Tomcat and risten.no on the G5 again (Sjur, Børre)
- install risten.no
- fix bugs in lexc2xml; add comments to the log element (Saara)
- finish first version of the editing (Sjur)
- test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)
- make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as
- convert propernoun-($lang)-lex.txt to a derived file from common xml files
- implement data synchronisation between risten.no and
- start to use the xml file as source file
- clean terms-sme.xml such that all names have the correct tag for their use
- merge placenames which are erroneously in different entries: e.g. Helsinki,
- publish the name lexicon on risten.no (Sjur)
- add missing parallel names for placenames (linguists)
- add informative links between first names like Niillas and Nils
8. Proofing tools
Hunspell
TODO:
- Follow-up support for the Sámi languages in OpenOffice.org (Børre, Sjur)
- there is still hope to get them included in OOo 2.4
- Hunspell lexicon conversion (Tomi, Børre)
- nothing major happened last week, working on getting the basic conversion
- add hunspell testing to the make file (Sjur)
- done
Testing
Spelling Error Markup
TODO:
- Set up ways of adding meta-information (source info, used in testing or not,
- move Steinar's error markup in the xml files to (a copy of) the original
- add nested error markup to xml conversion (Saara)
- test new and nested error markup (Sjur)
Automated testing
Paradigm testing implemented, but the reporting needs improvements. Testing
cd gt/sme/testing
gen-paradigms.sh
Output in paradigm-sme.txt.
Test words should go in the file
WORD<TAB>POS
áigi	n
ráhkistit	v
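For anyone writing helper scripts around the test-word list, the WORD<TAB>POS format can be read with a few lines of Python. This is only a sketch: the function name `read_test_words` is hypothetical, and skipping blank lines and `#` comment lines is an assumption, not a documented feature of the file used by gen-paradigms.sh.

```python
from typing import Iterable, List, Tuple


def read_test_words(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse WORD<TAB>POS lines into (word, pos) pairs.

    Hypothetical helper. Blank lines and lines starting with '#' are
    skipped (an assumption about the file format).
    """
    pairs: List[Tuple[str, str]] = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        # Skip empty lines and assumed comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            # Exactly one TAB is expected between the word and its POS tag.
            raise ValueError(f"line {lineno}: expected WORD<TAB>POS, got {line!r}")
        word, pos = fields[0].strip(), fields[1].strip()
        pairs.append((word, pos))
    return pairs
```

An open text file can be passed directly, e.g. `with open(path, encoding="utf-8") as f: read_test_words(f)`, where `path` is whatever the test-word file is called.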
We need to separate proper nouns from regular nouns properly, by checking the
TODO:
- add hyphenation testing (Sjur)
- add hunspell testing (Saara, Sjur)
- implemented
- add paradigm testing (Børre, Sjur)
- done, but needs improvements
MS Office
We need to check that the proofing tools are working in all Office applications,
TODO:
- test the proofing tools with all MS Office applications (Børre, Thomas)
Lexicon conversion to the PLX format
Open issues based on test results:
smj
sme
TODO:
- look at test cases still not behaving properly (Thomas, Tomi)
InDesign tools
TODO:
- add hyphenation testing (Sjur)
Hyphenators
We should look into the possibility of generating pattern-based hyphenation for
TODO:
- get command line hyphenator (Sjur)
Release version
We need to print the CD cover pretty soon, and for that we need to finish the CD
TODO:
- try to burn a CD at SD (Risten, Leif-Åge)
- finish text to go on the CD cover (Risten)
- set up CD-printing printer (Risten)
- fix Windows CD installation bug (Sjur, Børre)
- get price and schedule for printed CD cover (Risten)
Actual release
December 11-13, one of these days.
9. Other
Corpus contracts
Delayed until after the final release.
TODO:
- publish corpus contracts and project infra as open-source on NoDaLi-sta
Faroese
Speller for fao using our infrastructure and the knowledge we have.
TODO:
- set up a telephone meeting with them and Sjur (Trond)
Bug fixing
When fixing bugs, record the version number containing the fix in the Bugzilla
69 open Divvun/Disamb bugs (35 of these 56 are speller-related bugs,
Dictionaries
We have now 7 dictionaries in xml format that could be published in a
It requires eXist/XQuery + homepage, and conversion to a proper
TODO:
- eXist on G5 - done
- risten.no on G5 - homepage done, but needs port tweaking/changing
- risten.no XQuery framework - NOT done
- XInclude conversion script - NOT done
- homepage on the documentation pages - draft done
Parallel corpora
There will be a parallel corpus seminar this weekend. It would have been
- fkvnob (from Kvensk Institutt)
- smenob, smanob, smjnob (from Sámi skuvlahistorjá)
TODO:
- discuss more parallel texts (Børre, Saara, Trond)
SD yearly personnel seminar
6.-7. December. Sjur will discuss it with Julia, but our view is that we
Software updates
- SubEthaEdit - Ilona still missing :)
- Leopard, 10.5
- installed:
- Børre
- Sjur
- Risten
- not yet:
- Thomas
- Ilona
- Tomi
- Trond
- Per-Eric
- installed:
- Skype 2.6.x
10. Next meeting, closing
The next meeting is 19.11.2007, 09:30 Norwegian time.
Trond will be away in the next meeting.
The meeting was closed at 11:13.
Appendix - task lists for the next week
Børre
- move Steinar's error markup in the xml files to (a copy of) the original
- fix Unicode bug in Hunspell conversion java code
- fix bug 550
- move Bugzilla to the G5.
- fix Windows CD installation bug
- discuss more parallel texts
- fix bugs!
Ilona
- lexicalise smj missing words
- other smj tasks, ask Thomas
Maaren
- lexicalise actio compounds
Per-Eric
- lexicalise words from the Olavi missing list
- fix bugs!
Risten
- finish the design/text for the CD and the cover
- set up CD-printing printer
- try to burn a CD at SD
- get price and schedule for printed CD cover
Saara
- add new XSL/XML headers for proofing test docs
- Set up ways of adding meta-information for proofing correct corpus docs
- add nested error markup to xml conversion
- discuss more parallel texts
Sjur
- work on the XML name editor/risten.no integration
- set up risten.no on the G5 again
- test new and nested error markup
- get command line hyphenator for automated testing of the hyph-lexicons
- add hyphenation testing
- improve paradigm testing
- fix bug 550
- follow-up support for the Sámi languages in OpenOffice.org
- fix Windows CD installation bug
- fix circularity issue in nonrec transducers
- fix bugs!
Thomas
- sme->smj lexicon conversion to build bilingual lexicon resources
- check for bad hyphenation
- look at test cases still not behaving properly
- paradigm testing
- fix bugs!
Tomi
- Hunspell lexicon conversion
- fix circularity issue in nonrec transducers
- fix bugs!
Trond
- sme->smj lexicon conversion to build bilingual lexicon resources
- fix hyphenation of derivations, inflections
- telephone meeting with Sjur and the Faroese group re Faroese speller
- fix circularity issue in nonrec transducers
- discuss more parallel texts
- fix bugs!