Meeting_2006-12-11
Contents:
- Meeting setup
- Agenda
- 1. Opening, agenda review, participants
- 2. Updated task status since last meeting
- 3. Documentation
- 4. Corpus gathering
- 5. Corpus infrastructure
- 6. Infrastructure
- 7. Linguistics
- 8. Name lexicon infrastructure
- 9. Spellers
- 10. Other
- 11. Next meeting, closing
- Appendix - task lists for the next week
Meeting setup
- Date: 11.12.2006
- Time: 09.30 Norw. time
- Place: Where we are
- Tools: SubEthaEdit, iChat
Agenda
- Opening, agenda review
- Reviewing the task list from last week
- Documentation - divvun.no
- Corpus gathering
- Corpus infrastructure
- Infrastructure
- Linguistics
- Name lexicon infrastructure
- Spellers
- Other issues
- Summary, task lists
- Closing
1. Opening, agenda review, participants
Opened at 09:46.
Present: Saara, Sjur, Thomas, Tomi, Trond
Absent: Børre, Maaren
Agenda accepted with additions to Other.
2. Updated task status since last meeting
Børre
- contact authors who have already received the corpus licensing contract
  - Not done
- continue work on script for automatic testing of the spell checker in Word
  - Some done
- sma discussions with SD (with Sjur, Trond)
  - Not done
- get an Intel Mac for testing Windows spellers; get a WinXP license from SD
  - Not done
- update all forrest installations, including local patches
  - needs to be redone, due to a bug in the forrest tarball distributed on divvun.no
- fix bugs!
  - Not done
Maaren
- investigate the generated word form list sent to Polderland - use the command
Saara
- finalize server of the Xerox tools.
  - done
- help Trond with some shell commands
- re-analyze parallel files
  - faced some problems with tca2
- consider implementing some new features to the corpus files
  - not finished.
- add closed POSes to the paradigm gen, if needed.
  - done
- investigate why possessives have disappeared from the paradigm generator
  - fixed
- fix bugs!
  - fixed some
Sjur
- name lexicon:
  - refactor SD-terms editor code
    - some more done
  - implement missing propnouns editing functions
  - implement improvements decided upon in Tromsø
- hire linguist and programmer
- decide how to specify compounding behaviour info in the lexicon
- sma discussions with SD (with Børre, Trond)
- get an Intel Mac for testing Windows spellers; get a WinXP license from SD
- publish corpus contracts and project infra on NoDaLi-sta
- ask SD/Sig-Britt Persson about some of the South Sámi bible texts
  - done, will receive them soon
- fix bugs!
- other things:
  - SD employee seminar took a lot of time
  - also demonstrated the alpha spellers
Thomas
- refine smj proper noun lexica, cf. the propernoun-smj-lex.txt
  - not done
- decide how to specify compounding behaviour info in the lexicon
  - not done
- fix bugs!
  - none in the bug list
Tomi
- add closed POS and clitics to PLX generation
  - not done
- add derivations to the PLX generation
  - not done
- add compound stems to the PLX generation
  - only nouns
- make sure the normative generator is used when generating paradigms
  - done
- investigate why possessives have disappeared from the paradigm generator
  - done
- fix bugs!
Trond
- refine smj proper noun lexica, cf. the propernoun-smj-lex.txt
  - Looked at them, meeting still not held.
- get more sma texts
  - Awaiting talks, and a memory stick.
- decide how to specify compounding behaviour info in the lexicon
  - Worked on this one, the issue is still open.
- sma discussions with SD (with Børre, Sjur)
  - Last week Alta, not done.
3. Documentation
TODO:
- update all forrest installations, including local patches (Børre)
  - done in Alta for Divvun, still needs a few changes in the file
- either fix installations (Sjur), or create a new tarball (Børre)
4. Corpus gathering
Sjur talked to Pia in Alta about the sma bible texts, and she will
TODO:
- get sma Bible / NT texts (Trond)
  - Sjur discussed with Pia in Alta
- Discussions with the Sámi Parliament about sma (Børre, Sjur, Trond)
- ask SD/Sig-Britt Persson about some of the South Sámi bible texts (Sjur)
  - done
5. Corpus infrastructure
Aligner
Problems with the aligner, Saara has talked to Børre, who will look into
TODO:
- gather more parallel texts (Trond, Børre)
- re-analyze parallel files using the command-line version (Saara)
6. Infrastructure
Xerox tools wrapped as servers
Vislcg can't be included in the server wrapping code, it does not support the
TODO:
- investigate why possessives have disappeared from the paradigm generator
  - fixed
- make sure the normative generator is used when generating paradigms (Tomi)
  - done
- find a way of integrating vislcg as a server, or send a feature request
7. Linguistics
Names and multilinguality
TODO:
- finish first version of the editing (Sjur)
- test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)
- make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as
- convert propernoun-($lang)-lex.txt to a derived file from common xml files
- start to use the xml file as source file
- clean terms-sme.xml such that all names have the correct tag for their use
- merge placenames which are erroneously in different entries: e.g. Helsinki,
- publish the name lexicon on risten.no (Sjur)
- add missing parallel names for placenames (linguists)
- add informative links between first names like Niillas and Nils
North Sámi
Nothing this week.
Lule Sámi
TODO:
- refine smj proper noun lexica, cf. the propernoun-smj-lex.txt
  - not done yet.
8. Name lexicon infrastructure
Decided in Tromsø:
- add logging facilities to the interface
- add option to download local copies of the lexicon files directly from the db
- batch editing (change all entries in the found set), should later be enhanced
- tag for excluding/including a name from certain applications
- future expansion: choose what info to display in the single language browser
- display existing language entries when adding a new language to a record
- add editor to change single, existing entries
Details can be found in the meeting memo.
TODO:
- develop the needed XQueries and UI (Sjur, Tomi)
Postponed:
- data synchronisation between risten.no and the cvs repo
- new version of xml2lexc (based on ccat), should handle complex names correctly:
9. Spellers
Polderland data generation
TODO:
- decide how to specify compounding behaviour info for the lexicon
- add closed POS and clitics to PLX generation (Tomi)
- add derivations to the PLX generation (Tomi)
- add compound stems to the PLX generation (Tomi)
Aspell
TODO when the major part of the PLX conversion is done:
- add Aspell/Hunspell data generation to the lexc2xspell (Tomi - after the
- study Hunspell, perhaps also Soikko (Børre, Sjur, Tomi)
Testing
TODO:
- get an Intel Mac for testing Windows spellers; get a WinXP license from SD
10. Other
Corpus contracts
TODO:
- publish corpus contracts and project infra on NoDaLi-sta (Sjur)
Bug fixing
56 open Divvun/Disamb bugs, and 23 risten.no bugs
Guess: 1/3 of the bugs are fixed already (?)
Task lists as iCal entries
TODO:
- update Maaren's Forrest installation (Børre)
  - done
New Perl modules
The rewritten preprocessor depends on a few new Perl modules. Saara has sent
TODO:
- write Perl module dependency documentation (Saara)
- update setup and installation instructions (Børre)
11. Next meeting, closing
The next meeting is 18.12.2006, 09:30 Norwegian time.
The meeting was closed at 11:06.
Appendix - task lists for the next week
Børre
- contact authors who have already received the corpus licensing contract
- continue work on script for automatic testing of the spell checker in Word
- sma discussions with SD (with Sjur, Trond)
- get an Intel Mac for testing Windows spellers; get a WinXP license from SD
- recreate our forrest tarball
- update setup and installation instructions for new users/computers
- fix bugs!
Maaren
- investigate the generated word form list sent to Polderland - use the command
Saara
- help Trond with some shell commands
- re-analyze parallel files
- consider implementing some new features to the corpus files
- write some Perl documentation
- vislcg as server, possibly as feature request to the vislcg devs
- fix bugs!
Sjur
- name lexicon:
  - refactor SD-terms editor code
  - implement missing propnouns editing functions
  - implement improvements decided upon in Tromsø
- hire linguist and programmer
- decide how to specify compounding behaviour info in the lexicon
- sma discussions with SD (with Børre, Trond)
- get an Intel Mac for testing Windows spellers; get a WinXP license from SD
- publish corpus contracts and project infra on NoDaLi-sta
- fix forrest installations for Maaren, Disamb
- fix bugs!
Thomas
- refine smj proper noun lexica, cf. the propernoun-smj-lex.txt
- decide how to specify compounding behaviour info in the lexicon
- fix bugs!
Tomi
- add closed POS and clitics to PLX generation
- add derivations to the PLX generation
- add compound stems to the PLX generation
- fix bugs!
Trond
- refine smj proper noun lexica, cf. the propernoun-smj-lex.txt
- get more sma texts
- decide how to specify compounding behaviour info in the lexicon
- sma discussions with SD (with Børre, Sjur)