Meeting_2007-02-05
Contents:
- Meeting setup
- Agenda
- 1. Opening, agenda review, participants
- 2. Updated task status since last meeting
- 3. Documentation
- 4. Corpus gathering
- 5. Corpus infrastructure
- 6. Infrastructure
- 7. Linguistics
- 8. Name lexicon infrastructure
- 9. Spellers
- 10. Other
- 11. Next meeting, closing
- Appendix - task lists for the next week
Meeting setup
- Date: 05.02.2007
- Time: 09.00 Norw. time
- Place: Internet
- Tools: SubEthaEdit, iChat
Agenda
- Opening, agenda review
- Reviewing the task list from last week
- Documentation - divvun.no
- Corpus gathering
- Corpus infrastructure
- Infrastructure
- Linguistics
- Name lexicon infrastructure
- Spellers
- Other issues
- Summary, task lists
- Closing
1. Opening, agenda review, participants
Opened at 10:12. Part two of the meeting opened at 14:45.
Present: Børre, Maaren, Sjur, Steinar, Thomas
Absent: Saara, Tomi, Trond
Agenda accepted as is.
2. Updated task status since last meeting
Børre
- send smj translations to Polderland
- Both smj and sme are sent.
- write form to request corpus user account
- Not done
- document how to apply for access to closed corpus, and details on the corpus
- Not done
- add short description on our front page on anonymous cvs and corpus access,
- Done
- update and fix our documentation and infrastructure as Steinar finds
- Not done
- continue work on script for automatic testing of the spell checker in Word
- Not done
- fix sme texts in corpus this month
- Not done
- find missing nob parallel texts in corpus
- Not done
- translate Windows installer text to sme
- Thomas helped out
- work on the Polderland data generation (PLX format conversion)
- Concentrate on compounding
- Not done
- go through other directories, fix parallelism information for other documents
- Not done
- add sma texts to the corpus repository
- Not done
- order Intel Macs
- Received them last week. Both machines are in use. Will install Windows
- fix bugs!
Maaren
- tasks according to Thomas
- Not done
Saara
- fix sme texts in corpus this month
- continue aligning the rest of the parallel files
- fix problems with xml2lexc if needed
- fix bugs!
Sjur
- name lexicon:
- restructure interface code for easier maintenance, coding and use
- done and working in all checked browsers except Opera
- refactor the rest of the SD-terms editor code
- major rewrite and simplification ahead, based on the new interface
- implement missing propnouns editing functions
- started looking at these again
- implement improvements decided upon in Tromsø
- hire linguist and programmer
- nothing done
- publish corpus contracts and project infra on NoDaLi-sta
- fix stuorra-oslolaš lower case o
- write form to request corpus user account
- document how to apply for access to closed corpus, and details on the corpus
- fix bugs!
Steinar
- test our infrastructure and documentation - follow the documentation exactly,
- started to read the documentation and try the anonymous cvs
- Complete the semantic sets in sme-dis.rle
- done some work
- missing lists
- report conversion errors to Saara
- not done
- Look at the actio compound issue when adding from missing lists
- not done
- lexicalise actio compounds. Example: vuolggasadji vs. vuolginsadji
- Go through the Num bugs
- not done
- fix bugs!
Thomas
- refine smj proper noun lexica, cf. the propernoun-smj-lex.txt
- not done
- work with compounding
- not done
- lexicalise actio compounds
- not done
- Lack of lowering before hyphen: Twol rewrite.
- not done
- Go through the Num bugs
- working with sme
- fix stuorra-oslolaš lower case o
- not done
- implement discontinuous case inflection for numbers
- working with sme
- produce correct number base forms in the analyzer
- working with sme
- fix bugs!
- not this week
Tomi
- add compound stems to the PLX generation
- done
- include numerals in the speller
- done
- add prefixes to the PLX
- not done
- add smj to PLX conversion
- done
- add derivations to the PLX generation
- not done
- fix bugs!
Trond
- update the smj proper noun lexicon, and refine the morphological analysis,
- no smj.
- fix sme texts in corpus this month
- Worked on this
- find missing nob parallel texts in corpus, go through Saara's list
- Not found
- report conversion errors to Saara
- Worked on this
- Go through the Num bugs
- Not done.
- implement discontinuous case inflection for numbers
- Did not participate here
- produce correct number base forms in the analyzer
- Only wrote the testbed; did not participate otherwise
- write form to request corpus user account
- Not done
- document how to apply for access to closed corpus, and details on the corpus
- Not done
- Write project presentation
- Mostly done; 3 presentations in the pipeline
- fix bugs!
3. Documentation
Børre updated the front page, and Steinar started to test the quality of
TODO:
- write form to request corpus user account (Børre, Sjur, Trond)
- document how to apply for access to closed corpus, and details on the corpus
- add short description on our front page on anonymous cvs and corpus access,
- done
4. Corpus gathering
TODO:
- sme texts: no new additions, fix corpus errors during this month
- missing nob parallel texts should be added if such holes are found
- Go through the list of missing or erroneous nob texts, based upon
- add sma texts to the corpus repository (Børre)
5. Corpus infrastructure
Alignment
TODO:
- go through other directories (nob directories, sd directories), fix
- nothing more yet
Conversion issues
TODO:
- report conversion errors to Saara (Trond, Steinar)
6. Infrastructure
Steinar started the QA, cf. above.
TODO:
- test our infrastructure and documentation - follow the documentation exactly,
- started
- update and fix our documentation and infrastructure as Steinar finds
7. Linguistics
North Sámi
Maaren is now working on lexicalising the actio compounds.
TODO:
- lexicalise actio compounds. Example: vuolggasadji vs. vuolginsadji
- fix stuorra-oslolaš lower case o (Sjur, Thomas, Trond)
- nothing yet
Numbers:
TODO:
- discontinuous case inflection (but only for maximally three-part compound
- working on sme
- produce correct number base forms in the analyzer (Thomas)
- working on sme
- Go through the Num bugs (Thomas)
- working on sme
Lule Sámi
Number inflection and analysis are now fixed and working, thanks to Thomas.
TODO:
- refine smj proper noun lexica, cf. the propernoun-smj-lex.txt
- Lack of lowering/fronting before hyphen: Twol rewrite. (Thomas, Trond)
8. Name lexicon infrastructure
Decisions made in Tromsø can be found in the meeting memo.
Postponed:
- data synchronisation between risten.no and the cvs repo
TODO:
- restructure interface code for easier maintenance, coding and use
- done, except a glitch in Opera (won't fix for the time being)
- finish first version of the editing (Sjur)
- test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)
- make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as
- convert propernoun-($lang)-lex.txt to a derived file from common xml files
- start to use the xml file as source file
- clean terms-sme.xml such that all names have the correct tag for their use
- merge placenames which are erroneously in different entries: e.g. Helsinki,
- publish the name lexicon on risten.no (Sjur)
- add missing parallel names for placenames (linguists)
- add informative links between first names like Niillas and Nils
9. Spellers
Polderland data generation
Polderland has now given instructions for how to handle genitive compounding and
TODO:
- add smj to PLX conversion (Børre, Tomi)
- done
- send smj PLX data to Polderland (Børre, Tomi)
- decide how to specify nouns requiring genitive first parts
- Include numerals in the speller (Børre, Tomi)
- first version done, but needs more work
- add prefixes to the PLX (Børre, Tomi)
- still not done
- add derivations to the PLX generation (Børre, Tomi)
- next after numbers are fixed
OOo speller(s)
TODO when the major part of the PLX conversion is done:
- add Aspell/Hunspell data generation to the lexc2xspell (Tomi - after the
- study Hunspell, perhaps also Soikko (Børre, Sjur, Tomi)
Testing
TODO:
- get an Intel Mac for Tomi (Sjur)
- not yet
Localisation
TODO:
- translate Windows installer text to sme (Børre, Thomas)
- done
- send smj translations to Polderland (Børre)
- done
10. Other
Corpus contracts
TODO:
- publish corpus contracts and project infra as open-source on NoDaLi-sta
- nothing yet
Bug fixing
59 open Divvun/Disamb bugs, and 23 risten.no bugs
Moving G5
The G5 could now be moved out of Børre's office, to the basement where the
TODO:
- move the G5 to the basement (Børre)
11. Next meeting, closing
The next meeting is 12.2.2007, 09:30 Norwegian time.
The meeting was closed at 10:43.
Appendix - task lists for the next week
Børre
- write form to request corpus user account
- document how to apply for access to closed corpus, and details on the corpus
- update and fix our documentation and infrastructure as Steinar finds
- continue work on script for automatic testing of the spell checker in Word
- fix sme texts in corpus this month
- find missing nob parallel texts in corpus
- work on the Polderland data generation (PLX format conversion)
- Concentrate on compounding
- go through other directories, fix parallelism information for other documents
- add sma texts to the corpus repository
- move the G5 to the basement (Børre)
- fix bugs!
Maaren
- lexicalise actio compounds
Saara
- fix sme texts in corpus this month
- continue aligning the rest of the parallel files
- fix problems with xml2lexc if needed
- fix bugs!
Sjur
- name lexicon:
- refactor the rest of the SD-terms editor code
- implement missing propnouns editing functions
- implement improvements decided upon in Tromsø
- hire linguist and programmer
- publish corpus contracts and project infra as open-source on NoDaLi-sta
- fix stuorra-oslolaš lower case o
- write form to request corpus user account
- document how to apply for access to closed corpus, and details on the corpus
- get an Intel Mac for Tomi
- fix bugs!
Steinar
- test our infrastructure and documentation - follow the documentation exactly,
- Complete the semantic sets in sme-dis.rle
- missing lists
- report conversion errors to Saara
- Look at the actio compound issue when adding from missing lists
- lexicalise actio compounds. Example: vuolggasadji vs. vuolginsadji
- Go through the Num bugs
- fix bugs!
Thomas
- refine smj proper noun lexica, cf. the propernoun-smj-lex.txt
- work with compounding
- Lack of lowering before hyphen: Twol rewrite.
- Go through the sme Num bugs
- fix stuorra-oslolaš lower case o
- implement discontinuous case inflection for sme numbers
- produce correct number base forms in the sme analyzer
- fix bugs!
Tomi
- improve numerals in the speller
- add prefixes to the PLX
- add derivations to the PLX generation
- fix bugs!
Trond
- update the smj proper noun lexicon, and refine the morphological analysis,
- fix sme texts in corpus this month
- find missing nob parallel texts in corpus, go through Saara's list
- report conversion errors to Saara
- Go through the Num bugs
- implement discontinuous case inflection for numbers
- produce correct number base forms in the analyzer
- Write project presentation
- fix bugs!