Meeting_2007-01-29
Contents:
- Meeting setup
- Agenda
- 1. Opening, agenda review, participants
- 2. Updated task status since last meeting
- 3. Documentation
- 4. Corpus gathering
- 5. Corpus infrastructure
- 6. Infrastructure
- 7. Linguistics
- 8. Name lexicon infrastructure
- 9. Spellers
- 10. Other
- 11. Next meeting, closing
- Appendix - task lists for the next week
Meeting setup
- Date: 29.01.2007
- Time: 09.00 Norw. time
- Place: Internet
- Tools: SubEthaEdit, iChat
Agenda
- Opening, agenda review
- Reviewing the task list from last week
- Documentation - divvun.no
- Corpus gathering
- Corpus infrastructure
- Infrastructure
- Linguistics
- Name lexicon infrastructure
- Spellers
- Other issues
- Summary, task lists
- Closing
1. Opening, agenda review, participants
Opened at 10:10.
Present: Børre, Sjur, Steinar, Thomas, Tomi, Trond
Absent: Maaren, Saara
Agenda accepted as is.
2. Updated task status since last meeting
Børre
- send smj translations to Polderland
- not done
- write form to request corpus user account
- not done
- document how to apply for access to closed corpus, and details on the corpus
- not done
- add short description on our front page on anonymous cvs and corpus access,
- not done
- update and fix our documentation and infrastructure as Steinar finds
- not done
- continue work on script for automatic testing of the spell checker in Word
- not done
- fix sme texts in corpus this month
- not done
- find missing nob parallel texts in corpus
- not done
- translate Windows installer text to sme
- some done
- work on the Polderland data generation (PLX format conversion)
- Concentrate on compounding
- compounds done
- some done on numerals
- go through other directories, fix parallelity information for other documents
- not done
- add sma texts to the corpus repository
- not done
- order Intel Macs
- done
- fix bugs!
- not done
Maaren
- tasks according to Thomas
Saara
- fix sme texts in corpus this month
- done character issues and msword doc table formatting
- send aligned, xml nob texts to Kristen
- done
- fix problems with xml2lexc if needed
- check the problem with pdf-conversion cutting wordforms.
- in progress
- fix bugs!
Sjur
- name lexicon:
- restructure interface code for easier maintenance, coding and use
- a lot of work, but still moving too slowly forward - probably need help with
- refactor the rest of the SD-terms editor code
- implement missing propnouns editing functions
- implement improvements decided upon in Tromsø
- hire linguist and programmer
- the candidate I contacted for the linguist position has answered. He is very
- publish corpus contracts and project infra on NoDaLi-sta
- not done
- fix stuorra-oslolaš lower case o
- not done
- write form to request corpus user account
- not done
- document how to apply for access to closed corpus, and details on the corpus
- not done
- fix bugs!
Steinar
- test our infrastructure and documentation - follow the documentation exactly,
- not done, waiting for a necessary update of our front page of the web site
- Complete the semantic sets in sme-dis.rle
- worked with verbal sets and bird names
- missing lists
- not done
- report conversion errors to Saara
- not done
- Look at the actio compound issue when adding from missing lists
- not done
- lexicalise actio compounds. Example: vuolggasadji vs. vuolginsadji
- not done
- Go through the Num bugs
- not done
- fix bugs!
Thomas
- refine smj proper noun lexica, cf. the propernoun-smj-lex.txt
- not this week either
- work with compounding
- awaiting answer from Polder
- lexicalise actio compounds
- redirected to Maaren
- Lack of lowering before hyphen: Twol rewrite.
- not this week either
- Go through the Num bugs
- begun
- fix stuorra-oslolaš lower case o
- not this week either
- implement discontinuous case inflection for numbers
- done smj
- produce correct number base forms in the analyzer
- done smj
- fix bugs!
- not this week
Tomi
- add compound stems to the PLX generation
- done
- include numerals in the speller
- cardinals done?
- add prefixes to the PLX
- not done
- add smj to PLX conversion
- not done
- add derivations to the PLX generation
- not done
- fix bugs!
Trond
- update the smj proper noun lexicon, and refine the morphological analysis,
- No smj yet.
- fix sme texts in corpus this month
- Discussed with Saara and Ilona.
- find missing nob parallel texts in corpus, go through Saara's list
- Not done.
- report conversion errors to Saara
- Not done.
- Go through the Num bugs
- Not done.
- Make numeral testbed for smj as well
- Done.
- Get input on sma hyphenations
- Done. Improved version is checked in.
- implement discontinuous case inflection for numbers
- Not done.
- produce correct number base forms in the analyzer
- Not done.
- write form to request corpus user account
- Not done.
- document how to apply for access to closed corpus, and details on the corpus
- Not done.
- fix bugs!
- Done some.
3. Documentation
Nothing done last week.
TODO:
- write form to request corpus user account (Børre, Sjur, Trond)
- document how to apply for access to closed corpus, and details on the corpus
- add short description on our front page on anonymous cvs and corpus access,
4. Corpus gathering
Nothing new. We need to work systematically on filling our corpus holes,
TODO:
- sme texts: no new additions; fix corpus errors during this month
- missing nob parallel texts should be added if such holes are found
- Go through the list of missing or erroneous nob texts, based upon
- add sma texts to the corpus repository (Børre)
5. Corpus infrastructure
Alignment
TODO:
- go through other directories (nob directories, sd directories), fix
- when aligned, send aligned, xml nob texts to Kristin (Saara)
- done
Conversion issues
TODO:
- report conversion errors to Saara (Trond, Steinar)
- Have a look at the two suggestions for pdf discussed in the previous
- implemented replacement of r vv with rvv. The source of the error is in
- Comment: any initial double consonant is an indication of a space
- The hyphens in page breaks are now replaced with <hyph/>, although I'm still
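The r vv → rvv fix can be illustrated with a minimal sketch. It reads the comment above as saying that a word-initial double consonant signals a spurious space inserted during conversion, so such a token is glued onto the token before it. The function names and the vowel set are hypothetical, whitespace is normalised to single spaces, and this is not the converter's actual code.

```python
# Hypothetical sketch of the rejoining rule, not the converter's real code.
# Assumption: a token that starts with a doubled consonant (e.g. "vv")
# is the tail of the previous word, split off by a spurious space.

VOWELS = set("aeiouyáâäåæøöõ")  # assumed rough vowel set; doubled vowels are left alone


def starts_with_double_consonant(token: str) -> bool:
    """True if the token begins with the same consonant letter twice."""
    return (
        len(token) >= 2
        and token[0].isalpha()
        and token[0].lower() == token[1].lower()
        and token[0].lower() not in VOWELS
    )


def rejoin_split_words(text: str) -> str:
    """Glue tokens starting with a double consonant onto the previous token.

    Whitespace is split on any run and re-joined with single spaces.
    """
    out: list[str] = []
    for tok in text.split():
        if out and starts_with_double_consonant(tok):
            out[-1] += tok  # e.g. "r" + "vv" -> "rvv"
        else:
            out.append(tok)
    return " ".join(out)
```

For example, `rejoin_split_words("mu oa bbá")` returns `"mu oabbá"`, while a text-initial `"vv"` is kept as-is because there is no previous token to attach it to.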
6. Infrastructure
Nothing happened last week.
TODO:
- test our infrastructure and documentation - follow the documentation exactly,
- update and fix our documentation and infrastructure as Steinar finds
7. Linguistics
North Sámi
Maaren is now working on lexicalising the actio compounds.
TODO:
- lexicalise actio compounds. Example: vuolggasadji vs. vuolginsadji
- fix stuorra-oslolaš lower case o (Sjur, Thomas, Trond)
Numbers:
TODO:
- discontinuous case inflection (but only for maximally three-part compound
- done smj
- produce correct number base forms in the analyzer (Thomas, Trond)
- done smj
- Go through the Num bugs (Trond, Thomas, Steinar)
- done smj one bug #372
- Preprocessing of ordinals at the end of sentences - reported as bug #368.
Hyphenation problem
TODO:
- ask Ove Lorentz to report on our sma hyphenator (Trond)
- Done. Still minor problems with handling of all-caps forms, but otherwise
Lule Sámi
TODO:
- refine smj proper noun lexica, cf. the propernoun-smj-lex.txt
- Lack of lowering/fronting before hyphen: Twol rewrite. (Thomas, Trond)
- Set up a test bed for numerals, test and revise (Trond)
- done
- also done: numbers
8. Name lexicon infrastructure
Decisions made in Tromsø can be found in the meeting memo.
Postponed:
- data synchronisation between risten.no and the cvs repo
TODO:
- restructure interface code for easier maintenance, coding and use
- well under way, still some work
- finish first version of the editing (Sjur)
- test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)
- generate terms-smX.xml automatically from propernoun-sme-lex.xml (add nob as
- convert propernoun-($lang)-lex.txt to a derived file from common xml files
- start to use the xml file as source file
- clean terms-sme.xml such that all names have the correct tag for their use
- merge placenames which are erroneously in different entries: e.g. Helsinki,
- publish the name lexicon on risten.no (Sjur)
- add missing parallel names for placenames (linguists)
- add informative links between first names like Niillas and Nils
9. Spellers
Polderland data generation
TODO:
- add smj to PLX conversion (Børre, Tomi)
- Include numerals in the speller (Børre, Tomi)
- first version done, but needs more work
- add prefixes to the PLX (Børre, Tomi)
- not yet
- add derivations to the PLX generation (Børre, Tomi)
Aspell
TODO when the major part of the PLX conversion is done:
- add Aspell/Hunspell data generation to the lexc2xspell (Tomi - after the
- study Hunspell, perhaps also Soikko (Børre, Sjur, Tomi)
Testing
TODO:
- get an Intel Mac for testing Windows spellers (Børre)
- done
Localisation
TODO:
- translate Windows installer text to sme (Børre, Thomas)
- some more done, roughly 50 % done
- send smj translations to Polderland (Børre)
- not yet
10. Other
Corpus contracts
TODO:
- publish corpus contracts and project infra on NoDaLi-sta (Sjur)
Bug fixing
There are 57 open Divvun/Disamb bugs and 23 open risten.no bugs.
KUNSTI final meeting
The conference invitation can be found here: http://tinyurl.com/326lfy
8.-9. February (Thursday & Friday), Oslo. Thomas could present the morphological
11. Next meeting, closing
The next meeting is 5.2.2007, 09:30 Norwegian time.
The meeting was closed at 11:01.
Appendix - task lists for the next week
Børre
- send smj translations to Polderland
- write form to request corpus user account
- document how to apply for access to closed corpus, and details on the corpus
- add short description on our front page on anonymous cvs and corpus access,
- update and fix our documentation and infrastructure as Steinar finds
- continue work on script for automatic testing of the spell checker in Word
- fix sme texts in corpus this month
- find missing nob parallel texts in corpus
- translate Windows installer text to sme
- work on the Polderland data generation (PLX format conversion)
- Concentrate on compounding
- go through other directories, fix parallelity information for other documents
- add sma texts to the corpus repository
- order Intel Macs
- fix bugs!
Maaren
- tasks according to Thomas
Saara
- fix sme texts in corpus this month
- continue aligning the rest of the parallel files
- fix problems with xml2lexc if needed
- fix bugs!
Sjur
- name lexicon:
- restructure interface code for easier maintenance, coding and use
- refactor the rest of the SD-terms editor code
- implement missing propnouns editing functions
- implement improvements decided upon in Tromsø
- hire linguist and programmer
- publish corpus contracts and project infra on NoDaLi-sta
- fix stuorra-oslolaš lower case o
- write form to request corpus user account
- document how to apply for access to closed corpus, and details on the corpus
- fix bugs!
Steinar
- test our infrastructure and documentation - follow the documentation exactly,
- Complete the semantic sets in sme-dis.rle
- missing lists
- report conversion errors to Saara
- Look at the actio compound issue when adding from missing lists
- lexicalise actio compounds. Example: vuolggasadji vs. vuolginsadji
- Go through the Num bugs
- fix bugs!
Thomas
- refine smj proper noun lexica, cf. the propernoun-smj-lex.txt
- work with compounding
- lexicalise actio compounds
- Lack of lowering before hyphen: Twol rewrite.
- Go through the Num bugs
- fix stuorra-oslolaš lower case o
- implement discontinuous case inflection for numbers
- produce correct number base forms in the analyzer
- fix bugs!
Tomi
- add compound stems to the PLX generation
- include numerals in the speller
- add prefixes to the PLX
- add smj to PLX conversion
- add derivations to the PLX generation
- fix bugs!
Trond
- update the smj proper noun lexicon, and refine the morphological analysis,
- fix sme texts in corpus this month
- find missing nob parallel texts in corpus, go through Saara's list
- report conversion errors to Saara
- Go through the Num bugs
- implement discontinuous case inflection for numbers
- produce correct number base forms in the analyzer
- write form to request corpus user account
- document how to apply for access to closed corpus, and details on the corpus
- Write project presentation
- fix bugs!