Meeting_2007-01-29
Contents:
- Meeting setup
- Agenda
- 1. Opening, agenda review, participants
- 2. Updated task status since last meeting
- 3. Documentation
- 4. Corpus gathering
- 5. Corpus infrastructure
- 6. Infrastructure
- 7. Linguistics
- 8. Name lexicon infrastructure
- 9. Spellers
- 10. Other
- 11. Next meeting, closing
- Appendix - task lists for the next week
Meeting setup
- Date: 29.01.2007
- Time: 09.00 Norw. time
- Place: Internet
- Tools: SubEthaEdit, iChat
Agenda
- Opening, agenda review
- Reviewing the task list from last week
- Documentation - divvun.no
- Corpus gathering
- Corpus infrastructure
- Infrastructure
- Linguistics
- name lexicon infrastructure
- Spellers
- Other issues
- Summary, task lists
- Closing
1. Opening, agenda review, participants
Opened at 10:10.
Present: Børre, Sjur, Steinar, Thomas, Tomi, Trond
Absent: Maaren, Saara
Agenda accepted as is.
2. Updated task status since last meeting
Børre
- send smj translations to Polderland
  - not done
- write form to request corpus user account
  - not done
- document how to apply for access to closed corpus, and details on the corpus and its use in general
  - not done
- add short description on our front page on anonymous cvs and corpus access, with links to relevant documentation
  - not done
- update and fix our documentation and infrastructure as Steinar finds problem areas
  - not done
- continue work on script for automatic testing of the spell checker in Word
  - not done
- fix sme texts in corpus this month
  - not done
- find missing nob parallel texts in corpus
  - not done
- translate Windows installer text to sme
  - some done
- work on the Polderland data generation (PLX format conversion)
  - Concentrate on compounding
    - compounds done
    - some done on numerals
- go through other directories, fix parallelity information for other documents
  - not done
- add sma texts to the corpus repository
  - not done
- order Intel Macs
  - done
- fix bugs!
  - not done
Maaren
- tasks according to Thomas
Saara
- fix sme texts in corpus this month
  - done character issues and msword doc table formatting
- send aligned, xml nob texts to Kristen
  - done
- fix problems with xml2lexc if needed
- check the problem with pdf-conversion cutting wordforms.
  - in progress
- fix bugs!
Sjur
- name lexicon:
  - restructure interface code for easier maintenance, coding and use
    - a lot of work done, but progress is still too slow - probably need help with this (Tomi?)
  - refactor the rest of the SD-terms editor code
  - implement missing propnouns editing functions
  - implement improvements decided upon in Tromsø
- hire linguist and programmer
  - the candidate I contacted for the linguist position has answered. He is very interested, and can start April 1.
- publish corpus contracts and project infra on NoDaLi-sta
  - not done
- fix stuorra-oslolaš lower case o
  - not done
- write form to request corpus user account
  - not done
- document how to apply for access to closed corpus, and details on the corpus and its use in general
  - not done
- fix bugs!
Steinar
- test our infrastructure and documentation - follow the documentation exactly, and find problem areas - report problems to Børre. Start: At the front page.
  - not done, waiting for a necessary update of the front page of the web site
- Complete the semantic sets in sme-dis.rle
  - worked with verbal sets and bird names
- missing lists
  - not done
- report conversion errors to Saara
  - not done
- Look at the actio compound issue when adding from missing lists
  - not done
- lexicalise actio compounds. Example: vuolggasadji vs. vuolginsadji
  - not done
- Go through the Num bugs
  - not done
- fix bugs!
Thomas
- refine smj proper noun lexica, cf. the propernoun-smj-lex.txt
  - not this week either
- work with compounding
  - awaiting answer from Polderland
- lexicalise actio compounds
  - redirected to Maaren
- Lack of lowering before hyphen: Twol rewrite.
  - not this week either
- Go through the Num bugs
  - begun
- fix stuorra-oslolaš lower case o
  - not this week either
- implement discontinuous case inflection for numbers
  - done for smj
- produce correct number base forms in the analyzer
  - done for smj
- fix bugs!
  - not this week
Tomi
- add compound stems to the PLX generation
  - done
- include numerals in the speller
  - cardinals done?
- add prefixes to the PLX
  - not done
- add smj to PLX conversion
  - not done
- add derivations to the PLX generation
  - not done
- fix bugs!
Trond
- update the smj proper noun lexicon, and refine the morphological analysis, cf. the propernoun-smj-lex.txt
  - No smj yet.
- fix sme texts in corpus this month
  - Discussed with Saara and Ilona.
- find missing nob parallel texts in corpus, go through Saara's list
  - Not done.
- report conversion errors to Saara
  - Not done.
- Go through the Num bugs
  - Not done.
- Make numeral testbed for smj as well
  - Done.
- Get input on sma hyphenations
  - Done. Improved version is checked in.
- implement discontinuous case inflection for numbers
  - Not done.
- produce correct number base forms in the analyzer
  - Not done.
- write form to request corpus user account
  - Not done.
- document how to apply for access to closed corpus, and details on the corpus and its use in general
  - Not done.
- fix bugs!
  - Done some.
3. Documentation
Nothing done last week.
TODO:
- write form to request corpus user account (Børre, Sjur, Trond)
- document how to apply for access to closed corpus, and details on the corpus and its use in general (Børre, Sjur, Trond)
- add short description on our front page on anonymous cvs and corpus access, with links to relevant documentation (Børre)
4. Corpus gathering
Nothing new. We need to work systematically on filling our corpus holes.
TODO:
- sme texts: no new additions, fix corpus errors during this month (Børre, Trond, Saara)
- missing nob parallel texts should be added if such holes are found (Børre, Trond)
- Go through the list of missing or erroneous nob texts, based upon Saara's perfect list (Børre, Trond)
- add sma texts to the corpus repository (Børre)
5. Corpus infrastructure
Alignment
TODO:
- go through other directories (nob directories, sd directories), fix parallelity information for other documents (2 hours) (Børre)
- when aligned, send aligned, xml nob texts to Kristin (Saara)
  - done
Conversion issues
TODO:
- report conversion errors to Saara (Trond, Steinar)
- Have a look at the two suggestions for pdf discussed in the previous meeting (Saara)
  - implemented replacement of r vv with rvv. The source of the error is in pdf-conversion, where the gap between r and the double v is wrongly interpreted as a space character. This concerns only one document and only r and double v.
    - Comment: any word-initial double consonant indicates a spurious space (there are no initial geminates in Sámi).
  - The hyphens at page breaks are now replaced with <hyph/>, although I'm still testing it.
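A minimal sketch of the conversion checks discussed above, not the actual conversion code. The consonant inventory, the function names and the use of a form feed as the page-break marker are all assumptions made for illustration; the sample words are illustrative too.

```python
import re

# Assumption: an illustrative consonant inventory, not the project's actual one.
CONSONANTS = "bcdfghjklmnpqrstvwxzčđŋšŧž"

# A token that starts with the same consonant twice, e.g. "vvuođat".
INITIAL_GEMINATE = re.compile(r"\b([%s])\1\w*" % CONSONANTS, re.IGNORECASE)


def join_r_vv(text):
    """Narrow fix from the memo: rejoin 'r vv' into 'rvv'."""
    return text.replace("r vv", "rvv")


def find_initial_geminates(text):
    """List tokens starting with a doubled consonant.

    Each hit hints at a spurious space in front of it, since Sámi words
    do not begin with a geminate.
    """
    return [m.group(0) for m in INITIAL_GEMINATE.finditer(text)]


def mark_pagebreak_hyphens(text):
    """Replace a hyphen directly before a page break with <hyph/>.

    Assumption: the page break is a form feed, possibly surrounded
    by other whitespace.
    """
    return re.sub(r"-\s*\f\s*", "<hyph/>", text)


sample = "dear vvuođat ja buorre beaivi"
print(find_initial_geminates(sample))
print(join_r_vv(sample))
print(mark_pagebreak_hyphens("sáme-\n\fgiella"))
```

The geminate check only reports candidates; whether to join them automatically, as is done for r vv, would still have to be decided per document.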
6. Infrastructure
Nothing happened last week.
TODO:
- test our infrastructure and documentation - follow the documentation exactly, and find problem areas - report problems to Børre. Start: At the front page. (Steinar)
- update and fix our documentation and infrastructure as Steinar finds problem areas (Børre)
7. Linguistics
North Sámi
Maaren is now working on lexicalising the actio compounds.
TODO:
- lexicalise actio compounds. Example: vuolggasadji vs. vuolginsadji (Thomas, Maaren, Steinar)
- fix stuorra-oslolaš lower case o (Sjur, Thomas, Trond)
Numbers:
TODO:
- discontinuous case inflection (but only for maximally three-part compound numerals) (viđain/goalmmát/logiin and guvttiin/logiin/viđain) (Thomas, Trond)
  - done smj
- produce correct number base forms in the analyzer (Thomas, Trond)
  - done smj
- Go through the Num bugs (Trond, Thomas, Steinar)
  - done smj: one bug, #372
- Preprocessing of ordinals at the end of sentences - reported as bug #368. (Trond)
Hyphenation problem
TODO:
- ask Ove Lorentz to report on our sma hyphenator (Trond)
  - Done. Still minor problems with handling of all-caps forms, but otherwise ok.
Lule Sámi
TODO:
- refine smj proper noun lexica, cf. the propernoun-smj-lex.txt (Thomas, Trond)
- Lack of lowering/fronting before hyphen: Twol rewrite. (Thomas, Trond)
- Set up a test bed for numerals, test and revise (Trond)
  - done
- also done: numbers
8. Name lexicon infrastructure
Decisions made in Tromsø can be found in the meeting memo.
Postponed:
- data synchronisation between risten.no and the cvs repo
TODO:
- restructure interface code for easier maintenance, coding and use
  - well under way, still some work
- finish first version of the editing (Sjur)
- test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)
- make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as well) (the morphological section should be kept intact, in e.g. propernoun-sme-morph.txt) (Sjur, Saara)
- convert propernoun-($lang)-lex.txt to a derived file from common xml files (Sjur, Tomi, Saara)
- start to use the xml file as source file
- clean terms-sme.xml such that all names have the correct tag for their use (e.g. @type=secondary) (Thomas, Maaren, linguists)
- merge placenames which are erroneously in different entries: e.g. Helsinki, Helsingfors, Helsset (linguists)
- publish the name lexicon on risten.no (Sjur)
- add missing parallel names for placenames (linguists)
- add informative links between first names like Niillas and Nils (linguists)
9. Spellers
Polderland data generation
TODO:
- add smj to PLX conversion (Børre, Tomi)
- Include numerals in the speller (Børre, Tomi)
  - first version done, but needs more work
- add prefixes to the PLX (Børre, Tomi)
  - not yet
- add derivations to the PLX generation (Børre, Tomi)
Aspell
TODO when the major part of the PLX conversion is done:
- add Aspell/Hunspell data generation to the lexc2xspell (Tomi - after the PLX data generation is finished)
- study Hunspell, perhaps also Soikko (Børre, Sjur, Tomi)
Testing
TODO:
- get an Intel Mac for testing Windows spellers (Børre)
  - done
Localisation
TODO:
- translate Windows installer text to sme (Børre, Thomas)
  - some more done, roughly 50 % done
- send smj translations to Polderland (Børre)
  - not yet
10. Other
Corpus contracts
TODO:
- publish corpus contracts and project infra on NoDaLi-sta (Sjur)
Bug fixing
57 open Divvun/Disamb bugs, and 23 risten.no bugs
KUNSTI final meeting
Conference invitation can be found here:
http://tinyurl.com/326lfy
8.-9. February (Thursday & Friday), Oslo. Thomas could present the morphological
11. Next meeting, closing
The next meeting is 5.2.2007, 09:30 Norwegian time.
The meeting was closed at 11:01.
Appendix - task lists for the next week
Børre
- send smj translations to Polderland
- write form to request corpus user account
- document how to apply for access to closed corpus, and details on the corpus and its use in general
- add short description on our front page on anonymous cvs and corpus access, with links to relevant documentation
- update and fix our documentation and infrastructure as Steinar finds problem areas
- continue work on script for automatic testing of the spell checker in Word
- fix sme texts in corpus this month
- find missing nob parallel texts in corpus
- translate Windows installer text to sme
- work on the Polderland data generation (PLX format conversion)
  - Concentrate on compounding
- go through other directories, fix parallelity information for other documents
- add sma texts to the corpus repository
- order Intel Macs
- fix bugs!
Maaren
- tasks according to Thomas
Saara
- fix sme texts in corpus this month
- continue aligning the rest of the parallel files
- fix problems with xml2lexc if needed
- fix bugs!
Sjur
- name lexicon:
  - restructure interface code for easier maintenance, coding and use
  - refactor the rest of the SD-terms editor code
  - implement missing propnouns editing functions
  - implement improvements decided upon in Tromsø
- hire linguist and programmer
- publish corpus contracts and project infra on NoDaLi-sta
- fix stuorra-oslolaš lower case o
- write form to request corpus user account
- document how to apply for access to closed corpus, and details on the corpus and its use in general
- fix bugs!
Steinar
- test our infrastructure and documentation - follow the documentation exactly, and find problem areas - report problems to Børre. Start: At the front page.
- Complete the semantic sets in sme-dis.rle
- missing lists
- report conversion errors to Saara
- Look at the actio compound issue when adding from missing lists
- lexicalise actio compounds. Example: vuolggasadji vs. vuolginsadji
- Go through the Num bugs
- fix bugs!
Thomas
- refine smj proper noun lexica, cf. the propernoun-smj-lex.txt
- work with compounding
- lexicalise actio compounds
- Lack of lowering before hyphen: Twol rewrite.
- Go through the Num bugs
- fix stuorra-oslolaš lower case o
- implement discontinuous case inflection for numbers
- produce correct number base forms in the analyzer
- fix bugs!
Tomi
- add compound stems to the PLX generation
- include numerals in the speller
- add prefixes to the PLX
- add smj to PLX conversion
- add derivations to the PLX generation
- fix bugs!
Trond
- update the smj proper noun lexicon, and refine the morphological analysis, cf. the propernoun-smj-lex.txt
- fix sme texts in corpus this month
- find missing nob parallel texts in corpus, go through Saara's list
- report conversion errors to Saara
- Go through the Num bugs
- implement discontinuous case inflection for numbers
- produce correct number base forms in the analyzer
- write form to request corpus user account
- document how to apply for access to closed corpus, and details on the corpus and its use in general
- Write project presentation
- fix bugs!

