Meeting_2007-10-08
Contents:
- Meeting setup
- Agenda
- 1. Opening, agenda review, participants
- 2. Updated task status since last meeting
- 3. Documentation
- 4. Corpus gathering
- 5. Corpus infrastructure
- 6. Infrastructure
- 7. Linguistics
- 8. Name lexicon infrastructure
- 9. Proofing tools
- 10. Other
- 11. Next meeting, closing
- Appendix - task lists for the next week
Meeting setup
- Date: 8.10.2007
- Time: 09.30 Norw. time
- Place: Internet
- Tools: SubEthaEdit, iChat/Skype
Agenda
- Opening, agenda review
- Reviewing the task list from last week
- Documentation - divvun.no
- Corpus gathering
- Corpus infrastructure
- Infrastructure
- Linguistics
- Name lexicon infrastructure
- Spellers
- Other issues
- Summary, task lists
- Closing
1. Opening, agenda review, participants
Opened at 09:39.
Present: Børre, Ilona, Per-Eric, Risten, Sjur, Thomas, Tomi, Trond
Absent: none
Agenda accepted as is.
2. Updated task status since last meeting
Børre
- move Steinar's error markup in the xml files to (a copy of) the original
  - not done
- Hunspell lexicon conversion
  - nouns, adjs and verbs seem to work okay, other POS'es and CPOS'es (?) don't work as expected
- collect/build an e-mail notify list
  - not done
- fix bugs!
Ilona
- lexicalise missing words
  - Never done... But now already pretty far with a missing list made of
- make sms propernoun-list
  - Was done already
- Change NIILLAS-names to ANAR or DUORTNUS.
  - Done
Maaren
- lexicalise actio compounds
Per-Eric
- expand the smj typos list
  - Worked and still working
- add missing smj words
  - Worked and still working
- lexicalise words from the Olavi missing list
  - Worked and still working
- fix bugs!
  - Fixed some
Risten
- fixed and open issues to README files
  - done
- update translations of README-files - Thursday afternoon
  - done
Saara
- add new XSL/XML headers for proofing test docs
  - not done
- Set up ways of adding meta-information for proofing correct corpus docs
  - not done
Sjur
- document the AppleScript testing tool
  - nothing new
- document the testing procedures
  - not yet
- work on the XML name editor/risten.no integration
  - still nothing
- fixed and open issues to README files
  - done
- test correct-type markup with latest enhancements
  - nope
- collect/build an e-mail notify list
  - yes
- update translations of README-files - Thursday afternoon
  - several times
- update installation packages
  - as well
- announce the beta
  - today - we need a multilingual e-mail text
- fix bugs!
  - yes, and reported new ones
- other things:
  - received, installed and tested InDesign hyphenation - works great! (But there are hyphenation errors; we need a command line hyphenation tool to test the behaviour of the Polderland hyphenation. It has been ordered and should arrive within the next two weeks.)
Thomas
- explain compound-tags to Tomi
  - done
- add oslolaš type derivation test cases to smj regression file
  - not done
- sme->smj lexicon conversion to build bilingual lexicon resources
  - worked
- update translations of README-files - Thursday afternoon
  - done
- fix bugs!
  - worked some
Tomi
- make PLX conversion test sample; add conversion testing to the make file
  - not done
- Hunspell lexicon conversion
  - Børre is doing this
- fix stuorra-oslolaš lower case o
  - this one is fixed? Yes, the latest regression tests are very good :)
- sme->smj lexicon conversion to build bilingual lexicon resources
  - not done
- test whether we can revert Makefile changes, and if positive, revert them
  - done
- fix bugs!
  - fixed
Trond
- update the smj proper noun lexicon, and refine the morphological
  - Not done.
- fix stuorra-oslolaš lower case o
  - fixed
- add sma texts to the corpus repository
  - Analysed the sma texts; they look promising, but will require work in order to be added properly. I suggest postponing this until after Christmas (as I have done so far, also).
- sme->smj lexicon conversion to build bilingual lexicon resources
  - Great progress made; some minor changes are still left before it is working
- update translations of README-files - Thursday afternoon
  - Done (well, it might have been Friday...)
- fix bugs!
3. Documentation
Bugzilla 3.0.x has some nice features we would like to use, like shared,
TODO:
- add semi-automatic updates of fixed and open issues to README files
  - done
- update Bugzilla (Børre)
4. Corpus gathering
Trond had a look at the sma bible texts. We will postpone adding them
Børre has received lots of texts from Torkel Rasmussen, they will
TODO:
- test correct-type markup with latest enhancements (Sjur)
- add texts from Torkel Rasmussen (Børre)
5. Corpus infrastructure
Nothing.
6. Infrastructure
Speller testing is still fluctuating a bit.
7. Linguistics
North Sámi
Čorru > čorut
*Oslolaš with hyphen required, is printed now, but shouldn't be
oslolaš - is done correctly now
This one is fixed by the latest changes in the PLX conversion.
TODO:
- lexicalise actio compounds. Example: vuolggasadji vs. vuolginsadji
- fix stuorra-oslolaš lower case o (Tomi)
  - fixed
Lule Sámi
We have the same oslolaš derivation in smj too, but with another derivation.
Tjårro > tjårok
*Oslolaš with hyphen required, is printed now, but shouldn't be
oslolaš - is done correctly now
Correct compounds are still not recognised:
Stuorafuoskok => input, should be accepted (218) Stuorauvsuk (217) Stuorraluohkák
fuoskok
StuorFuoskok Stuorafuoskok 2 (220) Stuorruskak (220) Stuoruduskak (219) SUFUR-Fuoskok
Fuoskok is in the PLX lexicon, but does not take part in this type of
Fuoskok Fuossko+N+Prop+Plc+Pl+Nom+Clt+ge
smj propernoun bug issue:
- convert from common base (which means sme base)
- Words not convertible should be added to separate smj lexicon, and words that
- send to smj morphology
The original todo was to correct the smj morphology.
- conversion errors
- words that should not have been converted
- missing smj-unique names
- errors in the morphology
Testing procedures:
- analyse baseforms (as for sme)
- generate a couple of caseforms from the baseforms, and inspect result
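The second testing step can be prepared mechanically. Below is a minimal sketch, assuming the smj generator accepts one tag string per line in the same format as the Fuoskok analysis above (`Fuossko+N+Prop+Plc+...`). The helper name `generation_queries`, the `Sg` number tag and the list of case tags are illustrative assumptions, not the project's actual test script or tag set.

```python
# Hypothetical helper: build generator input strings for spot-checking
# smj proper nouns, following the tag format seen in analyses such as
# "Fuossko+N+Prop+Plc+Pl+Nom+Clt+ge".

# Assumed subset of case tags to spot-check; adjust to the real smj tag set.
CASES = ["Nom", "Gen", "Ill", "Ine", "Ela"]


def generation_queries(baseforms, semtag="Plc", number="Sg", cases=CASES):
    """Return one tag string per (baseform, case) pair, in input order."""
    queries = []
    for lemma in baseforms:
        for case in cases:
            queries.append(f"{lemma}+N+Prop+{semtag}+{number}+{case}")
    return queries


if __name__ == "__main__":
    # Each printed line would be fed to the smj generator (not shown);
    # the generated forms are then inspected by hand.
    for query in generation_queries(["Fuossko"], cases=["Nom", "Gen"]):
        print(query)
```

Keeping the query building separate from the generator call means the same list can be reused as a regression input once the expected forms have been checked.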
Suggestion:
TODO:
- refine smj proper noun lexica, cf. the propernoun-smj-lex.txt
  - it is only about adding smj names now, working on it
- lexicalise words from the Olavi missing list, but check against the PDF
  - working on it
- add oslolaš type derivation test cases to the regression file
  - done
- sme->smj lexicon conversion to build bilingual lexicon resources, and
  - working on it
- add proper nouns (Thomas, Ilona)
8. Name lexicon infrastructure
This sub-project needs to get up and running soon. Mainly Sjur's task.
Decisions made in Tromsø can be found in this meeting memo.
TODO:
- fix bugs in lexc2xml; add comments to the log element (Saara)
- finish first version of the editing (Sjur)
- test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)
  - make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as
  - convert propernoun-($lang)-lex.txt to a derived file from common xml files
  - implement data synchronisation between risten.no and
  - start to use the xml file as source file
  - clean terms-sme.xml such that all names have the correct tag for their use
  - merge placenames which are erroneously in different entries: e.g. Helsinki,
- publish the name lexicon on risten.no (Sjur)
- add missing parallel names for placenames (linguists)
- add informative links between first names like Niillas and Nils
9. Proofing tools
Hunspell
Sámi languages are not supported in OpenOffice.org; until that is fixed we will
TODO:
- Hunspell lexicon conversion (Tomi, Børre)
- Begin adding support for the Sámi languages in OpenOffice.org (Børre)
Testing
Spelling Error Markup
TODO:
- Set up ways of adding meta-information (source info, used in testing or not,
- move Steinar's error markup in the xml files to (a copy of) the original
Automated testing
The infrastructure is about to settle.
TODO:
- document the AppleScript testing tool (Sjur)
- document the testing procedures (Sjur)
Lexicon conversion to the PLX format
TODO:
- fix oslolaš bug (Tomi)
  - fixed for sme, still open for smj
InDesign tools
We have received the first hyphenation beta for InDesign. It has been tested,
We should give the beta to Min Áigi and Davvi Girji.
TODO:
- make available InDesign hyphenator to Min Áigi/Davvi Girji (Sjur)
- document the InDesign tools (Sjur)
- add hyphenation testing (Sjur)
Hyphenators
There are some hyphenation errors we need to debug.
TODO:
- get command line hyphenator (Sjur)
- collect list of problematic words for the hyphenator (Sjur, Thomas, all)
New public beta
TODO:
- collect/build an e-mail notify list; we make it simple, a text document with
  - done
- update list of fixed and known issues - Tuesday afternoon (Sjur, Risten)
  - done
- update translations of README-files - Thursday afternoon
  - done
- update installation packages (Sjur)
  - done
- announce the beta (Sjur)
  - today
Release version
The CD cover etc. will be worked on by John-Marcus Kuhmunen, and will follow the
TODO:
- write text to go on the CD cover (Risten)
- set up CD-printing printer (Risten)
10. Other
Corpus contracts
Delayed until after the final release.
TODO:
- publish corpus contracts and project infra as open-source on NoDaLi-sta
Bug fixing
When fixing bugs, record the version number containing the fix in the Bugzilla
59 open Divvun/Disamb bugs (26 of these 56 are speller-related bugs,
11. Next meeting, closing
The next meeting is 15.10.2007, 09:30 Norwegian time.
Trond will be away.
The meeting was closed at 10:43.
Appendix - task lists for the next week
Børre
- move Steinar's error markup in the xml files to (a copy of) the original
- Hunspell lexicon conversion
- update Bugzilla to 3.0.x
- begin adding support for the Sámi languages in OpenOffice.org
- add texts from Torkel Rasmussen
- fix bugs!
Ilona
- lexicalise missing words
  - Will I have new missing lists to do?
- Check the Finnish translation
- add smj proper nouns
- other smj tasks
Maaren
- lexicalise actio compounds
Per-Eric
- expand the smj typos list
- add missing smj words
- lexicalise words from the Olavi missing list
- fix bugs!
Risten
- write text to go on the CD cover
- set up CD-printing printer
Saara
- add new XSL/XML headers for proofing test docs
- Set up ways of adding meta-information for proofing correct corpus docs
Sjur
- document the AppleScript testing tool
- document the testing procedures
- work on the XML name editor/risten.no integration
- test correct-type markup with latest enhancements
- get command line hyphenator for automated testing of the hyph-lexicons
- collect list of problematic words for the hyphenator
- make available InDesign hyphenator to Min Áigi/Davvi Girji
- document the InDesign tools
- add hyphenation testing
- fix bugs!
Thomas
- sme->smj lexicon conversion to build bilingual lexicon resources
- add smj proper nouns
- fix bugs!
Tomi
- Hunspell lexicon conversion
- sme->smj lexicon conversion to build bilingual lexicon resources
- fix oslolaš bug in smj (Tomi)
- fix bugs!
Trond
- sme->smj lexicon conversion to build bilingual lexicon resources
- fix bugs!