Meeting_2007-09-03
Contents:
- Meeting setup
- Agenda
- 1. Opening, agenda review, participants
- 2. Updated task status since last meeting
- 3. Documentation
- 4. Corpus gathering
- 5. Corpus infrastructure
- 6. Infrastructure
- 7. Linguistics
- 8. Name lexicon infrastructure
- 9. Spellers
- 10. Other
- 11. Next meeting, closing
- Appendix - task lists for the next week
Meeting setup
- Date: 3.9.2007
- Time: 09:30 Norwegian time
- Place: Internet
- Tools: SubEthaEdit, iChat/Skype
Agenda
- Opening, agenda review
- Reviewing the task list from last week
- Documentation - divvun.no
- Corpus gathering
- Corpus infrastructure
- Infrastructure
- Linguistics
- Name lexicon infrastructure
- Spellers
- Other issues
- Summary, task lists
- Closing
1. Opening, agenda review, participants
Opened at 09:42.
Present: Børre, Ilona, Per-Eric, Sjur, Thomas, Tomi, Trond
Absent: none
Agenda accepted as is.
2. Updated task status since last meeting
Børre
- run all known spelling errors in the prooftest corpus through the speller
- not done
- add extraction of all known spelling errors in the regular corpus (not the
- not done
- move Steinar's error markup in the xml files to (a copy of) the original
- working on it
- create a speller preprocessor
- not done
- fix bugs!
Ilona
- lexicalise missing words
- add sme names from FIN
- Continuing. Unfortunately Karttakeskus hasn't included the Sami names to
- make smn propernoun-list
- Working on it as the names come. Not very intensive, yet. Focusing first to
Maaren
- lexicalise actio compounds
Per-Eric
- expand the smj typos list
- working with it
- add missing smj words
- working with it
- lexicalise words from the Olavi missing list
- working with it
- add compounding tags to adjectives
- working with it, soon finished
Saara
- add new XSL/XML headers for proofing test docs
- fix bugs!
Sjur
- improve speller test bench:
- document the AppleScript testing tool
- not done
- create a speller preprocessor
- Saara did this according to specs
- integrate the ccat speller testing options in the make file
- done - finally!
- publish corpus contracts and project infra as open-source on NoDaLi-sta
- not done
- fix stuorra-oslolaš lower case o
- not done
- ä/æ in smj speller
- not done
- work on the XML name editor/risten.no integration
- not done
- plan the rest of the project period
- some scheduling, not finished
- fix sme twol bug (#460), meeting Thursday at 12 AM
- not done
- fix bug 458
- not done
- fix bugs!
- fixed some, discussed some, and filed new ones
- other things:
- compiled new spellers
- more testing
Thomas
- work with compounding
- soon finished with tags
- fix stuorra-oslolaš lower case o
- not done
- ä/æ in smj speller
- not done
- fix sme twol bug (#460), meeting Thursday at 12 AM
- not done
- fix bugs!
- worked some with bugs
Tomi
- make PLX conversion test sample; add conversion testing to the make file
- not done
- add Hunspell data generation/conversion
- working with
- fix bug 484
- not done
- fix bugs!
- fixed other bugs
Trond
- update the smj proper noun lexicon, and refine the morphological
- Backwater, again.
- fix stuorra-oslolaš lower case o
- Not done
- add sma texts to the corpus repository
- Not done
- ä/æ in smj speller
- Not done
- fix sme twol bug (#460), meeting Thursday at 12 AM
- We never had that meeting. This Wednesday or Thursday, then.
- fix bug 458
- fix bugs!
3. Documentation
User documents, specifically the README files, need to get some semi-automatic
TODO:
- add semi-automatic updates of fixed and open issues to README files
4. Corpus gathering
No news.
TODO:
- add sma Bible texts to the corpus repository (Trond)
- bug Kåre Tjikkom about the smj correct document (Sjur)
5. Corpus infrastructure
UiT is participating in work to get a large European infrastructure project. Our work on corpus infrastructure will constitute an important cornerstone in the Tromsø work.
6. Infrastructure
Nothing new - the Divvun site needs regular restarts, but this is well known.
7. Linguistics
North Sámi
(New) place names should be outfitted with a country code as a comment:
Ávvil ANAR ; !FIN
This !FIN has been added to the base list Ilona is working with. Quick and
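As a rough illustration of the convention above, the sketch below parses one entry line and reads the country code from the trailing comment. It assumes every entry has the shape "lemma CONTINUATION ; !CCC" with a three-letter uppercase code. The function name parse_entry and the regular expression are hypothetical and are not part of the Divvun tools.

```python
import re

# Assumed entry shape: "<lemma> <continuation class> ; !<COUNTRY>"
# The country comment is optional, so untagged entries still parse.
ENTRY_RE = re.compile(
    r"^(?P<lemma>\S+)\s+(?P<cont>\S+)\s*;\s*(?:!\s*(?P<country>[A-Z]{3}))?\s*$"
)


def parse_entry(line):
    """Split an entry line into lemma, continuation class and country code.

    Returns None when the line does not match the assumed shape.
    The country value is None when no "!CCC" comment is present.
    """
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    return {
        "lemma": match.group("lemma"),
        "continuation": match.group("cont"),
        "country": match.group("country"),
    }


if __name__ == "__main__":
    print(parse_entry("Ávvil ANAR ; !FIN"))
```

A check like this could be used to list entries that still lack a country code, since those come back with the country set to None.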
TODO:
- lexicalise actio compounds. Example: vuolggasadji vs. vuolginsadji
- fix stuorra-oslolaš lower case o (Sjur, Thomas, Trond)
- fix twol bug (Sjur, Thomas, Trond)
- meet online this week - Thursday around 9 AM Norwegian time
- add the sme place names from Finland (Ilona)
Lule Sámi
TODO:
- refine smj proper noun lexica, cf. the propernoun-smj-lex.txt
- ä/æ in speller, see bug report #411 (Tomi, Sjur)
- lexicalise words from the Olavi missing list, but check against the pdf
- working on it - just over 1000 words left
- add compounding tags to:
- nouns (Thomas)
- soon finished
- adjs (Per-Eric)
- soon finished
8. Name lexicon infrastructure
This sub-project needs to get up and running soon. Mainly Sjur's task.
Decisions made in Tromsø can be found in this meeting memo.
TODO:
- fix bugs in lexc2xml; add comments to the log element (Saara)
- finish first version of the editing (Sjur)
- test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)
- make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as
- convert propernoun-($lang)-lex.txt to a derived file from common xml files
- implement data synchronisation between risten.no and
- start to use the xml file as source file
- clean terms-sme.xml such that all names have the correct tag for their use
- merge placenames which are erroneously in different entries: e.g. Helsinki,
- publish the name lexicon on risten.no (Sjur)
- add missing parallel names for placenames (linguists)
- add informative links between first names like Niillas and Nils
9. Spellers
New spellers released today (they accept giella-__ but suggest giella--__).
OOo spellers
A first codebase for Hunspell conversion was committed today. Roughly 1 week, maximum
TODO:
- add Hunspell data conversion (Tomi)
- progressing
Testing
Spelling Error Markup
TODO:
- Set up ways of adding meta-information (source info, used in testing or not,
- move Steinar's error markup in the xml files to (a copy of) the original
Automated testing
TODO:
- document the AppleScript testing tool (Sjur)
- not finished
- improve speller test bench (Sjur)
- create a speller preprocessor (Børre or Sjur)
- Saara did this
- integrate the ccat speller testing options in the Makefile (Sjur)
- done
- test also the correct column of the typos.txt files - now many correct words
- done
Lexicon conversion to the PLX format
TODO:
- fix bug 484 (Tomi)
- fix bug 458 (Trond, Sjur, Tomi)
New public beta
Delayed till the majority of the present bugs are fixed. The twolc bug
10. Other
Corpus contracts
TODO:
- publish corpus contracts and project infra as open-source on NoDaLi-sta
Bug fixing
When fixing bugs, record the version number containing the fix in the Bugzilla
55 open Divvun/Disamb bugs (24 of these 56 are speller-related bugs,
Project meeting
We'll meet in September, 24-28, in Tromsø to work on the hardest remaining
11. Next meeting, closing
The next meeting is 10.9.2007, 09:30 Norwegian time.
The meeting was closed at 10:26.
Appendix - task lists for the next week
Børre
- run all known spelling errors in the prooftest corpus through the speller
- add extraction of all known spelling errors in the regular corpus (not the
- move Steinar's error markup in the xml files to (a copy of) the original
- add semi-automatic updates of fixed and open issues to README files
- fix bugs!
Ilona
- lexicalise missing words
- add sme names from FIN
- make smn propernoun-list
Maaren
- lexicalise actio compounds
Per-Eric
- expand the smj typos list
- add missing smj words
- lexicalise words from the Olavi missing list
- finish with the compounding tags to adjectives
Saara
- add new XSL/XML headers for proofing test docs
- Set up ways of adding meta-information for proofing correct corpus docs
Sjur
- document the AppleScript testing tool
- publish corpus contracts and project infra as open-source on NoDaLi-sta
- fix stuorra-oslolaš lower case o
- ä/æ in smj speller
- work on the XML name editor/risten.no integration
- plan the rest of the project period
- fix sme twol bug (#460), meeting Thursday at 9 AM
- fix bug 458
- bug Kåre Tjikkom about the smj correct document
- fix bugs!
Thomas
- work with compounding
- fix stuorra-oslolaš lower case o
- ä/æ in smj speller
- fix sme twol bug (#460), meeting Thursday at 9 AM
- fix bugs!
Tomi
- make PLX conversion test sample; add conversion testing to the make file
- add Hunspell data generation/conversion
- fix bug 484
- fix bugs!
Trond
- update the smj proper noun lexicon, and refine the morphological
- fix stuorra-oslolaš lower case o
- add sma texts to the corpus repository
- ä/æ in smj speller
- fix sme twol bug (#460), meeting Thursday at 9 AM
- fix bug 458
- fix bugs!