Meeting_2007-06-11
Contents:
- Meeting setup
- Agenda
- 1. Opening, agenda review, participants
- 2. Updated task status since last meeting
- 3. Documentation
- 4. Corpus gathering
- 5. Corpus infrastructure
- 6. Infrastructure
- 7. Linguistics
- 8. Name lexicon infrastructure
- 9. Spellers
- 10. Other
- 11. Next meeting, closing
- Appendix - task lists for the next week
Meeting setup
- Date: 11.06.2007
- Time: 09:30 Norwegian time
- Place: Internet
- Tools: SubEthaEdit, iChat/Skype
Agenda
- Opening, agenda review
- Reviewing the task list from last week
- Documentation - divvun.no
- Corpus gathering
- Corpus infrastructure
- Infrastructure
- Linguistics
- Name lexicon infrastructure
- Spellers
- Other issues
- Summary, task lists
- Closing
1. Opening, agenda review, participants
Opened at 09:57.
Present: Børre, Maaren, Per-Eric, Sjur, Steinar, Thomas, Tomi, Trond
Absent: Saara
Agenda accepted as is.
2. Updated task status since last meeting
Børre
- add sma texts to the corpus repository
- not done
- run all known spelling errors in the prooftest corpus through the speller
- not done
- add extraction of all known spelling errors in the regular corpus (not the
- not done
- update and fix our documentation and infrastructure as Steinar finds
- began work again
- study the Hunspell formalism in detail
- nothing new
- contact Davvi Girji / Mikal Aase
- not done
- install larger disks, new RAM on the G5 when they arrive
- Arrived. Will install it asap.
- move list of known bugs to Bugzilla
- not done
- update/check installed file list and paths for Windows
- not done
- fix bugs!
Inga
- expand the smj typos list
- worked on it, still working
- add missing smj words
- worked on it, still working
Maaren
- lexicalise actio compounds
- Manually mark speller test documents for typos
Per-Eric
- expand the smj typos list
- worked on it, still working
- add missing smj words
- worked on it, still working
Saara
- improve cgi-bin scripts
- done
- add new XSL/XML headers for proofing test docs
- will do this week
- Try to add files with Lars to the corpus interface.
- fix bugs!
Sjur
- run all known spelling errors in the corpus through the speller
- not done, depends on speller test bench improvements
- document the AppleScript testing tool
- not done
- integrate regression self tests with the make file
- not done
- improve speller test bench
- worked on it, problems with speller test result processing, perl script
- integrate the ccat speller testing options in the make file
- worked on it, problems with speller test result processing, perl script
- fix internet setup for Per-Eric's satellite modem
- nothing new
- look over the Bugzilla status mails
- nothing new
- contact Davvi Girji / Mikal Aase
- done
- ask Xerox for a commercial license for the xfst tools on the G5
- not done
- check with Sámi publishing houses whether support for CS2 is still needed
- checked Min Áigi, Áššu and Davvi Girji - CS2 not needed so far
- fix stuorra-oslolaš lower case o
- topic for the Drag meeting
- ö/ä vs ø/æ in speller
- topic for the Drag meeting
- study the Hunspell formalism in detail
- topic for the Drag meeting
- move list of known bugs to Bugzilla
- done
- resend the press release to some channels in Sweden, Finland and Norway
- not done
- publish corpus contracts and project infra as open-source on NoDaLi-sta
- not done
- fix bugs!
- filed many new ones
- other:
- finished installation of Parallels Desktop, Windows XP, Office 2007 and our
Steinar
- Beta testing: Align manually (shorter texts)
- Manually mark speller test texts for typos (making them into gold standards),
- added more texts
- Complete the semantic sets in sme-dis.rle
- no work this week
- missing lists
- no work this week
- fix bugs!
Thomas
- work with compounding
- worked
- Lack of lowering before hyphen: Twol rewrite.
- not done
- smj: öä not accepted, only øæ (except for lexicalised names)
- not done
- fix stuorra-oslolaš lower case o
- not done
- investigate why actios of 3-syllable verbs are not accepted by the speller
- had some help with this, we will see
- investigate why some adverbs of 3-syllable adjectives are not accepted by the
- seem to work
- fix bugs!
- have barely had time
Tomi
- add compounding restrictions to the PLX conversion
- added
- make PLX conversion test sample; add conversion testing to the make file
- not done
- improve prefix and middle-noun PLX conversion
- done
- integrate the ccat speller testing options in the Makefile
- not done
- first part of multiword expressions not accepted
- not done
- open up compounding for all actios
- not done
- fix bugs!
- fixed
Trond
- Work on the web corpus issues
- Done some work, yes.
- update the smj proper noun lexicon, and refine the morphological
- Fixed a fatal bug here (1/3 of names restored!), but not worked more
- Go through the Num bugs
- Not done
- fix stuorra-oslolaš lower case o
- Not done
- fix bugs!
- Closed several, but opened more, I am afraid.
3. Documentation
TODO:
- write form to request corpus user account (Børre, Sjur, Trond)
- document how to apply for access to closed corpus, and details on the corpus
- correct and improve it based on feedback from Steinar (Børre)
4. Corpus gathering
Sjur spoke to Davvi Girji, we will send them a list of the authors contacted
TODO:
- sme texts: no new additions, fix corpus errors during this month
- missing nob parallel texts should be added if such holes are found
- Go through the list of missing or erroneous nob texts, based upon
- add sma texts to the corpus repository (Børre)
- contact Davvi Girji / Mikal Aase (Børre, Sjur)
- done
5. Corpus infrastructure
Nothing this week either.
6. Infrastructure
TODO:
- update and fix our documentation and infrastructure as Steinar finds
- working on this one
- fix internet setup for Per-Eric's satellite modem (Sjur, Børre)
- this influences iChat, SEE sharing, and ARD connections
7. Linguistics
North Sámi
Actio compounds: Maaren and Duomma disagree about what is correct and
TODO:
- lexicalise actio compounds. Example: vuolggasadji vs. vuolginsadji
- vuolgin- and vuolgga-: both are OK, for example vuolggasadji and vuolgindássi
- possibly turn on free compounding as part of the PLX conversions (ie free
- fix stuorra-oslolaš lower case o (Sjur, Thomas, Trond)
- open up compounding for all actios (Tomi)
Lule Sámi
TODO:
- refine smj proper noun lexica, cf. the propernoun-smj-lex.txt
- ö/ä vs ø/æ in speller (Thomas, Sjur)
- lexicalise words from the Olavi missing list, but check against the pdf
- add normativity issues to our normativity document (Inga, Thomas)
- investigate why actios of 3-syllable verbs are not accepted by the speller
- norm-lookup does not see these, ordinary look-up sees
- these were grepped out because they contained the string SUB as part
- investigate why some adverbs of 3-syllable adjectives are not accepted by the
- norm-look-up sees some, but not all, ordinary look-up sees
- it seems to be fixed, needs to be tested in the new speller
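The grep problem noted above for the 3-syllable actios (entries dropped because the string SUB occurred inside something else) can be illustrated with a small sketch. It assumes that analyses are '+'-separated tag strings and that the marker is a whole field named SUB. The lemmas and the SUBX tag are invented placeholders, not taken from the lexicon.

```python
def has_tag(analysis: str, tag: str = "SUB") -> bool:
    """Return True only when `tag` is a complete '+'-separated field."""
    return tag in analysis.split("+")


def drop_by_substring(analyses: list[str], needle: str = "SUB") -> list[str]:
    """grep -v style filter: drops every line containing `needle` anywhere."""
    return [a for a in analyses if needle not in a]


def drop_by_tag(analyses: list[str], tag: str = "SUB") -> list[str]:
    """Drops only the lines where `tag` occurs as a whole field."""
    return [a for a in analyses if not has_tag(a, tag)]


# Hypothetical analyses; the lemmas and the SUBX tag are placeholders.
analyses = [
    "lemma1+V+Actio",
    "lemma2+N+SUB",
    "lemma3+V+SUBX+Actio",
]

print(drop_by_substring(analyses))  # also loses lemma3, which only contains "SUB" inside SUBX
print(drop_by_tag(analyses))        # keeps lemma3, drops only the real SUB entry
```

Matching on whole fields rather than on raw substrings is one way to keep such a filter from removing unrelated entries.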
8. Name lexicon infrastructure
Decisions made in Tromsø can be found in this meeting memo.
TODO:
- fix bugs in lexc2xml; add comments to the log element (Saara)
- finish first version of the editing (Sjur)
- test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)
- make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as
- convert propernoun-($lang)-lex.txt to a derived file from common xml files
- implement data synchronisation between risten.no and
- start to use the xml file as source file
- clean terms-sme.xml such that all names have the correct tag for their use
- merge placenames which are erroneously in different entries: e.g. Helsinki,
- publish the name lexicon on risten.no (Sjur)
- add missing parallel names for placenames (linguists)
- add informative links between first names like Niillas and Nils
9. Spellers
OOo spellers
Børre, Sjur, Tomi will have a session on this in Drag.
TODO:
- add Hunspell data generation to the lexc2xspell (Tomi - after the
- study the Hunspell formalism in detail (Børre, Sjur, Tomi)
Testing
Spelling Error Markup
Text in other languages should not be marked as spelling errors.
TODO:
- Manually mark test texts for typos (making them into gold standards)
- Set up ways of adding meta-information (source info, used in testing or not,
Testing tools
Sjur is trying to get the ccat typos option integrated in the test targets
TODO:
- document the AppleScript testing tool (Sjur)
- improve speller test bench (Sjur)
- integrate the ccat speller testing options in the Makefile (Sjur, Tomi)
- working
Regression tests
Nothing new
TODO:
- add extraction of all known spelling errors in the corpus (not the
- test the typos.txt list, and check that all entries are properly corrected
- consider how to do a regression self-test, ie, how to test the full
- extract all the base forms in the lexicon, and run them through the speller
- extract all SUB-marked entries, and run them through the lexicon
- integrate these in the make file (Sjur)
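A minimal sketch of the typos.txt check listed above, under stated assumptions: the file is taken to hold one `typo<TAB>correction` pair per line, and `suggest` stands in for whatever interface the speller exposes. Both the file format and the function names are assumptions for illustration, and the sample words are invented.

```python
from typing import Callable, Iterable


def parse_typos(lines: Iterable[str]) -> list[tuple[str, str]]:
    """Parse 'typo<TAB>correction' lines, skipping blanks and '#' comments."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        typo, _, correction = line.partition("\t")
        if correction:
            pairs.append((typo, correction))
    return pairs


def check_typos(
    pairs: list[tuple[str, str]],
    suggest: Callable[[str], list[str]],
) -> list[tuple[str, str, list[str]]]:
    """Return the entries whose expected correction is not suggested."""
    failures = []
    for typo, correction in pairs:
        suggestions = suggest(typo)
        if correction not in suggestions:
            failures.append((typo, correction, suggestions))
    return failures


def toy_suggest(word: str) -> list[str]:
    """Stand-in for the real speller (hypothetical suggestion table)."""
    table = {"typoo": ["typo"], "wrod": ["word", "rod"]}
    return table.get(word, [])


sample = ["# invented sample data", "typoo\ttypo", "wrod\tword", "speler\tspeller", ""]
pairs = parse_typos(sample)
failures = check_typos(pairs, toy_suggest)
for typo, expected, got in failures:
    print(f"FAIL {typo}: expected {expected}, got {got}")
```

A make target could run such a script and fail when the list of failures is not empty, which would cover the "integrate these in the make file" item.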
Lexicon conversion to the PLX format
TODO:
- install larger disks, new RAM on the G5 when they arrive (Børre)
- received, will be installed soon.
- ask for mklex for Linux (victorio) from Polderland (Sjur)
- waiting for the offer
- ask Xerox for a commercial license for the xfst tools on the G5 (Sjur)
- add compounding restrictions to the PLX conversion (Tomi)
- done, seems correct, but needs more testing when a new speller is ready.
Compounding restrictions
Compounding restrictions are now integrated in the PLX conversion, thanks to
TODO:
- improve prefix conversion to PLX (Tomi)
- done
- improve middle noun conversion to PLX (Tomi)
- done
- improve noun + adjective PLX conversion: (Tomi)
- compounding stems - how do we generate them? Using the java client?
- done
- compounding tags - we need to obey them when making the transducers.
- done
- make conversion test sample; add conversion testing to the make file
- to regression test / QA the PLX conversion.
- not done
Public Beta follow-up
TODO:
- fix clitics (Tomi)
- done after the release, has to be tested
- can be tested in the small speller - tested,
- file list in Windows not complete (Børre, Sjur)
- test smj on typos (Børre)
- tried, but got an error, thus skipped. Needs to be checked now.
- error reported to Saara
- celebrate
- NOT done - will do in Drag :)
- resend the press release to some channels in Sweden, Finland and Norway
- Per-Eric will follow up in Sweden, Tomi in Finland, to make sure we
- Samiradio (Tomi) - they're planning to make a report
- Sami parliament (Tomi)
- Oulu - giellagas (Tomi)
- Lapin yliopisto - Rantala (Trond)
- Helsingin yliopisto - Seurujärvi-Kari (Tomi)
- KOTUS (Sjur)
- Citysaamit (Tomi)
- Oulun saamelaiset (Tomi)
- move list of known errors to Bugzilla (Børre, Sjur)
- done
10. Other
Summer vacation
When are we taking it? Please fill in the table below:
Name | Starting | Ending |
---|---|---|
Børre | x | x |
Maaren | 9.7. | 10.8. |
Per-Eric | 9.7. | 20.7. |
Saara | 2.7. | 3.8. |
Sjur | x | x |
Steinar | x | x |
Thomas | 9.7. | 12.8. |
Tomi | 9.7. | 5.8. |
Trond | 2.7. | 12.8., but working at the end |
Divvun people also need to send the dates to Julie Eira or
Corpus contracts
TODO:
- publish corpus contracts and project infra as open-source on NoDaLi-sta
Bug fixing
When fixing bugs, record the version number containing the fix in the Bugzilla
56 open Divvun/Disamb bugs (21 of these 56 are speller bugs, 35 are
TODO:
- look over the Bugzilla status mails (Børre)
The meeting in Drag
The Sámi Parliament board has its meeting June 19-21. We should use Monday 18.
- Maaren (?)
- Sjur
- Tomi
Topics for Drag:
- two-level fixes (stuorra-oslolaš)
- OOo/Hunspell
- QA session
- Actio compounding clarifications
- smj work in general
- loan words in -áhta or -áhtta (example: advokáhtta or advokáhta)
SD-ráddi presentation (1 hour):
- demo Divvun
- demo risten.no
- operation of Divvun
- operation of risten.no
- extension/new project (i.e. operations)
- South Sámi
- terminology development
- parallel corpus
- Nordic cooperation
Sjur will order rooms for all (except Per-Eric) at Hamarøy Hotell, with the meeting room either at the hotel or at Árran. Beds are needed as follows:
- Monday: Sjur, Maaren, Tomi
- Tuesday: Sjur, Maaren, Thomas, Tomi, Trond, Børre
- Wednesday: Sjur, Maaren, Tomi, Børre (not at Hamarøy Hotell - it is full)
- Thursday: Sjur, Maaren, Tomi, Børre
TODO:
- order rooms (Sjur)
- order meeting room (Sjur)
- plan presentation (Sjur)
A commercial
An alternative compiler to Xerox is coming up, in
11. Next meeting, closing
The next meeting is 25.6.2007, 10:30 Norwegian time (or possibly in the
The meeting was closed at 11:28.
Appendix - task lists for the next week
Børre
- add sma texts to the corpus repository
- run all known spelling errors in the prooftest corpus through the speller
- add extraction of all known spelling errors in the regular corpus (not the
- update and fix our documentation and infrastructure as Steinar finds
- study the Hunspell formalism in detail
- follow-up contact with Davvi Girji
- install larger disks, new RAM on the G5
- update/check installed file list and paths for Windows
- fix bugs!
Maaren
- lexicalise actio compounds
- Manually mark speller test documents for typos
Per-Eric
- expand the smj typos list
- add missing smj words
- contact media in Sweden about the beta release
Saara
- add new XSL/XML headers for proofing test docs
- Try to add files with Lars to the corpus interface.
- fix bugs!
Sjur
- run all known spelling errors in the corpus through the speller
- document the AppleScript testing tool
- integrate regression self tests with the make file
- improve speller test bench
- integrate the ccat speller testing options in the make file
- fix internet setup for Per-Eric's satellite modem
- look over the Bugzilla status mails
- ask Xerox for a commercial license for the xfst tools on the G5
- check with Sámi publishing houses whether support for CS2 is still needed
- resend the press release to some channels in Sweden, Finland and Norway
- publish corpus contracts and project infra as open-source on NoDaLi-sta
- study the Hunspell formalism in detail
- fix bugs!
Steinar
- Beta testing: Align manually (shorter texts)
- Manually mark speller test texts for typos (making them into gold standards),
- Complete the semantic sets in sme-dis.rle
- missing lists
- fix bugs!
Thomas
- work with compounding
- Lack of lowering before hyphen: Twol rewrite.
- smj: öä not accepted, only øæ (except for lexicalised names)
- fix stuorra-oslolaš lower case o
- add normativity issues to our normativity document
- test new speller for actios of 3-syllable verbs and adverbs of 3-syllable adjectives
- fix bugs!
Tomi
- make PLX conversion test sample; add conversion testing to the make file
- integrate the ccat speller testing options in the Makefile
- first part of multiword expressions not accepted
- open up compounding for all actios
- contact Finnish institutions about the speller beta release
- study the Hunspell formalism in detail
- add Hunspell data generation/conversion
- fix bugs!
Trond
- Work on the web corpus issues
- update the smj proper noun lexicon, and refine the morphological
- fix bugs!