Meeting_2007-06-11
Contents:
- Meeting setup
- Agenda
- 1. Opening, agenda review, participants
- 2. Updated task status since last meeting
- 3. Documentation
- 4. Corpus gathering
- 5. Corpus infrastructure
- 6. Infrastructure
- 7. Linguistics
- 8. Name lexicon infrastructure
- 9. Spellers
- 10. Other
- 11. Next meeting, closing
- Appendix - task lists for the next week
Meeting setup
- Date: 11.06.2007 
- Time: 09:30 Norwegian time
- Place: Internet 
- Tools: SubEthaEdit, iChat/Skype
Agenda
- Opening, agenda review 
- Reviewing the task list from last week 
- Documentation - divvun.no 
- Corpus gathering 
- Corpus infrastructure 
- Infrastructure 
- Linguistics 
- name lexicon infrastructure 
- Spellers 
- Other issues 
- Summary, task lists 
- Closing
1. Opening, agenda review, participants
Opened at 09:57.
Present: Børre, Maaren, Per-Eric, Sjur, Steinar, Thomas, Tomi, Trond
Absent: Saara
Agenda accepted as is.
2. Updated task status since last meeting
Børre
- add sma texts to the corpus repository - not done
- run all known spelling errors in the prooftest corpus through the speller - not done
- add extraction of all known spelling errors in the regular corpus (not the [...]) - not done
- update and fix our documentation and infrastructure as Steinar finds [...] - began work again
- study the Hunspell formalism in detail - nothing new
- contact Davvi Girji / Mikal Aase - not done
- install larger disks, new RAM on the G5 when they arrive - arrived, will install them asap
- move list of known bugs to Bugzilla - not done
- update/check installed file list and paths for Windows - not done
- fix bugs!
Inga
- expand the smj typos list - worked on it, still working
- add missing smj words - worked on it, still working
Maaren
- lexicalise actio compounds 
- Manually mark speller test documents for typos
Per-Eric
- expand the smj typos list - worked on it, still working
- add missing smj words - worked on it, still working
Saara
- improve cgi-bin scripts - done
- add new XSL/XML headers for proofing test docs - will do this week
- Try to add files with Lars to the corpus interface. 
- fix bugs!
Sjur
- run all known spelling errors in the corpus through the speller - not done, depends on speller test bench improvements
- document the AppleScript testing tool - not done
- integrate regression self tests with the make file - not done
- improve speller test bench - worked on it; problems with speller test result processing (Perl script)
- integrate the ccat speller testing options in the make file - worked on it; problems with speller test result processing (Perl script)
- fix internet setup for Per-Eric's satellite modem - nothing new
- look over the Bugzilla status mails - nothing new
- contact Davvi Girji / Mikal Aase - done
- ask Xerox for a commercial license for the xfst tools on the G5 - not done
- check with Sámi publishing houses whether support for CS2 is still needed - checked Min Áigi, Áššu and Davvi Girji - CS2 not needed so far
- fix stuorra-oslolaš lower case o - topic for the Drag meeting
- ö/ä vs ø/æ in speller - topic for the Drag meeting
- study the Hunspell formalism in detail - topic for the Drag meeting
- move list of known bugs to Bugzilla - done
- resend the press release to some channels in Sweden, Finland and Norway - not done
- publish corpus contracts and project infra as open-source on NoDaLi-sta - not done
- fix bugs! - filed many new ones
- other: finished installation of Parallels Desktop, Windows XP, Office 2007 and our
Steinar
- Beta testing: Align manually (shorter texts)
- Manually mark speller test texts for typos (making them into gold standards), [...] - added more texts
- Complete the semantic sets in sme-dis.rle - no work this week
- missing lists - no work this week
- fix bugs!
Thomas
- work with compounding - worked on it
- Lack of lowering before hyphen: Twol rewrite. - not done
- smj: öä not accepted, only øæ (except for lexicalised names) - not done
- fix stuorra-oslolaš lower case o - not done
- investigate why actios of 3-syllable verbs are not accepted by the speller - had some help with this, we will see
- investigate why some adverbs of 3-syllable adjectives are not accepted by the [...] - seems to work
- fix bugs! - have barely had time
Tomi
- add compounding restrictions to the PLX conversion - added
- make PLX conversion test sample; add conversion testing to the make file - not done
- improve prefix and middle-noun PLX conversion - done
- integrate the ccat speller testing options in the Makefile - not done
- first part of multiword expressions not accepted - not done
- open up compounding for all actios - not done
- fix bugs! - fixed
Trond
- Work on the web corpus issues - Done some work, yes.
- update the smj proper noun lexicon, and refine the morphological [...] - Fixed a fatal bug here (1/3 of names restored!), but have not worked on it further
- Go through the Num bugs - Not done
- fix stuorra-oslolaš lower case o - Not done
- fix bugs! - Closed several, but opened more, I am afraid.
3. Documentation
TODO: 
- write form to request corpus user account (Børre, Sjur, Trond) 
- document how to apply for access to closed corpus, and details on the corpus
  - correct and improve it based on feedback from Steinar (Børre)
4. Corpus gathering
Sjur spoke to Davvi Girji; we will send them a list of the authors contacted
TODO: 
- sme texts: no new additions, fix corpus errors during this month
- missing nob parallel texts should be added if such holes are found
- go through the list of missing or erroneous nob texts, based upon
- add sma texts to the corpus repository (Børre)
- contact Davvi Girji / Mikal Aase (Børre, Sjur) - done
5. Corpus infrastructure
Nothing this week either.
6. Infrastructure
TODO: 
- update and fix our documentation and infrastructure as Steinar finds [...] - working on this one
- fix internet setup for Per-Eric's satellite modem (Sjur, Børre) - this influences iChat, SEE sharing, and ARD connections
7. Linguistics
North Sámi
Actio compounds: Maaren and Duomma disagree about what is correct and
TODO: 
- lexicalise actio compounds. Example: vuolggasadji vs. vuolginsadji
  - vuolgin- and vuolgga- are both OK, for example vuolggasadji and vuolgindássi
  - possibly turn on free compounding as part of the PLX conversions (ie free
- fix stuorra-oslolaš lower case o (Sjur, Thomas, Trond)
- open up compounding for all actios (Tomi)
Lule Sámi
TODO: 
- refine smj proper noun lexica, cf. the propernoun-smj-lex.txt
- ö/ä vs ø/æ in speller (Thomas, Sjur)
- lexicalise words from the Olavi missing list, but check against the pdf
- add normativity issues to our normativity document (Inga, Thomas)
- investigate why actios of 3-syllable verbs are not accepted by the speller
  - norm-lookup does not see these, ordinary look-up sees them
    - these were grepped out because they contained the string SUB as part
- investigate why some adverbs of 3-syllable adjectives are not accepted by the
  - norm-lookup sees some, but not all; ordinary look-up sees them
    - it seems to be fixed, needs to be tested in the new speller
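The SUB filtering problem noted above (entries removed because the string SUB occurred inside a longer string) can be illustrated with a minimal Python sketch. The sample lines and the idea that SUB should only count when it stands alone as a token are assumptions made for illustration; the actual lexicon format and filtering command are not shown in this memo.

```python
import re

# Hypothetical lexicon-like lines; the real file format is not shown in this memo.
LINES = [
    "wordA ! SUB",         # carries the standalone SUB marker
    "wordB ! NomSUBtype",  # merely contains "SUB" inside a longer token
    "wordC",
]


def drop_substring(lines, marker="SUB"):
    """Naive filter: drop every line that contains the marker anywhere."""
    return [ln for ln in lines if marker not in ln]


def drop_token(lines, marker="SUB"):
    """Stricter filter: drop a line only if the marker is a whole token."""
    pattern = re.compile(
        r"(?<![A-Za-z0-9])" + re.escape(marker) + r"(?![A-Za-z0-9])"
    )
    return [ln for ln in lines if not pattern.search(ln)]


if __name__ == "__main__":
    print(drop_substring(LINES))  # wordB is lost along with wordA
    print(drop_token(LINES))      # only wordA is removed
```

The naive filter behaves like a plain substring grep, which is the kind of over-matching the note describes; the token-based variant keeps entries where SUB is only part of something else.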
8. Name lexicon infrastructure
Decisions made in Tromsø can be found in this meeting memo.
TODO: 
- fix bugs in lexc2xml; add comments to the log element (Saara) 
- finish first version of the editing (Sjur) 
- test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)
- make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as
- convert propernoun-($lang)-lex.txt to a derived file from common xml files
- implement data synchronisation between risten.no and
- start to use the xml file as source file
- clean terms-sme.xml such that all names have the correct tag for their use
- merge placenames which are erroneously in different entries: e.g. Helsinki,
- publish the name lexicon on risten.no (Sjur)
- add missing parallel names for placenames (linguists) 
- add informative links between first names like Niillas and Nils 
9. Spellers
OOo spellers
Børre, Sjur, Tomi will have a session on this in Drag.
TODO: 
- add Hunspell data generation to the lexc2xspell (Tomi - after the
- study the Hunspell formalism in detail (Børre, Sjur, Tomi)
Testing
Spelling Error Markup
Text in other languages should not be marked as spelling errors.
TODO: 
- Manually mark test texts for typos (making them into gold standards)
- Set up ways of adding meta-information (source info, used in testing or not,
Testing tools
Sjur is trying to get the ccat typos option integrated in the test targets 
TODO: 
- document the AppleScript testing tool (Sjur) 
- improve speller test bench (Sjur)
  - integrate the ccat speller testing options in the Makefile (Sjur, Tomi) - working
Regression tests
Nothing new
TODO: 
- add extraction of all known spelling errors in the corpus (not the
- test the typos.txt list, and check that all entries are properly corrected
- consider how to do a regression self-test, ie, how to test the full
  - extract all the base forms in the lexicon, and run them through the speller
  - extract all SUB-marked entries, and run them through the lexicon
  - integrate these in the make file (Sjur)
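The typos.txt check listed above could look roughly like the following Python sketch. It assumes a tab-separated "typo, expected correction" line format and a hypothetical `suggest` callable standing in for the speller; neither is confirmed by this memo, and the sample words are placeholders rather than real test data.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]


def parse_typos(lines: Iterable[str]) -> List[Pair]:
    """Parse 'typo<TAB>correction' lines, skipping blanks and '#' comments.

    The tab-separated layout is an assumption, not the documented format.
    """
    pairs = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        typo, _, correction = line.partition("\t")
        if correction:
            pairs.append((typo.strip(), correction.strip()))
    return pairs


def failed_corrections(
    pairs: Iterable[Pair], suggest: Callable[[str], List[str]]
) -> List[Pair]:
    """Return the pairs whose expected correction is not among the suggestions."""
    return [(typo, corr) for typo, corr in pairs if corr not in suggest(typo)]


if __name__ == "__main__":
    sample = ["# typo<TAB>correction", "teh\tthe", "recieve\treceive", ""]
    # Stub speller for demonstration only; a real run would call the speller.
    stub = {"teh": ["the", "ten"], "recieve": ["relieve"]}
    print(failed_corrections(parse_typos(sample), lambda w: stub.get(w, [])))
```

A wrapper like this could be called from a make target so that any non-empty result fails the regression run.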
Lexicon conversion to the PLX format
TODO: 
- install larger disks, new RAM on the G5 when they arrive (Børre) - received, will be installed soon.
- ask for mklex for Linux (victorio) from Polderland (Sjur) - waiting for the offer
- ask Xerox for a commercial license for the xfst tools on the G5 (Sjur)
- add compounding restrictions to the PLX conversion (Tomi) - done, seems correct, but needs more testing when a new speller is ready.
Compounding restrictions
Compounding restrictions are now integrated in the PLX conversion, thanks to 
TODO: 
- improve prefix conversion to PLX (Tomi) - done
- improve middle noun conversion to PLX (Tomi) - done
- improve noun + adjective PLX conversion: (Tomi)
  - compounding stems - how do we generate them? Using the java client? - done
  - compounding tags - we need to obey them when making the transducers. - done
- make conversion test sample; add conversion testing to the make file
  - to regression test / QA the PLX conversion. - not done
Public Beta follow-up
TODO: 
- fix clitics (Tomi)
  - done after the release, has to be tested
    - can be tested in the small speller - tested,
- file list in Windows not complete (Børre, Sjur)
- test smj on typos (Børre)
  - tried, but got an error, thus skipped. Needs to be checked now.
    - error reported to Saara
- celebrate - NOT done - will do in Drag :)
- resend the press release to some channels in Sweden, Finland and Norway
  - Per-Eric will follow up in Sweden, Tomi in Finland, to make sure we
    - Samiradio (Tomi) - they're planning to make a report
    - Sami parliament (Tomi)
    - Oulu - giellagas (Tomi)
    - Lapin yliopisto - Rantala (Trond)
    - Helsingin yliopisto - Seurujärvi-Kari (Tomi)
    - KOTUS (Sjur)
    - Citysaamit (Tomi)
    - Oulun saamelaiset (Tomi)
- move list of known errors to Bugzilla (Børre, Sjur) - done
10. Other
Summer vacation
When are we taking it? Please fill in the table below:
| Name | Starting | Ending |
|---|---|---|
| Børre | x | x |
| Maaren | 9.7. | 10.8. |
| Per-Eric | 9.7. | 20.7. |
| Saara | 2.7. | 3.8. |
| Sjur | x | x |
| Steinar | x | x |
| Thomas | 9.7. | 12.8. |
| Tomi | 9.7. | 5.8. |
| Trond | 2.7. | 12.8., but working at the end |
Divvun people also need to send the dates to  Julie Eira or 
Corpus contracts
TODO: 
- publish corpus contracts and project infra as open-source on NoDaLi-sta 
Bug fixing
When fixing bugs, record the version number containing the fix in the Bugzilla 
56 open Divvun/Disamb bugs (21 of these 56 are speller bugs,  35 are 
TODO: 
- look over the Bugzilla status mails (Børre)
The meeting in Drag
The Sámi Parliament board has its meeting June 19-21. We should use Monday 18. 
- Maaren (?)
- Sjur 
- Tomi
Topics for Drag: 
- two-level fixes (stuorra-oslolaš)
- OOo/Hunspell 
- QA session 
- Actio compounding clarifications 
- smj work in general 
- loan words in -áhta or -áhtta (example: advokáhtta or advokáhta)
SD-ráddi presentation (1 hour): 
- demo Divvun 
- demo risten.no 
- operation of Divvun
- operation of risten.no
- extension/new project (ie operations)
- South Sámi
- terminology development
- parallel corpus
- Nordic cooperation
Sjur will order rooms for all (except Per-Eric) at Hamarøy Hotell; the meeting room will be either at the hotel or at Árran. Beds are needed as follows:
- Monday: Sjur, Maaren, Tomi 
- Tuesday: Sjur, Maaren, Thomas, Tomi, Trond, Børre 
- Wednesday: Sjur, Maaren, Tomi, Børre (not at Hamarøy Hotell - it is full)
- Thursday: Sjur, Maaren, Tomi, Børre
TODO: 
- order rooms (Sjur) 
- order meeting room (Sjur) 
- plan presentation (Sjur)
A commercial
An alternative compiler to Xerox is coming up, in 
11. Next meeting, closing
The next meeting is 25.6.2007, 10:30 Norwegian time (or possibly in the
The meeting was closed at 11:28.
Appendix - task lists for the next week
Børre
- add sma texts to the corpus repository
- run all known spelling errors in the prooftest corpus through the speller
- add extraction of all known spelling errors in the regular corpus (not the
- update and fix our documentation and infrastructure as Steinar finds
- study the Hunspell formalism in detail
- follow-up contact with Davvi Girji
- install larger disks, new RAM on the G5
- update/check installed file list and paths for Windows
- fix bugs!
Maaren
- lexicalise actio compounds 
- Manually mark speller test documents for typos
Per-Eric
- expand the smj typos list 
- add missing smj words 
- contact media in Sweden about the beta release
Saara
- add new XSL/XML headers for proofing test docs 
- Try to add files with Lars to the corpus interface. 
- fix bugs!
Sjur
- run all known spelling errors in the corpus through the speller 
- document the AppleScript testing tool 
- integrate regression self tests with the make file 
- improve speller test bench 
- integrate the ccat speller testing options in the make file 
- fix internet setup for Per-Eric's satellite modem
- look over the Bugzilla status mails
- ask Xerox for a commercial license for the xfst tools on the G5
- check with Sámi publishing houses whether support for CS2 is still needed 
- resend the press release to some channels in Sweden, Finland and Norway 
- publish corpus contracts and project infra as open-source on NoDaLi-sta 
- study the Hunspell formalism in detail 
- fix bugs!
Steinar
- Beta testing: Align manually (shorter texts)
- Manually mark speller test texts for typos (making them into gold standards),
- Complete the semantic sets in sme-dis.rle
- missing lists 
- fix bugs!
Thomas
- work with compounding 
- Lack of lowering before hyphen: Twol rewrite. 
- smj: öä not accepted, only øæ (except for lexicalised names)
- fix stuorra-oslolaš lower case o
- add normativity issues to our normativity document
- test new speller for actios of 3-syllable verbs and adverbs of 3-syllable adjectives
- fix bugs!
Tomi
- make PLX conversion test sample; add conversion testing to the make file 
- integrate the  ccat speller testing options in the Makefile 
- first part of multiword expressions not accepted 
- open up compounding for all actios 
- contact Finnish institutions about the speller beta release 
- study the Hunspell formalism in detail 
- add Hunspell data generation/conversion 
- fix bugs!
Trond
- Work on the web corpus issues 
- update the smj proper noun lexicon, and refine the morphological
- fix bugs!

