Meeting_2007-06-25
Contents:
- Meeting setup
- Agenda
- 1. Opening, agenda review, participants
- 2. Updated task status since last meeting
- 3. Documentation
- 4. Corpus gathering
- 5. Corpus infrastructure
- 6. Infrastructure
- 7. Linguistics
- 8. Name lexicon infrastructure
- 9. Spellers
- 10. Other
- 11. Next meeting, closing
- Appendix - task lists for the next week
Meeting setup
- Date: 25.06.2007
- Time: 09.30 Norwegian time
- Place: Internet
- Tools: SubEthaEdit, iChat/Skype
Agenda
- Opening, agenda review
- Reviewing the task list from last week
- Documentation - divvun.no
- Corpus gathering
- Corpus infrastructure
- Infrastructure
- Linguistics
- name lexicon infrastructure
- Spellers
- Other issues
- Summary, task lists
- Closing
1. Opening, agenda review, participants
Opened at 10:49.
Present: Maaren, Per-Eric, Sjur, Thomas, Tomi, Trond
Absent: Børre, Saara, Steinar
Agenda accepted as is.
2. Updated task status since last meeting
Børre
- add sma texts to the corpus repository
- run all known spelling errors in the prooftest corpus through the speller
- add extraction of all known spelling errors in the regular corpus (not the
- update and fix our documentation and infrastructure as Steinar finds
- study the Hunspell formalism in detail
- follow-up contact with Davvi Girji
- install larger disks, new RAM on the G5
- update/check installed file list and paths for Windows
- fix bugs!
Maaren
- lexicalise actio compounds
- done some
- Manually mark speller test documents for typos
- done some
Per-Eric
- expand the smj typos list
- worked on this, still ongoing
- add missing smj words
- worked on this, still ongoing
- contact media in Sweden about the beta release
- I do my best, but my contacts haven't called me back yet
Saara
- add new XSL/XML headers for proofing test docs
- Try to add files with Lars to the corpus interface.
- fix bugs!
Sjur
- run all known spelling errors in the corpus through the speller
- depending on speller test bench improvements
- document the AppleScript testing tool
- not done
- integrate regression self tests with the make file
- improve speller test bench
- typos-like tests from correct-files now working - very useful!
- integrate the ccat speller testing options in the make file
- the first one done - typos-like testing on correct-doc
- fix internet setup for Per-Eric's satellite modem
- nothing new
- look over the Bugzilla status mails
- not done
- ask Xerox for a commercial license for the xfst tools on the G5
- not done
- check with Sámi publishing houses whether support for CS2 is still needed
- done except for Iđut
- resend the press release to some channels in Sweden, Finland and Norway
- will do this in another way
- publish corpus contracts and project infra as open-source on NoDaLi-sta
- not done
- study the Hunspell formalism in detail
- not done
- fix bugs!
Steinar
- Beta testing: Align manually (shorter texts)
- Manually mark speller test texts for typos (making them into gold standards),
- Complete the semantic sets in sme-dis.rle
- missing lists
- fix bugs!
Thomas
- work with compounding
- a little bit
- Lack of lowering before hyphen: Twol rewrite.
- yes, fixed at last!
- smj: öä not accepted, only øæ (except for lexicalised names)
- done
- fix stuorra-oslolaš lower case o
- not done
- add normativity issues to our normativity document
- all the time
- test new speller for actios of 3-syllable verbs and adverbs of 3-syllable adjectives
- not done
- fix bugs!
- much bug-fixing done last week
Tomi
- make PLX conversion test sample; add conversion testing to the make file
- not done
- integrate the ccat speller testing options in the Makefile
- not done
- first part of multiword expressions not accepted
- should be - needs testing
- open up compounding for all actios
- not done
- contact Finnish institutions about the speller beta release
- contacted some
- study the Hunspell formalism in detail
- yes
- add Hunspell data generation/conversion
- started
- fix bugs!
- other
Trond
- Work on the web corpus issues
- Done some, mainly redoing parallel texts.
- update the smj proper noun lexicon, and refine the morphological
- Split and redone the propernoun lexicon.
- fix bugs!
- Looked at them, at least.
3. Documentation
TODO:
- write form to request corpus user account (Børre, Sjur, Trond)
- document how to apply for access to closed corpus, and details on the corpus
- correct and improve it based on feedback from Steinar (Børre)
4. Corpus gathering
Sjur spoke with Solvår Knutsen at Árran, Per-Eric will follow up.
TODO:
- sme texts: no new additions, fix corpus errors during this month
- missing nob parallel texts should be added if such holes are found
- Go through the list of missing or erroneous nob texts, based upon
- add sma texts to the corpus repository (Børre)
5. Corpus infrastructure
Nothing this week either.
6. Infrastructure
TODO:
- update and fix our documentation and infrastructure as Steinar finds
- working on this one
- fix internet setup for Per-Eric's satellite modem (Sjur)
- this influences iChat, SEE sharing, and ARD connections
7. Linguistics
North Sámi
TODO:
- lexicalise actio compounds. Example: vuolggasadji vs. vuolginsadji
- possibly turn on free compounding as part of the PLX conversions (ie free
- fix stuorra-oslolaš lower case o (Sjur, Thomas, Trond)
- open up compounding for all actios (Tomi)
Compounding of actors and actios of transitive verbs
The following lexicons need the specified compounding tags to be applied to all
LEXICON ACTORTV
!+SgNomCmp +SgGenCmp +PlGenCmp +SgLeft +SgNomLeft +SgGenLeft +PlGenLeft
LEXICON BOAHTINTV ! Long compound-forms
+N+Sg+Nom: K ;
+N+SgCmp: R ; !+SgCmp +SgLeft +SgNomLeft +SgGenLeft +PlGenLeft
Lule Sámi
TODO:
- refine smj proper noun lexica, cf. the propernoun-smj-lex.txt
- ä-æ in speller (Thomas, Sjur)
- lexicalise words from the Olavi missing list, but check against the pdf
- still about 2800 words to lexicalise
- add normativity issues to our normativity document (Thomas)
- actios of 3-syllable verbs must be checked in the next speller
- adverbs of 3-syllable adjectives must be tested in the next speller
8. Name lexicon infrastructure
Decisions made in Tromsø can be found in this meeting memo.
TODO:
- fix bugs in lexc2xml; add comments to the log element (Saara)
- finish first version of the editing (Sjur)
- test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)
- make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as
- convert propernoun-($lang)-lex.txt to a derived file from common xml files
- implement data synchronisation between risten.no and
- start to use the xml file as source file
- clean terms-sme.xml such that all names have the correct tag for their use
- merge placenames which are erroneously in different entries: e.g. Helsinki,
- publish the name lexicon on risten.no (Sjur)
- add missing parallel names for placenames (linguists)
- add informative links between first names like Niillas and Nils
9. Spellers
OOo spellers
Børre and Tomi had multiple sessions on this in Drag.
TODO:
- add Hunspell data conversion to the lexc2xspell (Tomi - after the
- study the Hunspell formalism in detail (Børre, Tomi)
Testing
Spelling Error Markup
Sjur added documentation about the procedure.
TODO:
- Manually mark test texts for typos (Maaren, Steinar)
- Set up ways of adding meta-information (source info, used in testing or not,
Testing tools
Sjur is trying to get the ccat typos option integrated in the test targets
TODO:
- document the AppleScript testing tool (Sjur)
- improve speller test bench (Sjur)
- integrate the ccat speller testing options in the Makefile (Sjur, Tomi)
- working on it
Regression tests
Nothing new
TODO:
- add extraction of all known spelling errors in the corpus (not the
- test the typos.txt list, and check that all entries are properly corrected
- consider how to do a regression self-test, ie, how to test the full
- extract all the base forms in the lexicon, and run them through the speller
- extract all SUB-marked entries, and run them through the lexicon
- integrate these in the make file (Sjur)
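The typos.txt check in the list above could be sketched roughly as follows. This is only a minimal sketch under assumptions that this memo does not confirm: it assumes typos.txt holds one tab-separated misspelling/correction pair per line, with "!" starting a comment line, and `suggest` is a hypothetical stand-in for a call to the real speller (returning None when a word is accepted, otherwise a list of suggestions). The word pairs are toy data, not entries from the actual list.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Pair = Tuple[str, str]


def parse_typos(lines: Iterable[str]) -> List[Pair]:
    """Parse assumed 'misspelling<TAB>correction' lines; skip blanks and '!' comments."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue
        typo, _, correction = line.partition("\t")
        if correction:
            pairs.append((typo, correction.strip()))
    return pairs


def check_typos(
    pairs: Iterable[Pair],
    suggest: Callable[[str], Optional[List[str]]],
) -> List[Tuple[str, str, str]]:
    """Return entries where the typo is accepted or the correction is not suggested."""
    failures = []
    for typo, correction in pairs:
        suggestions = suggest(typo)
        if suggestions is None:
            # The speller accepts the "typo": either the speller or the list is wrong.
            failures.append((typo, correction, "accepted"))
        elif correction not in suggestions:
            failures.append((typo, correction, "not suggested"))
    return failures


def toy_suggest(word: str) -> Optional[List[str]]:
    """Hypothetical speller stand-in backed by a tiny in-memory lexicon."""
    lexicon = {"the", "speller"}
    corrections = {"teh": ["the"], "speler": ["spelled"]}
    if word in lexicon:
        return None
    return corrections.get(word, [])


sample = ["! toy data", "teh\tthe", "speler\tspeller", "the\tthe", ""]
failures = check_typos(parse_typos(sample), toy_suggest)
for typo, correction, reason in failures:
    print(f"{typo} -> {correction}: {reason}")
```

Entries reported as "accepted" are the interesting ones for cleaning the list itself, since they point either at a real word listed as a typo or at a speller that accepts too much.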
Lexicon conversion to the PLX format
TODO:
- install larger disks, new RAM on the G5 when they arrive (Børre)
- done
- ask for mklex for Linux (victorio) from Polderland (Sjur)
- offer received
- ask Xerox for a commercial license for the xfst tools on the G5 (Sjur)
- add compounding restrictions to the PLX conversion (Tomi)
- done and tested - some errors found, they're added to Bugzilla
Public Beta follow-up
TODO:
- file list in Windows not complete (Børre, Sjur)
- done
- test smj on typos (Børre)
- tried, but got an error, thus skipped. Needs to be checked now.
- error reported to Saara
- fixed
- smj typos tested, many errors found in the typos list itself
- errors in the typos.txt should be fixed (Per-Eric)
- rerun the test (Sjur)
- celebrate
- done
- resend the press release to some channels in Sweden, Finland and Norway
- Other Finnish institutions to contact could be:
- Samiradio (Tomi) - they're planning to make a report, have contacted
- Sami parliament (Tomi)
- Oulu - giellagas (Tomi) - talked to some people
- Lapin yliopisto - Rantala (Trond)
- Helsingin yliopisto - Seurujärvi-Kari (Tomi)
- KOTUS (Sjur)
- Citysaamit (Tomi)
- Oulun saamelaiset (Tomi)
10. Other
Summer vacation
When are we taking it? Please fill in the table below:
Name | Starting | Ending |
---|---|---|
Børre | 25.6. | 8.7. |
Maaren | 9.7. | 10.8. |
Per-Eric | 9.7. | 20.7. |
Saara | 2.7. | 3.8. |
Sjur | x | x |
Thomas | 9.7. | 12.8. |
Tomi | 9.7. | 5.8. |
Trond | 2.7. | 12.8, but working at the end |
Divvun people also need to send the dates to Julie Eira or
Corpus contracts
TODO:
- publish corpus contracts and project infra as open-source on NoDaLi-sta
Bug fixing
When fixing bugs, record the version number containing the fix in the Bugzilla
56 open Divvun/Disamb bugs (21 of these 56 are speller bugs, 35 are
TODO:
- look over the Bugzilla status mails (Børre)
11. Next meeting, closing
The next meeting is 2.7.2007, 9:30 Norwegian time.
The meeting was closed at 11:28.
Appendix - task lists for the next week
Børre
- add sma texts to the corpus repository
- run all known spelling errors in the prooftest corpus through the speller
- add extraction of all known spelling errors in the regular corpus (not the
- update and fix our documentation and infrastructure as Steinar finds
- study the Hunspell formalism in detail
- follow-up contact with Davvi Girji
- fix bugs!
Maaren
- lexicalise actio compounds
- Manually mark speller test documents for typos
Per-Eric
- expand the smj typos list
- add missing smj words
- fix errors in smj/src/typos.txt
Saara
- add new XSL/XML headers for proofing test docs
- Try to add files with Lars to the corpus interface.
- fix bugs!
Sjur
- run all known spelling errors in the corpus through the speller
- document the AppleScript testing tool
- integrate regression self tests with the make file
- improve speller test bench
- integrate the ccat speller testing options in the make file
- fix internet setup for Per-Eric's satellite modem
- look over the Bugzilla status mails
- ask Xerox for a commercial license for the xfst tools on the G5
- check with Sámi publishing houses whether support for CS2 is still needed
- publish corpus contracts and project infra as open-source on NoDaLi-sta
- study the Hunspell formalism in detail
- fix bugs!
Steinar
- Beta testing: Align manually (shorter texts)
- Manually mark speller test texts for typos (making them into gold standards),
- Complete the semantic sets in sme-dis.rle
- missing lists
- fix bugs!
Thomas
- work with compounding
- fix stuorra-oslolaš lower case o
- test new speller for actios of 3-syllable verbs and adverbs of 3-syllable adjectives
- fix bugs!
Tomi
- make PLX conversion test sample; add conversion testing to the make file
- integrate the ccat speller testing options in the Makefile
- open up compounding for all actios
- contact Finnish institutions about the speller beta release
- study the Hunspell formalism in detail
- add Hunspell data generation/conversion
- fix bugs!
Trond
- Work on the web corpus issues
- update the smj proper noun lexicon, and refine the morphological
- fix bugs!