Meeting_2007-05-07
Contents:
- Meeting setup
- Agenda
- 1. Opening, agenda review, participants
- 2. Updated task status since last meeting
- 3. Documentation
- 4. Corpus gathering
- 5. Corpus infrastructure
- 6. Infrastructure
- 7. Linguistics
- 8. Name lexicon infrastructure
- 9. Spellers
- 10. Other
- 11. Next meeting, closing
- Appendix - task lists for the next week
Meeting setup
- Date: 07.05.2007
- Time: 09.30 Norw. time
- Place: Internet
- Tools: SubEthaEdit, iChat
Agenda
- Opening, agenda review
- Reviewing the task list from last week
- Documentation - divvun.no
- Corpus gathering
- Corpus infrastructure
- Infrastructure
- Linguistics
- name lexicon infrastructure
- Spellers
- Other issues
- Summary, task lists
- Closing
1. Opening, agenda review, participants
Opened at 00:48.
Present: Børre, Maaren, Per-Eric, Sjur, Steinar, Thomas, Tomi, Trond
Absent: Saara
Agenda accepted as is.
2. Updated task status since last meeting
Børre
- add sma texts to the corpus repository
  - not done
- collect a list of PR recipients, forward to Berit Karen Paulsen
  - not done
- run all known spelling errors in the prooftest corpus through the speller
  - not done
- add extraction of all known spelling errors in the regular corpus (not the
  - not done
- update installer packages with latest speller lexicon
  - still no new speller lexicon
- add numbers, compound restrictions to both spellers if time permits
  - not done
- update and fix our documentation and infrastructure as Steinar finds
  - not done
- find missing nob parallel texts in corpus
  - not done
- study the Hunspell formalism in detail together with Sjur and Tomi
  - nothing new
- add info to front page (incl. download links)
  - not done
- write separate page with detailed info (incl. download links) (Børre)
  - a separate page for the beta speller, with installation instructions, etc.
    - not done
- translate press release, web pages (Børre, Thomas, whoever)
  - not done
- fix internet setup for Per-Eric's satellite modem
- ask MacOffice for larger disks for the G5
  - not done
- ask for larger disks for victorio
  - done, Roy Dragseth will look at it
- ask for newer server OS on victorio, many of the installed tools are quite old
  - under consideration,
- fix bugs!
  - not done
Maaren
- lexicalise actio compounds
  - little done
- Manually mark speller test documents for typos
  - working
Per-Eric
- expand the smj typos list
- add missing smj words
Saara
- prepare more files for manual alignment
  - all files available are now aligned
- improve cgi-bin scripts
  - add new features to the paradigm generator
    - not done, will be done this week with documentation
- add new XSL/XML headers for proofing test docs
- compilation of verb lists
- read the manual for graphical corpus interface and try to add files with Lars.
  - read the manual, will add files after some fixes
- fix bugs!
- other
  - submitted the law article
Sjur
- finish press release for the beta
  - not done
- collect a list of PR recipients
  - not done
- run all known spelling errors in the corpus through the speller
  - not done
- document the AppleScript testing tool
  - not done
- integrate regression self tests with the make file
  - not done
- make improved smj speller (incl. derivations and compounds)
  - took most of my time, but it still won't compile - I feel like banned by
- improve speller test bench
  - not done
- update installer packages with latest speller lexicon
  - not done
- integrate the ccat speller testing options in the make file
  - not done
- fix internet setup for Per-Eric's satellite modem
  - bugged the Swedish company making the modem, received some very initial
- ask for Linux version of the Polderland command-line speller (for victorio)
  - no Polderland meeting last week (our contact person was on vacation)
- ask Polderland for mklex for Linux (for victorio)
  - no Polderland meeting last week (our contact person was on vacation)
- look over the Bugzilla status mails
  - asked Børre to do this
- fix bugs!
- other:
  - read through and commented on Saara's law article
Steinar
- Beta testing: Align manually (shorter texts)
- Manually mark speller test texts for typos (making them into gold standards),
  - added marked news-files, working with facta-files
- include the files already publicly tested in the prooftest catalogue
  - not done yet
- Complete the semantic sets in sme-dis.rle
  - no work lately
- missing lists
  - no work lately
- Look at the actio compound issue when adding from missing lists
  - not done
- Align corpus manually
- fix bugs!
Thomas
- work with compounding
  - worked and still working
- Lack of lowering before hyphen: Twol rewrite.
  - not done
- translate beta release docs to sme and smj
  - not done
- Add potential speller test texts
  - not done
- fix bugs!
  - assisted
Tomi
- make improved smj speller (incl. derivations and compounds)
  - not done
- add numbers, compound restrictions to both spellers if time permits
- make PLX conversion test sample; add conversion testing to the make file
  - not done
- improve number PLX conversion
  - not done
- improve prefix and middle-noun PLX conversion
  - not done
- integrate the ccat speller testing options in the Makefile
  - not done
- fix bugs!
  - fixed some
- other:
  - improved plx conversion
  - added hyphen conversion to fst
  - added hard hyphens to fst
Trond
- Work on the parallel corpus issues
  - Not this week.
- Postpone these tasks to after the beta:
  - update the smj proper noun lexicon, and refine the morphological
  - Go through the Num bugs
- collect a list of PR recipients
  - Not done.
- check with Lars when more aligned texts are ready
  - Still open.
- ask for larger disks for victorio
  - Done, new disk offer forthcoming.
- ask for newer server OS on victorio, many of the installed tools are quite old
  - Done, forthcoming.
- fix bugs!
3. Documentation
Nothing new.
The open documentation issues fall into these three categories:
- Beta documentation for testers
- Documentation for the online corpora
- General documentation improvement after Steinar's test (for open-source
TODO:
- write form to request corpus user account (Børre, Sjur, Trond)
  - delayed till after the beta release
- document how to apply for access to closed corpus, and details on the corpus
  - delayed till after the beta release
- correct and improve it based on feedback from Steinar (Børre)
  - low priority
- beta documentation (see separate beta section below)
4. Corpus gathering
Nothing new.
TODO:
- sme texts: no new additions, fix corpus errors during this month
- missing nob parallel texts should be added if such holes are found
- Go through the list of missing or erroneous nob texts, based upon
- add sma texts to the corpus repository (Børre)
5. Corpus infrastructure
Alignment
All parallel texts are now aligned.
TODO
- check with Lars when more texts are ready (Trond)
6. Infrastructure
The http://giellatekno.uit.no main page has been restructured, and Saara's
TODO:
- update and fix our documentation and infrastructure as Steinar finds
- fix internet setup for Per-Eric's satellite modem (Sjur, Børre)
  - this influences iChat, SEE sharing, and ARD connections
- Add a Lule Sámi paradigm generator, and link the p-smj.en.html page
- Translate the paradigm pages to Sámi, fix all the se links (Børre, Trond)
7. Linguistics
North Sámi
Actio compounds not being allowed to compound creates problems for constructions
TODO:
- lexicalise actio compounds. Example: vuolggasadji vs. vuolginsadji
  - almost all done
- fix stuorra-oslolaš lower case o (Sjur, Thomas, Trond)
  - postponed till after the public beta
Lule Sámi
Buorrek issue: -k clitic (or abessive?)
What is a clitic? Member of LEXICON K.
- totally uncritical when it comes to hosts (accept all hosts) <===
- does not (usually) interfere with the phonology of the host
If you may add -k to all wordforms pointing to the lexicon K today, then -k
North Sámi: we use K instead of #, since basically all words (and word forms)
lijge, lage, muorrage - buorrek, *lak
Say, for the sake of argument, that:
- -k added to nominal forms
- other K members to any wordform
- ==> calls for split of K lexicon.
LEXICON K
 ENDLEX ;
! +Clt:clitic ENDLEX ;
 +Clt+ge:#ge ENDLEX ;
 +Clt+gen:#gen ENDLEX ;
 +Clt+ga:#ga ENDLEX ;
 +Clt+k:#k ENDLEX ;
TODO:
- refine smj proper noun lexica, cf. the propernoun-smj-lex.txt
- research the -k ending, whether it is a clitic or a regular inflection,
8. Name lexicon infrastructure
Decisions made in Tromsø can be found in this meeting memo.
TODO:
- fix bugs in lexc2xml; add comments to the log element (Saara)
- finish first version of the editing (Sjur)
- test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)
- make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as
- convert propernoun-($lang)-lex.txt to a derived file from common xml files
- implement data synchronisation between risten.no and
- start to use the xml file as source file
- clean terms-sme.xml such that all names have the correct tag for their use
- merge placenames which are erroneously in different entries: e.g. Helsinki,
- publish the name lexicon on risten.no (Sjur)
- add missing parallel names for placenames (linguists)
- add informative links between first names like Niillas and Nils
9. Spellers
OOo speller(s)
TODO after the MS Office Beta is delivered:
- add Hunspell data generation to the lexc2xspell (Tomi - after the
- study the Hunspell formalism in detail (Børre, Sjur, Tomi)
Testing
Spelling Error Markup
TODO:
- Manually mark test texts for typos (making them into gold standards)
  - continued
- Set up ways of adding meta-information (source info, used in testing or not,
- Conduct tests on new beta versions on the basis of the unspoiled gold standard
- include the files already publicly tested in the prooftest catalogue
- test each version before beta release
Testing tools
TODO:
- document the AppleScript testing tool (Sjur)
- improve speller test bench (Sjur)
- integrate the ccat speller testing options in the Makefile (Sjur, Tomi)
- ask for Linux version of the Polderland command-line speller (for victorio)
Regression tests
TODO:
- add extraction of all known spelling errors in the corpus (not the
  - ccat now ready, it should be integrated in the Makefile (Sjur, Tomi)
- test the typos.txt list, and check that all entries are properly corrected
- consider how to do a regression self-test, i.e., how to test the full
  - extract all the base forms in the lexicon, and run them through the speller
  - extract all SUB-marked entries, and run them through the lexicon
  - integrate these in the make file (Sjur)
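The base-form extraction mentioned in the list above might look roughly like the following. This is only a minimal Python sketch under stated assumptions, not an existing project tool: it assumes one lexc entry per line in the form upper:lower CONTLEX ; (or upper CONTLEX ;), treats 0 on the upper side as the lexc epsilon, and does not handle escaped characters such as %0. The names extract_base_forms and rejected_by_speller are hypothetical, and the accepts callback merely stands in for whatever speller call is actually used.

```python
import re
from typing import Callable, Iterable, List

# Assumed entry layout: "upper[:lower] CONTLEX ;", one entry per line.
ENTRY = re.compile(r"^\s*(?P<upper>[^\s:!;<]+)(?::\S*)?\s+\S+\s*;")


def extract_base_forms(lexc_text: str) -> List[str]:
    """Collect the upper (lemma) side of every stem entry in a lexc source."""
    forms = []
    for line in lexc_text.splitlines():
        stripped = line.lstrip()
        # Skip comment lines and lexicon headers.
        if stripped.startswith("!") or stripped.startswith("LEXICON"):
            continue
        match = ENTRY.match(line)
        if match is None:
            continue
        upper = match.group("upper")
        # Entries starting with a tag (e.g. +Clt+ge) are not base forms.
        if upper.startswith("+"):
            continue
        # 0 is the lexc epsilon symbol, so it is dropped (escaped %0 is not handled).
        forms.append(upper.replace("0", ""))
    return forms


def rejected_by_speller(forms: Iterable[str],
                        accepts: Callable[[str], bool]) -> List[str]:
    """Return the base forms that the (hypothetical) speller callback rejects."""
    return [form for form in forms if not accepts(form)]
```

Any base form returned by rejected_by_speller would then be a candidate regression: a word that is in the lexicon but is not accepted by the speller built from it.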
Localisation
We need to translate the info added to our front page (and a separate page)
TODO:
- translate beta release docs to sme (Thomas)
- translate beta release docs to smj (Thomas)
Lexicon conversion to the PLX format
Børre got the conversion working all the way to the final speller, for
TODO:
- ask MacOffice for larger disks for the G5 (Børre), and report about
  - not yet
- ask for larger disks for victorio (Børre, Trond)
  - discussed, in principle yes, reluctant about the schedule. But in the end,
- ask for newer server OS on victorio, many of the installed tools are quite old
  - see previous point.
- ask for mklex for Linux (victorio) from Polderland (Sjur)
  - not yet (no meeting with them last week)
Compounding restrictions
How to include compounding restriction comment tags in the transducers:
giv0ri:giv'ri ALBMI ; !+SgNomCmp +SgGenCmp +PlGenCmp
=> (using a perl script or similar)
+SgNomCmp+SgGenCmp+PlGenCmpgiv0ri:giv'ri ALBMI ; !
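The tag move shown above could be scripted roughly as follows. This is a minimal Python sketch rather than the project's actual script (the memo only says "a perl script or similar"). The function name move_cmp_tags is hypothetical, and the sketch assumes that every compounding tag has the shape +...Cmp, that the tags sit after the first ! on the line, and that the entry itself contains no escaped %!.

```python
import re

# Assumed shape of a compounding restriction tag, e.g. +SgNomCmp.
CMP_TAG = re.compile(r"\+\w+Cmp\b")


def move_cmp_tags(line: str) -> str:
    """Move +...Cmp tags from the trailing comment to the front of the entry."""
    entry, bang, comment = line.partition("!")
    if not bang:
        return line  # no comment, nothing to move
    tags = CMP_TAG.findall(comment)
    if not tags:
        return line  # comment holds no compounding tags
    # Keep whatever else was in the comment, minus the tags.
    remaining = CMP_TAG.sub("", comment).strip()
    result = "".join(tags) + entry.strip() + " !"
    if remaining:
        result += " " + remaining
    return result


if __name__ == "__main__":
    print(move_cmp_tags("giv0ri:giv'ri ALBMI ; !+SgNomCmp +SgGenCmp +PlGenCmp"))
```

Lines without a comment, or with a comment that holds no +...Cmp tag, are passed through unchanged, so the function can be applied to a whole lexicon file line by line.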
TODO:
- improve prefix conversion to PLX (Tomi)
- improve middle noun conversion to PLX (Tomi)
- improve noun + adjective PLX conversion: (Tomi)
  - compounding stems - how do we generate them? Using the java client?
  - compounding tags - we need to obey them when making the transducers.
- make conversion test sample; add conversion testing to the make file
  - to regression test / QA the PLX conversion.
- improve number conversion (Børre, Tomi)
- ask for larger disk for the web server (Trond, Børre)
Public Beta release
TODO:
- working and updated smj speller (Sjur, Tomi)
- add numbers, compound restrictions to both spellers if time permits
- finish press release (Sjur)
- add info to front page (incl. download links) (Børre)
  - download page made, only the speller beta needs to be added when it is ready.
- write separate page with detailed info (incl. download links) (Børre)
  - a separate page for the beta speller, with installation instructions, etc.
- translate press release, web pages (Børre, Thomas, whoever)
- collect a list of PR recipients, forward to Berit Karen Paulsen
- test final beta speller installers on Windows and Mac (Børre)
- update installer packages with latest speller lexicon (Børre, Sjur)
  - we need to test the procedure, and make sure it works
10. Other
Corpus contracts
TODO:
- publish corpus contracts and project infra as open-source on NoDaLi-sta
  - delayed until the public beta is out the door
Bug fixing
35 open Divvun/Disamb bugs, and 23 risten.no bugs
TODO:
- look over the Bugzilla status mails (Børre)
  - not yet
The next gathering
The Sámediggiráđđi meeting is moved one week. Thus, instead of around 13. June,
11. Next meeting, closing
The next meeting is 7.5.2007, 09:30 Norwegian time.
The meeting was closed at 11:30.
Appendix - task lists for the next week
Børre
- add sma texts to the corpus repository
- collect a list of PR recipients, forward to Berit Karen Paulsen
- run all known spelling errors in the prooftest corpus through the speller
- add extraction of all known spelling errors in the regular corpus (not the
- update installer packages with latest speller lexicon
- add numbers, compound restrictions to both spellers if time permits
- update and fix our documentation and infrastructure as Steinar finds
- find missing nob parallel texts in corpus
- study the Hunspell formalism in detail together with Sjur and Tomi
- test the typos.txt list, and check that all entries are properly corrected
- add info to front page (incl. download links)
- write separate page with detailed info (incl. download links) (Børre)
  - a separate page for the beta speller, with installation instructions, etc.
- translate press release, web pages (Børre, Thomas, whoever)
- fix internet setup for Per-Eric's satellite modem
- ask MacOffice for larger disks for the G5
- follow up the server OS on victorio
- fix bugs!
Maaren
- lexicalise actio compounds
- Manually mark speller test documents for typos
Per-Eric
- expand the smj typos list
- add missing smj words
- research the -k ending, whether it is a clitic or a regular inflection
Saara
- improve cgi-bin scripts
  - add new features to the paradigm generator
    - paradigm generator for Lule Sámi
- add new XSL/XML headers for proofing test docs
- compilation of verb lists
- read the manual for graphical corpus interface and try to add files with Lars.
- fix bugs!
Sjur
- finish press release for the beta
- collect a list of PR recipients
- run all known spelling errors in the corpus through the speller
- document the AppleScript testing tool
- integrate regression self tests with the make file
- make improved smj speller
- improve speller test bench
- update installer packages with latest speller lexicon
- integrate the ccat speller testing options in the make file
- fix internet setup for Per-Eric's satellite modem
- ask for Linux version of the Polderland command-line speller (for victorio)
- ask Polderland for mklex for Linux (for victorio)
- look over the Bugzilla status mails
- fix bugs!
Steinar
- Beta testing: Align manually (shorter texts)
- Manually mark speller test texts for typos (making them into gold standards),
- include the files already publicly tested in the prooftest catalogue
- Complete the semantic sets in sme-dis.rle
- missing lists
- Look at the actio compound issue when adding from missing lists
- fix bugs!
Thomas
- work with compounding
- Lack of lowering before hyphen: Twol rewrite.
- translate beta release docs to sme and smj
- Add potential speller test texts
- research the -k ending, whether it is a clitic or a regular inflection
- fix bugs!
Tomi
- make improved smj speller (incl. derivations and compounds)
- add numbers, compound restrictions to both spellers if time permits
- make PLX conversion test sample; add conversion testing to the make file
- improve number PLX conversion
- improve prefix and middle-noun PLX conversion
- integrate the ccat speller testing options in the Makefile
- fix bugs!
Trond
- Work on the parallel corpus issues
- Add Steinar's corr texts to the relevant catalogue
- Postpone these tasks to after the beta:
  - update the smj proper noun lexicon, and refine the morphological
  - Go through the Num bugs
- collect a list of PR recipients
- check with Lars when more aligned texts are ready
- Follow up the server OS on victorio
- fix bugs!