Meeting_2008-04-28
Contents:
- Meeting setup
- Agenda
- Opening, agenda review, participants
- Updated task status since last meeting
- Pedagogical software online
- Documentation
- Corpus gathering
- Future plans, directions and ideas
- Infrastructure
- Linguistics
- Name lexicon infrastructure
- Proofing tools
- Other
- Next meeting, closing
- Appendix - task lists for the next five days
Meeting setup
- Date: 28.4.2008 
- Time: 12.30 Norw. time 
- Place: Internet 
- Tools: SubEthaEdit, iChat/Skype
Agenda
Cf. one of the following, depending on context: 
- the upper bar of the SEE window (provided you use the JSPWiki syntax mode)
- the TOC in Forrest-rendered output, like HTML and PDF
Opening, agenda review, participants
Opened at 10:06.
Present: Børre, Jovsset, Per-Eric, Sjur, Thomas, Trond
Absent: Tomi
Agenda accepted as is.
Updated task status since last meeting
Børre
- Hunspell lexicon conversion - found out how to convert the four big lexicons to hunspell. No compounding
- prepare migration to svn (with Sjur, Trond) - have converted our cvs to svn, svn+ssh access works for me. Have had problems
- release hunspell public beta at the end of April (with Sjur) 
- review hunspell lexicon branch (with Thomas), merge with trunk if ok - Thomas is working on it
- try to repair G5 accounts for iCal Server - not done
- fix bugs!
Lene
- Ped project
Maaren
- Put the list of possible sma corpus sources into a document
Per-Eric
- try to find other authors who have smj texts digitally, send them contracts - Nothing done
- Work with missing list from texts written by Sigga Tuolja Sandström. - Worked and still working
- Work with missing list same_dutkama_pgr.txt - Nothing done yet
- Work with missing list sameriekta_tjoahkkagæsos.txt - Nothing done yet
- Keep the contact with Ulf-Stefan Winka who has many more smj texts to add. - Nothing done
- fix bugs! - Nothing done
Saara
- add new XSL/XML headers for proofing test docs 
- Set up ways of adding meta-information for proofing correct corpus docs - discuss more parallel texts
Sjur
- gather sma texts - several new contacts during the trip last week, two half-finished contracts
- name db/risten.no - nothing - was travelling last week
- make an improved sma project plan - nothing - was travelling last week
- publish corpus contracts and project infra as open-source on NoDaLi-sta - nothing - was travelling last week
- prepare migration to svn (with Børre, Trond) - nothing - was travelling last week
- release hunspell public beta at the end of April (with Børre) - nothing - was travelling last week
- update the Changes document - nothing - was travelling last week
- follow-up on some Polderland-related bugs: 621, 630, 652, 656 - nothing - was travelling last week
- test latest hunspell lexicon - forgot it :(
- InDesign documentation - nothing - was travelling last week
- fix bugs!
- other things: - made a first version of a MacOS X dictionary for sme-smj and sme-nob based on
Thomas
- look at test cases still not behaving properly - nothing this week
- review hunspell lexicon branch with Børre - working on this
- fix bugs! - nothing this week
Tomi
- Hunspell lexicon conversion - helped Børre with this
- document how compounding is controlled in the PLX conversion - not done
- fix double hyphen bugs - not done
- Make a pedagogical speller (after MA thesis is delivered) - not done
- fix bugs!
Trond
- Help Jovsset with vislcg3 and sma
- Set up Jabber for Lene, Kimme, Saara 
- Prepare svn migration (with Sjur, Børre) 
- fix bugs!
Pedagogical software online
Working on it: Saara has a demo UI ready for vocabulary games, but not online yet.  Lene has 
TODO: 
- get the content ready (Lene) 
- implement the UI and functionality (Saara) 
- get an easy-to-remember URL (UiT/IT)  
- More thorough skin, layout, ... (External person within the Ped team,
- Make a pedagogical speller (Tomi when finished with his MA thesis)
- Turn off peripheral compounds (numbers, acros, perhaps names)
- Increase editing distance by one for suggestions? Only possible with limited
Documentation
TODO: 
- start to reorganise the documentation (Børre, Sjur, Trond)
Corpus gathering
Trond, Sjur, Jovsset were travelling last week, and got several contacts for 
TODO: 
- follow up on sma corpus texts (Sjur, Jovsset)
- follow-up on the smj texts from Kurt Tore (Per-Eric)
- get texts from Sigga Tuolja Sandström, possibly through
- other contacts: Nord-Salten avis (Børge Strandskog), Lena Davidsson
- Put the list of possible corpus sources into a document
- give contract with blank fields to Per-Eric (Børre)
Future plans, directions and ideas
See a separate document in plan/strat/5year.jspwiki.
Infrastructure
Changes in the lexicon cause big problems for Lene - she is using up to half
- tag changes 
- word changes (from lexicalised to compound)
- classification changes (from abbr to acro, or similar)
Conclusions: 
- public pages and services should only be updated manually, and when we know
- the whole product line must be more robust, such that small, local changes do
- we need a larger test battery to run regularly (every night, before larger
- a set of smaller tests that can be run automatically as part of the commit
To accommodate future enhancements in different directions (in rough order of
- test bench for all parts of our language technology efforts 
- migrate to svn 
- merge gt, kt and st into one, probably after the svn move 
- more modularised make / build infra (prepare for smn, sms, sjd, others)
- close certain parts of the code repository (requires svn)
- set up the Leopard Server features for collaborative support: permanent chat rooms; stored (and indexed) chat transcripts of the chat rooms; iCal server / group calendars; wiki
- wiki? (is part of Leopard Server) or other web-based documentation
- improve Forrest stability and i18n support 
- reorganise the documentation content: differentiate between target groups; get better grouping; decide what to write in Forrest and what in the wiki; update/add missing parts
- migrate lexicons to XML, splitting the task: Name lexica (the Name project); Dictionaries (already in XML, task is to integrate them); Open POSes (Komi as a test case)
- change the look of the documentation web 
- sfst? Both as replacement for xfst and for hunspell/open-source proofing tools 
- investigate the NSIS installer, potentially replacing the InstallShield
- corpus content moved to Max Planck repositories?
TODO: 
- add CG regression test (Lene, Sjur, Trond) 
- make a test-all target that runs all tests we have (Børre, Sjur, Trond) 
- define and document testing routines (Børre, Sjur, Trond) 
- add Jabber accounts in iChat: check that all accounts are ready for iChat on the G5 (Børre); set up accounts at UiT for Lene, Kimme, Saara (Trond)
- prepare migration to svn (Børre, Sjur, Trond) - svn+ssh access is working, please test!
- try to repair G5 accounts for iCal Server (Børre)
Linguistics
North Sámi
(nothing new, see proofing bugs below)
Lule Sámi
(nothing new, see proofing bugs below)
TODO: 
- sme->smj lexicon conversion to build bilingual lexicon resources, and
- Add the words when all words are ready.
South Sámi
Nothing new since last week.
Name lexicon infrastructure
Much work last week on the SD-terms collection, and the editing code relating to 
Next up: to finish the work on the SD-terms collection (editing), then look at
TODO: 
- fix i18n bug in risten.no/G5 (so they will work without the proper locale
- fix bugs in lexc2xml; add comments to the log element (Saara)
- finish first version of the editing (Sjur) 
- test editing of the xml files. If ok, then: ( Sjur, Thomas, Trond) 
- make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as
- convert propernoun-($lang)-lex.txt to a derived file from common xml files
- implement data synchronisation between risten.no and
- start to use the xml file as source file
- clean terms-sme.xml such that all names have the correct tag for their use
- merge placenames which are erroneously in different entries: e.g. Helsinki,
- publish the name lexicon on risten.no (Sjur)
- add missing parallel names for placenames (linguists) 
- add informative links between first names like Niillas and Nils 
Proofing tools
Hunspell
Printed out a partial fullword list, containing both the  > and the  » 
Made a python script that parses fullword lists, and prints out working .dic and 
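The output side of such a conversion can be illustrated with a minimal sketch. It is hypothetical and is not the script mentioned above: it assumes the fullword list has already been reduced to plain word forms (no analyses, no > or » boundary markers), and it writes a Hunspell .dic file without any affix or compounding flags. The function name make_dic is made up for this example.

```python
from typing import Iterable


def make_dic(words: Iterable[str]) -> str:
    """Return the text of a minimal Hunspell .dic file.

    Hypothetical sketch: input items are assumed to be plain word forms,
    one per item, with compound-boundary markers already removed.
    No affix or compounding flags are written.
    """
    # Drop blank lines and duplicates; sort so the output is stable and diffable.
    entries = sorted({w.strip() for w in words if w.strip()})
    # First line: (approximate) number of entries, then one entry per line.
    return "\n".join([str(len(entries))] + entries) + "\n"
```

Usage, e.g.: `sys.stdout.write(make_dic(sys.stdin))`. The matching .aff file should declare the encoding (SET UTF-8) so that letters like á, š and ž are read correctly; compounding would have to be expressed through flags, which this sketch leaves out.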
TODO: 
- review hunspell lexicon branch, merge with trunk if ok (Børre, Thomas) 
- test latest lexicon (Sjur) 
- add smj to the soup, make sure it works roughly as well as sme - added to derivations, needs to be tested
- fix the remaining conversion bugs for sme (Børre, Tomi)
- return to  smj, and fix whatever is left to fix (Børre, Tomi) 
- release a public beta at the end of April (Børre, Sjur)
Testing
Spelling Error Markup
TODO: 
- Set up ways of adding meta-information (source info, used in testing or not,
- test new and nested error markup (Sjur)
Speller bugs
List of bugs returned from Polderland: 621, 630, 652, 656, 676.
Open issues based on test results :
sme
- 425 - REGRESSION: other words from Divvun.no - two words rejected
- 426 - comp words from Divvun.no - guoktedássásaš accepted - still OPEN
- 435 - roman numbers - REGRESSION: inflection of single letter numbers - we should pregenerate all numbers once and for all, and store them in a
- 452 - REGRESSION: several lexical bugs - oažžuin + ožžuin
- 595 - prefix+name without hyphen (ovdaLot instead of ovda-Lot) - still
- 600 - gen+hyph compound sámi-dáru - still OPEN
- 603 - suomabealdi accepted - still OPEN
- 606 - speller accepts VUOHTA compound - still OPEN
- 607 - acro + hyphen - still OPEN - NRKGA is acro + clitic accepted without colon - what is correct?
- 611 - double hyphen sugg still accepted - still OPEN
- 613 - short gen. as second compound part - still OPEN
- 619 - numerals and pronouns to NAMÁK and SASJ fails - still OPEN
- 627 - prefix + hyphen does not get accepted - still OPEN
- 629 - a taking part in compounding without hyphen - still OPEN
- 634 - PropGen+hyph+PropGen - still OPEN
- 641 - numeral+noun compounds - still OPEN
- 642 - noun/adj/proper + hyphen + ain - still OPEN
- 644 - cased numeral+numeral compound - still OPEN
- 646 - adverb + hyphen + noun - still OPEN
- 647 - numerals+NOUN - still OPEN
- 648 - unmotivated suggestions with numeral+noun - still OPEN
- 649 - name + adj compound without hyphen - still OPEN
- 654 - speller does not recognize ordinals on -nuppelogát - still OPEN
- 655 - pron + nai - still OPEN
- 658 - Suggestion saame - still OPEN
smj
- 435 - roman number - single letter numbers now recognised
- we should pregenerate all numbers once and for all, and store them in a
- please note that inflection of single letter numerals is fine in
- 595 - prefix+name without hyphen (tsåhkeLot instead of tsåhke-Lot)
- 600 - gen+hyph compound sáme-dáro - still OPEN
- 607 - acro + hyphen - NRKGA is acro + clitic accepted without colon - what is correct?
- 616 - Bispadime-me-ráden - still OPEN, try to find an acro or abbr me
- 619 - numerals and pronouns to NAMÁK and SASJ fails - still OPEN
- 629 - a taking part in compound - still OPEN
- 634 - Prop gen + hyphen + Prop gen - still OPEN
- 641 - numeral+noun compounds - still OPEN
- 644 - cased numeral+numeral compound - still OPEN
- 647 - numerals+NOUN - still OPEN
- 648 - unmotivated suggestions with numeral+noun - still OPEN
- 649 - name + adj compound without hyphen - still OPEN
- 650 - noun prefix+name compound without hyphen - still OPEN
- 658 - Suggestion saame - still OPEN
TODO: 
- compile new speller lexicons (Tomi) 
- document how compounding is controlled in the PLX conversion (Tomi)
Hyphenator bugs
Open issues based on test results :
sme
- 468 - REGRESSION: Márkomeanu
- 547 - REGRESSION: hyphen in front of vowel: Lotnolasealáhusas
- 548 - REGRESSION: mid syllable hyphenation: Háliidivččen
- 549 - REGRESSION: division without hyph: Váccedettiin
- 673 - adj-derivations: guovttenuppelotčoarvvagiin (the word is not rec.)
- 677 - NEW: Wrongly hyphenated ending -danidja - invalid
smj
- 545 - REGRESSION: bad hyphenation in compounds: åhpadusorganisásjåvnån
- 546 - REGRESSION: obligatory hyph rules seem to work in facultative
- 547 - REGRESSION: hyphen in front of vowel: Jienastimnjuolgadusá and
InDesign tools
We're waiting for an update from Polderland.
Releases
TODO: 
- update the Changes document (Sjur)
- InDesign documentation (Sjur) - Norwegian translation received from Davvi Girji
- public hunspell beta during first week of May - depends on the hunspell devel. 
- public 1.1 update of the Polderland-based tools during May
Other
Corpus contracts + open source
Now decided to wait until we have changed from cvs to svn.
TODO: 
- publish corpus contracts and project infra as open-source on NoDaLi-sta 
Travel to Røros and Östersund
Done, results somewhat mixed, but overall it was quite useful.
Next meeting, closing
The next meeting is 5.5.2008.
The meeting was closed at 11:10.
Appendix - task lists for the next five days
Børre
- Hunspell lexicon conversion 
- prepare migration to svn (with Sjur, Trond) 
- release hunspell public beta at the end of April (with Sjur) 
- review hunspell lexicon branch (with Thomas), merge with trunk if ok 
- try to repair G5 accounts for iCal Server 
- make a test-all target that runs all tests we have 
- define and document testing routines 
- fix bugs!
Jovsset
- follow up on sma corpus texts
Lene
- get the ped content ready 
- Work on test routine with Trond and Sjur
Maaren
- Put the list of possible sma corpus sources into a document
Per-Eric
- try to find other authors who have smj texts digitally.
- Work with missing list from texts written by Sigga Tuolja Sandström. 
- Work with missing list same_dutkama_pgr.txt 
- Work with missing list sameriekta_tjoahkkagæsos.txt 
- Keep the contact with Ulf-Stefan Winka. 
- fix bugs!
Saara
- add new XSL/XML headers for proofing test docs 
- Set up ways of adding meta-information for proofing correct corpus docs
- implement the ped UI and functionality
Sjur
- follow up on sma corpus texts
- name db/risten.no
- make an improved sma project plan
- publish corpus contracts and project infra as open-source on NoDaLi-sta
- prepare migration to svn (with Børre, Trond)
- release hunspell public beta at the end of April (with Børre)
- update the Changes document
- follow-up on some Polderland-related bugs: 621, 630, 652, 656
- test latest hunspell lexicon
- InDesign documentation
- send calendar info to Jovsset
- add CG regression test with Lene and Trond
- make a test-all target that runs all tests we have 
- define and document testing routines 
- fix bugs!
Thomas
- look at test cases still not behaving properly 
- review hunspell lexicon branch with Børre
- fix bugs!
Tomi
- Hunspell lexicon conversion 
- document how compounding is controlled in the PLX conversion 
- fix double hyphen bugs 
- Make a pedagogical speller (after MA thesis is delivered)
- fix bugs!
Trond
- Help Jovsset with vislcg3 and sma 
- Set up Jabber for Lene, Kimme, Saara 
- Prepare svn migration (with Sjur, Børre) 
- Work on test routine with Lene and Sjur
- make a test-all target that runs all tests we have
- define and document testing routines
- fix bugs!

