Meeting_2008-06-09
Contents:
- Meeting setup
- Agenda
- Opening, agenda review, participants
- Updated task status since last meeting
- Pedagogical software online
- Documentation
- Corpus gathering
- Future plans, directions and ideas
- Infrastructure
- Linguistics
- Name lexicon/risten.no infrastructure
- Proofing tools
- Other
- Next meeting, closing
- Appendix - task lists for the next five days
Meeting setup
- Date: 2.6.2008
- Time: 09.30 Norw. time
- Place: Internet
- Tools: SubEthaEdit, iChat
Agenda
Cf. one of the following, depending on context:
- the upper bar of the SEE window (provided you use the JSPWiki syntax mode)
- the TOC in Forrest-rendered output, like HTML and PDF
Opening, agenda review, participants
Opened at 09:41.
Present: Børre, Jovsset, Per-Eric, Sjur, Thomas, Tomi
Absent: Trond
Agenda accepted as is.
Updated task status since last meeting
Børre
- prepare migration to svn (with Sjur, Trond)
  - nothing new
- release hunspell public beta during May (with Sjur)
  - made it
- make a hunspell package that suits Linux distributions
  - done
- try to repair G5 accounts for iCal Server
  - not done
- make a test-all target that runs all tests we have
  - not done
- define and document testing routines
  - not done
- give test Linux distro to Petter Reinholdtsen on Wednesday
- look into other distros as well: Mac, Windows
  - have read documentation, will make them by Tuesday
- write README file, it should say BETA
  - Sjur did it
- write installation and usage instructions (online, and in distro)
- fix the remaining hunspell conversion bugs
- fix bugs!
Jovsset
- follow up on sma corpus texts
- Talk to the students at Uppsala University about the Verbh book and the
- check Hattfjelldal trip with Lene
- order hotel rooms for the Hattfjelldal trip
Lene
- get the ped content ready
- Work on test routine with Trond and Sjur
Per-Eric
- try to find other authors who have smj texts digitally
  - Nothing done
- Work with missing list same_dutkama_pgr.txt
  - Worked and still working
- Work with missing list sameriekta_tjoahkkagæsos.txt
  - Nothing done
- Plan a smj pr tour for our tools
- fix bugs!
  - Nothing done
Saara
- add new XSL/XML headers for proofing test docs
- Set up ways of adding meta-information for proofing correct corpus docs
- implement the ped UI and functionality
Sjur
- follow up on sma corpus texts
  - nothing
- name db/risten.no
  - nothing
- make an improved sma project plan
  - nothing
- prepare migration to svn (with Børre, Trond)
  - nothing
- release hunspell public beta at the end of May (with Børre)
  - not really released, but at least prepared for inclusion in the next Debian
- update the Changes document
  - nothing
- follow-up on some Polderland-related bugs: 621, 630, 652, 656
  - nothing
- InDesign documentation
  - nothing
- make a test-all target that runs all tests we have
  - nothing
- define and document testing routines
  - nothing since the previous week
- test sme and smj hunspell lexicons using Gold Standard tests
  - done: sme passed, smj failed completely
- give test Linux distro to Petter Reinholdtsen on Wednesday
  - done, although not on Wednesday
- look into other distros as well: Mac, Windows
  - looked a bit
- write README file, it should say BETA
  - done
- write installation and usage instructions (online, and in distro)
  - done for Debian distro, briefly for OOo manual installation from the same
- order hotel rooms for the Hattfjelldal/board meeting trip
  - not done
- organise meeting room for project board meeting
  - not done
- fix bugs!
Thomas
- look at test cases still not behaving properly
  - not any this week
- fix bugs!
  - not any this week
Tomi
- make a hunspell package that suits Linux distributions
  - not done
- document how compounding is controlled in the PLX conversion
  - not done
- fix double hyphen bugs
  - not done
- Make a pedagogical speller
  - not done
- fix the remaining hunspell conversion bugs
  - not done
- fix bugs!
- other
  - compiled new speller
  - MA delivered
Trond
- Help Jovsset with vislcg3 and sma
- Set up Jabber for Lene, Kimme, Saara
- Prepare svn migration (with Sjur, Børre)
- make a test-all target that runs all tests we have
- define and document testing routines
- Dictionaries
- check Hattfjelldal trip with Lene
- fix bugs!
Pedagogical software online
Meeting memos can be found at
Goal for the Hattfjelldal meeting:
TODO:
- get the content ready (Lene)
- implement the UI and functionality (Saara)
- get an easy-to-remember URL (UiT/IT)
- More thorough skin, layout, ... (External person within the Ped team,
- Make a pedagogical speller (Tomi when finished with his MA thesis)
  - Turn off peripheral compounds (numbers, acronyms, perhaps names)
  - Increase editing distance by one for suggestions? Only possible with limited
Documentation
TODO:
- start to reorganise the documentation (Børre, Sjur, Trond)
Corpus gathering
We discussed how the last corpus contacts of Per-Eric could be followed up
TODO:
- follow up on sma corpus texts (Sjur, Jovsset)
- follow up on the smj texts from Kurt Tore (Per-Eric)
- other contacts: Nord-Salten avis (Børge Strandskog), Lena Davidsson
- Ulf Stefan Winka has a lot of smj texts (Thomas)
- plan a smj pr tour for our tools (Per-Eric, Thomas)
Future plans, directions and ideas
See a separate document in plan/strat/5year.jspwiki.
Infrastructure
To accommodate future enhancements in different directions (in rough order of
- test bench for all parts of our language technology efforts
- migrate to svn
- merge gt, kt and st into one, probably after the svn move
- more modularised make / build infra (prepare for smn, sms, sjd, others)
- close certain parts of the code repository (requires svn)
- set up the Leopard Server features for collaborative support:
  - permanent chat rooms
  - stored (and indexed) chat transcripts of the chat rooms
  - iCal server / group calendars
  - wiki
- wiki? (is part of Leopard Server) or other web-based documentation
- improve Forrest stability and i18n support
- reorganise the documentation content:
  - distinguish between target groups
  - get better grouping
  - decide what to write in Forrest and what in the wiki
  - update/add missing parts
- migrate lexicons to XML, splitting the task
  - Name lexica (the Name project)
  - Dictionaries (already in XML, task is to integrate them)
  - Open POSes (Komi as a test case)
- change the look of the documentation web
- sfst? Both as replacement for xfst and for hunspell/open-source proofing tools
- investigate the NSIS installer, potentially replacing InstallShield
- corpus content moved to Max Planck repositories?
TODO:
- make a test-all target that runs all tests we have (Børre, Sjur, Trond)
- define and document testing routines (Børre, Sjur, Trond)
- add Jabber account in iChat
  - UiT: Lene, Kimme, Saara (Trond)
- prepare migration to svn (Børre, Sjur, Trond)
  - https access is now working.
- try to repair G5 accounts for iCal Server (Børre)
Linguistics
North Sámi
(nothing new, see proofing bugs below)
Lule Sámi
(nothing new, see proofing bugs below)
TODO:
- sme->smj lexicon conversion to build bilingual lexicon resources, and
- Add the words when all words are ready.
South Sámi
Jovsset will ask the authors whether we can get a copy of the Verbh
Name lexicon/risten.no infrastructure
TODO:
- fix i18n bug in risten.no/G5 (so they will work without the proper locale
- fix bugs in lexc2xml; add comments to the log element (Saara)
- finish first version of the editing (Sjur)
- test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)
- make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as
- convert propernoun-($lang)-lex.txt to a derived file from common xml files
- implement data synchronisation between risten.no and
- start to use the xml file as source file
- clean terms-sme.xml such that all names have the correct tag for their use
- merge placenames which are erroneously in different entries: e.g. Helsinki,
- publish the name lexicon on risten.no (Sjur)
- add missing parallel names for placenames (linguists)
- add informative links between first names like Niillas and Nils
Dictionaries
TODO:
- clean up and generalise the make infrastructure
- make Linux and Windows local/integrated versions
- make simple installer applications
- make a public release
- Make a homepage with instructions for dictionary use:
  - Clarify the difference between local and online dictionaries:
    - Plugin for Firefox and Internet Explorer (online dictionaries)
Proofing tools
Hunspell
Børre has compiled a new set of lexicons, now also including smj:
File | Size |
---|---|
sme.dic | 50 MB |
sme.aff | 873 kB |
smj.dic | 17 MB |
smj.aff | 484 kB |
These files are in 129.242.220.111 /Users/boerre/gt/tmp
Installation (needs the latest version of OpenOffice.org 2.4)
Under Linux the files se.dic and se.aff must be copied to
In OS X the folder is
Then these lines should be added to the following file:
in Linux:
in Mac:
DICT se NO se
DICT se SE se
DICT se FI se
DICT smj NO smj
DICT smj SE smj
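As a minimal sketch, the five DICT lines above could be registered by a small Python helper instead of by hand. It assumes the target file is OpenOffice.org 2.x's dictionary.lst, with one `DICT <language> <country> <file root>` entry per line. The notes do not name the file or its directories, so the path is passed in as an argument. The function names `register_dictionaries` and `dict_lines` are hypothetical and not part of any distro.

```python
from pathlib import Path

# Entries copied from the meeting notes: (language, country, file root).
DICT_ENTRIES = [
    ("se", "NO", "se"),
    ("se", "SE", "se"),
    ("se", "FI", "se"),
    ("smj", "NO", "smj"),
    ("smj", "SE", "smj"),
]


def dict_lines(entries=DICT_ENTRIES):
    """Render each entry as a 'DICT <lang> <country> <file root>' line."""
    return [f"DICT {lang} {country} {root}" for lang, country, root in entries]


def register_dictionaries(dictionary_lst):
    """Append the DICT lines that are missing from the given file.

    Returns the lines actually added, so a second run adds nothing.
    """
    path = Path(dictionary_lst)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    present = {line.strip() for line in text.splitlines()}
    missing = [line for line in dict_lines() if line not in present]
    if missing:
        # Start on a fresh line if the existing file lacks a final newline.
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        with path.open("a", encoding="utf-8") as fh:
            fh.write(prefix + "\n".join(missing) + "\n")
    return missing
```

Keeping the helper idempotent means it can be rerun after an OOo update without duplicating entries.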
Hunspell lexicon distributions:
- Debian tarball (delivered)
- OOo Lingucomponent package (coming)
- zipped lexicon files for manual installation in whatever system the user wants
TODO:
- test sme and smj using Gold Standard tests (Sjur)
  - done
- make a proper Linux distro (Børre, Tomi)
  - done
- give test Linux distro to Petter Reinholdtsen on Wednesday (Børre, Sjur)
  - done on Friday
- look into other distros as well: Mac, Windows (Børre, Sjur, ...)
  - done a bit, needs more work
- write README file, it should say BETA (Børre, Sjur)
  - done
- release an OOo distro on Tuesday (Børre)
- OOo distro readme and installation (Børre)
- zip distro readme and installation (Børre)
- write installation and usage instructions (online, and in distro)
  - Børre has started, not yet in cvs
- QA README and installation docs (Trond, Per-Eric)
  - not done, depends on the previous task
- translate README and installation documentation (all)
  - not done, depends on the previous task
- release a public beta at the end of May (Børre, Sjur)
  - released to Debian, still some work for the release to the general audience
- fix the remaining conversion bugs (Børre, Tomi)
  - Børre has worked on the smj problems
Testing
Spelling Error Markup
TODO:
- Set up ways of adding meta-information (source info, used in testing or not,
- test new and nested error markup (Sjur)
Speller bugs
List of bugs returned from Polderland: 621, 630, 652, 656, 676.
Open issues based on test results:
sme
- 426 - comp words from Divvun.no - guoktedássásaš accepted - still OPEN
- 435 - roman numbers - inflection of single letter numbers
  - we should pregenerate all numbers once and for all, and store them in a
- 595 - prefix+name without hyphen (ovdaLot instead of ovda-Lot) - still
- 600 - gen+hyph compound sámi-dáru - still OPEN
- 603 - suomabealdi accepted - still OPEN
- 606 - speller accepts VUOHTA compound - still OPEN
- 611 - double hyphen sugg still accepted - still OPEN
- 613 - short gen. as second compound part - still OPEN
- 619 - numerals and pronouns to NAMÁK and SASJ fails - still OPEN
- 627 - prefix + hyphen does not get accepted - FIXED
- 629 - a taking part in compounding without hyphen - still OPEN
- 633 - REGRESSION: double hyphens accepted
- 634 - PropGen+hyph+PropGen - still OPEN
- 641 - numeral+noun compounds - still OPEN
- 642 - noun/adj/proper + hyphen + ain - still OPEN
- 644 - cased numeral+numeral compound - still OPEN
- 646 - adverb + hyphen + noun - still OPEN
- 647 - numerals+NOUN - still OPEN
- 648 - unmotivated suggestions with numeral+noun - still OPEN
- 649 - name + adj compound without hyphen - still OPEN
- 654 - speller does not recognize ordinals on -nuppelogát - still OPEN
- 655 - pron + nai - still OPEN
- 658 - Suggestion saame - still OPEN - won't fix
- 666 - guovtte- and njealje- - FIXED
- 676 - triple-hyphen - FIXED, but double hyphen is still accepted
- other regressions:
-
skuvlajagin now accepted
- skierranis now accepted
-
skuvlajagin now accepted
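Issue 435 suggests pregenerating all numbers once and storing them. The storage target is cut off in these notes. Assuming "numbers" here means Roman numerals, the generation step could look like the minimal Python sketch below. It uses the standard greedy conversion with subtractive pairs (IV, IX, XL, ...). The inflection itself, which is the actual bug, belongs in the lexicon and is not shown. `to_roman`, `pregenerate` and `single_letter_numerals` are hypothetical names.

```python
# Value/symbol pairs in descending order, including the subtractive forms.
ROMAN_PAIRS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(n):
    """Convert 1..3999 to a standard Roman numeral (greedy conversion)."""
    if not 1 <= n <= 3999:
        raise ValueError("standard Roman numerals cover 1..3999")
    parts = []
    for value, symbol in ROMAN_PAIRS:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def pregenerate(limit=3999):
    """Return the numerals for 1..limit, in numeric order."""
    return [to_roman(i) for i in range(1, limit + 1)]


def single_letter_numerals(numerals):
    """Pick out the one-letter numerals, the case issue 435 is about."""
    return [r for r in numerals if len(r) == 1]
```

`single_letter_numerals(pregenerate())` yields the seven letters I, V, X, L, C, D and M. These are the forms whose inflection would need checking once the list is stored.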
smj
- 435 - roman number - single letter numbers now recognised
  - we should pregenerate all numbers once and for all, and store them in a
  - please note that inflection of single letter numerals is fine in
- 595 - prefix+name without hyphen (tsåhkeLot instead of tsåhke-Lot) -
- 599 - REGRESSION: numeral attr: s on lot
- 600 - gen+hyph compound sáme-dáro - still OPEN
- 616 - Bispadime-me-ráden - still OPEN, try to find an acro or abbr me
- 619 - numerals and pronouns to NAMÁK and SASJ fails - still OPEN
- 629 - a taking part in compound - still OPEN
- 634 - Prop gen + hyphen + Prop gen - still OPEN
- 641 - numeral+noun compounds - still OPEN
- 644 - cased numeral+numeral compound - still OPEN
- 647 - numerals+NOUN - still OPEN
- 648 - unmotivated suggestions with numeral+noun - still OPEN
- 649 - name + adj compound without hyphen - still OPEN
- 650 - noun prefix+name compound without hyphen - still OPEN
- 658 - Suggestion saame - still OPEN, won't fix
- 692 - NEW: numeral variants
- other regressions:
  - gus NOT accepted anymore
TODO:
- compile new speller lexicons (Tomi)
- document how compounding is controlled in the PLX conversion (Tomi)
Hyphenator bugs
Open issues based on test results:
sme
- 468 - REGRESSION: Márkomeanu
- 547 - REGRESSION: hyphen in front of vowel: Lotnolasealáhusas
- 548 - REGRESSION: mid syllable hyphenation: Háliidivččen
- 549 - REGRESSION: division without hyph: Váccedettiin
- 673 - adj-derivations: guovttenuppelotčoarvvagiin (the word is not recognised)
- 677 - NEW: Wrongly hyphenated ending -danidja - invalid
smj
- 545 - REGRESSION: bad hyphenation in compounds: åhpadusorganisásjåvnån
- 546 - REGRESSION: obligatory hyph rules seem to work in facultative
- 547 - REGRESSION: hyphen in front of vowel: Jienastimnjuolgadusá and
TODO:
- fix hyphenator errors (Tomi)
InDesign tools
Nothing new.
Releases
TODO:
- public hunspell beta - end of May
  - Debian distro released, two more coming this week
- update the Changes document (Sjur)
- InDesign documentation (Sjur)
  - Norwegian translation received from Davvi Girji
- public 1.1 update of the Polderland-based tools at the beginning of June
Other
Trip to Hattfjelldal
We have been asked to move the meeting to Friday or Tuesday, but that isn't
TODO:
- check with Lene (Jovsset, Trond)
- order hotel rooms (Jovsset, Sjur)
- organise meeting room for project board meeting (Sjur)
Corpus contracts + open source
We have now decided to wait until we have moved from cvs to svn.
Summer vacations
Who | When |
---|---|
Børre | 30/6-6/7, 21/7-3/8, 11/8-17/8 |
Jovsset | ??? |
Per-Eric | 11/6-30/6 |
Sjur | Mainly in July, dates not set |
Tomi | 16/6 - 4/8 |
Thomas | 23/6 - 21 or 28/7 |
Trond | 30/6 - 18/7, 28/7 - 1/8 |
Divvun feedback
Jovsset was at the Giellagiella seminar last week, with language
Next meeting, closing
The next meeting is 9.6.2008, 9.30 Norwegian time.
The meeting was closed at 11:00.
Appendix - task lists for the next five days
Børre
- prepare migration to svn (with Sjur, Trond)
- try to repair G5 accounts for iCal Server
- make a test-all target that runs all tests we have
- define and document testing routines
- look into other hunspell distros as well: Mac, Windows
- write installation and usage instructions (online, and in distro)
- release an OOo distro Tuesday
- OOo distro readme and installation
- zip distro readme and installation
- fix the remaining hunspell conversion bugs
- fix bugs!
Jovsset
- follow up on sma corpus texts
- Talk to the students at Uppsala University about the Verbh book and the
- check Hattfjelldal trip with Lene
- order hotel rooms for the Hattfjelldal trip
Lene
- get the ped content ready
- Work on test routine with Trond and Sjur
Per-Eric
- follow up on the smj texts from Kurt Tore
- follow-up contracts from Nord-Salten avis and Lena Davidsson
- Work with missing list same_dutkama_pgr.txt
- Plan a smj pr tour for our tools
- fix bugs!
Saara
- add new XSL/XML headers for proofing test docs
- Set up ways of adding meta-information for proofing correct corpus docs
- implement the ped UI and functionality
Sjur
- follow up on sma corpus texts
- name db/risten.no
- make an improved sma project plan
- prepare migration to svn (with Børre, Trond)
- update the Changes document
- follow-up on some Polderland-related bugs: 621, 630, 652
- InDesign documentation
- make a test-all target that runs all tests we have
- define and document testing routines
- look into other distros as well: Mac, Windows
- order hotel rooms for the Hattfjelldal/board meeting trip
- organise meeting room for project board meeting
- fix bugs!
Thomas
- look at test cases still not behaving properly
- contact Ulf Stefan Winka for smj texts
- fix bugs!
Tomi
- document how compounding is controlled in the PLX conversion
- fix double hyphen bugs
- fix PL hyphenator errors
- Make a pedagogical speller
- fix bugs!
Trond
- Help Jovsset with vislcg3 and sma
- Set up Jabber for Lene, Kimme, Saara
- Prepare svn migration (with Sjur, Børre)
- make a test-all target that runs all tests we have
- define and document testing routines
- Dictionaries
- check Hattfjelldal trip with Lene
- fix bugs!