Meeting_2008-01-07
Contents:
- Meeting setup
- Agenda
- 1. Opening, agenda review, participants
- 2. Updated task status since last meeting
- 3. Documentation
- 4. Corpus gathering
- 5. Infrastructure
- 6. Linguistics
- 7. Name lexicon infrastructure
- 8. Proofing tools
- 9. Other
- 10. Summary, priority list going forward
- 11. Next meeting, closing
- Appendix - task lists for the next week
Meeting setup
- Date: 7.1.2008
- Time: 09.30 Norw. time
- Place: Internet
- Tools: SubEthaEdit, iChat/Skype
Agenda
Cf. one of the following, depending on context:
- the upper bar of the SEE window (provided you use the JSPWiki syntax mode)
- the TOC in Forrest-rendered output, like HTML and PDF
1. Opening, agenda review, participants
Opened at 10:19.
Present: Børre, Per-Eric, Sjur, Tomi, Trond
Absent: Maaren, Risten, Thomas
Agenda accepted as is.
2. Updated task status since last meeting
Børre
- finalise InDesign hyphenator
- Done
- fix bugs!
Ilona
- Continue the bug 494
- done
- Still something to translate/proofread in Finnish?
- done
Maaren
- lexicalise actio compounds
Per-Eric
- check some unusual and missing words from the last Olavi missing list
- Worked some
- proofread the translated/written documentation
- Not done
- fix bugs!
- Fixed some
Risten
- Print 50 CDs, take them to Oslo as backup
- done
Saara
- add new XSL/XML headers for proofing test docs
- Set up ways of adding meta-information for proofing correct corpus docs
- add nested error markup to xml conversion
- discuss more parallel texts
Sjur
- document Windows CD installation work-around
- don't remember, will have to check this
- finalise InDesign hyphenator
- done, released 21.12.2007
- update usage and installation documentation
- done
- new/updated front page (old front page to history page)
- done
- press release
- done
- fix bugs!
Thomas
- translate InDesign documentation
- done
- sme->smj lexicon conversion to build bilingual lexicon resources
- test hyphenation
- done
- analyse hyphenation test results
- done
- look at test cases still not behaving properly
- fix bugs!
Tomi
- Hunspell lexicon conversion
- not done
- fix bugs!
- done
Trond
- sme->smj lexicon conversion to build bilingual lexicon resources
- Working on it.
- fix bugs!
3. Documentation
InDesign was released just before Christmas (21.12).
Our documentation needs a thorough clean-up and reorganisation to make it
TODO:
- update InDesign documentation (Børre, Sjur)
- done
- translate InDesign documentation (Thomas, Sjur, Ilona)
- done
- proofread the translated/written documentation (Børre, Per-Eric, Tomi)
- done
- start to reorganise the documentation (Børre, Sjur, Trond)
4. Corpus gathering
Per-Eric has been talking with the wife of Kurt Tore, and she has
We need to start gathering sma texts right away. Some sources of sma
- Nord-Trøndelag fylke
- Snåsa kommune
- Anna Jacobsen Don jih dan bijre I, II. This should be scanned.
- Lohkede Saemien, needs to be scanned
- other authors
- SD texts, specifically on Pia's SD computer, teaching text books
- Bible
- Year book
- Sámi school books
- sma Davvin book(s?)
- some univ. texts: theses, student papers, teaching texts, Ove Lorentz does
- «Reindriftsbladet» and «Š» do publish some texts in sma
- Infonuorra, Samenet
TODO:
- follow up on the smj texts from Kurt Tore (Per-Eric)
- check who signed the corpus contracts with Kurt Tore, and when (Børre)
- gather sma texts (Børre, Sjur, Trond)
5. Infrastructure
Forrest needs better handling of i18n, to help us get a more stable site. We
We now also have the time to start to explore the new collaboration features of
TODO:
- add Jabber account in iChat (all)
- improve Forrest stability with i18n, site look (Børre, Sjur, Tomi)
- set up the Leopard Server features for collaborative support - permanent chat
6. Linguistics
North Sámi
Hyphenation bugs are still there; an improved test bench is needed.
Lule Sámi
Hyphenation: same as for sme.
TODO:
- sme->smj lexicon conversion to build bilingual lexicon resources, and
South Sámi
Trond and Sjur need to have a thorough look at the sma sources,
TODO:
- check the present sources (Sjur, Trond)
7. Name lexicon infrastructure
Decisions made in Tromsø can be found in this meeting memo.
TODO:
- fix bugs in lexc2xml; add comments to the log element (Saara)
- finish first version of the editing (Sjur)
- test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)
- make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as
- convert propernoun-($lang)-lex.txt to a derived file from common xml files
- implement data synchronisation between risten.no and
- start to use the xml file as source file
- clean terms-sme.xml such that all names have the correct tag for their use
- merge placenames which are erroneously in different entries: e.g. Helsinki,
- publish the name lexicon on risten.no (Sjur)
- add missing parallel names for placenames (linguists)
- add informative links between first names like Niillas and Nils
8. Proofing tools
Hunspell
Proper nouns not yet working, and they do not contain anything to clearly
TODO:
- Hunspell lexicon conversion (Tomi, Børre)
Testing
Spelling Error Markup
This will wait till after the release.
TODO:
- Set up ways of adding meta-information (source info, used in testing or not,
- move Steinar's error markup in the xml files to (a copy of) the original
- add nested error markup to xml conversion (Saara)
- test new and nested error markup (Sjur)
Automated testing
Paradigm testing is now fixed, and is working.
BUT: paradigms are not generated for smj verbs, in gt/smj/testing.
TODO:
- improve hyphenation testing (Sjur)
- fix paradigm testing (Sjur)
- fixed
Lexicon conversion to the PLX format
Open issues based on test results:
sámi-dáru - not accepted => Gen+hyph compound, is not allowed with hyphen. We
smj
- 518 - regression - Fuoskok = pl+clitic as well as derivation = won't fix
- 596 - C-giella - fixed
- 599 - numeral attr: s on lot (ok in lexc)
- 607 - acro + hyphen
- 615 - actio and actor compounds
- 617 - propnoun compounding
- 618 - dipht. simpl.
- 619 - num. derivation
sme
- 425 - roman number - will not be fixed in 1.0 release
- 518 - regression - plural same as derivation, won't fix
- 542 - clitic -ge
- 588 - regression - r. accepted as final part
- Tomi knows where the problem is, but because verb compilation takes much
- 593 - missing words in beta2, still a few
- 595 - prefix+name without hyphen (ovdaLot instead of ovda-Lot)
- 597 - does not recognize nubbelohki
- 599 - short attr -lot/-låk numerals (ok in smX.fst)
- 603 - suomabealdi, norggabealdi accepted
- 604 - action as second compound part
- 606 - speller accepts VUOHTA compound
- 609 - Anár-julggaštusa not recognized
- 610 - missing duhát words
- 611 - double hyphen sugg (missing test case)
- 613 - short gen. as second compound part
- 619 - numeral derivations
TODO:
- look at test cases still not behaving properly (Thomas, Tomi)
InDesign tools
Hyphenator released! The speller is coming in a first beta today or tomorrow.
TODO:
- improve hyphenation testing (Sjur)
Hyphenators
Still issues to investigate.
Update
Released Dec. 21.
Windows installer
New Windows installer NSIS
Benefits of using this:
- open source
- free / no cost
- can make installer packages on Mac/Linux
- truly multilingual
- we have full control over the installer sources
Drawbacks:
- untested
- extra work on our part
TODO:
- investigate the NSIS installer (Børre, Sjur)
9. Other
South Sámi project startup meeting
- in Snåsa
- end of January (no dates given yet)
- Participants: SD (incl. Divvun), Nord-Trøndelag fylkeskommune, Snåsa kommune, UiT, "resource persons"
We extend the meeting on our part, to have this project's first gathering.
TODO:
- get the date straight (Sjur)
- get hotel rooms (Sjur)
- make a first sma project plan (Sjur, Trond)
Corpus contracts + open source
TODO:
- publish corpus contracts and project infra as open-source on NoDaLi-sta
Bug fixing
When fixing bugs, record the version number containing the fix in the Bugzilla
83 open Divvun/Disamb bugs (45 of these 83 are speller-related bugs,
10. Summary, priority list going forward
- new Windows installer, based on NSIS?
- open-source announcement
- sma project setup/startup
- hyphenation bug hunting
- speller bug fixes
- smj corpus hunting/expansion
- name lexicon, risten.no
- Forrest improvements re i18n and reliability/stability
11. Next meeting, closing
The next meeting is 14.1.2008, 09:30 Norwegian time.
The meeting was closed at 12:30.
Appendix - task lists for the next week
Børre
- start to reorganise the documentation
- check who signed the corpus contracts with Kurt Tore, and when
- gather sma texts
- improve Forrest stability with i18n, site look
- set up the Leopard Server features for collaborative support
- Hunspell lexicon conversion
- move Steinar's error markup in the xml files to (a copy of) the original
- InDesign documentation
- investigate the NSIS installer
- fix bugs!
Maaren
- lexicalise actio compounds
Per-Eric
- check some unusual and missing words from the last Olavi missing list
- Corpus gathering: keep in contact with Kurt Tore's family about his texts
- Corpus gathering: try to call Sigga Tuolja Sandstrøm again about her texts
- fix bugs!
Saara
- add new XSL/XML headers for proofing test docs
- Set up ways of adding meta-information for proofing correct corpus docs
- add nested error markup to xml conversion
- discuss more parallel texts
Sjur
- document Windows CD installation work-around
- start to reorganise the documentation
- gather sma texts
- improve Forrest stability with i18n, site look
- set up the Leopard Server features for collaborative support
- check the present sma sources
- name db/risten.no
- improve hyphenation testing
- investigate the NSIS installer
- get the date straight for the Snåsa meeting
- get hotel rooms in Snåsa
- make a first sma project plan
- publish corpus contracts and project infra as open-source on NoDaLi-sta
- fix bugs!
Thomas
- sme->smj lexicon conversion to build bilingual lexicon resources
- look at test cases still not behaving properly
- fix bugs!
Tomi
- Hunspell lexicon conversion
- fix bugs!
Trond
- sme->smj lexicon conversion to build bilingual lexicon resources
- Reorganise documentation (with Børre and Sjur)
- Gather sma texts (with Børre and Sjur)
- Look at the sma source files (with Sjur)
- Name lexicon project: Test editing xml files (when they are ready for it)
- Make a first sma project plan
- fix bugs!