Meeting_2008-01-28
Contents:
- Meeting setup
- Agenda
- Opening, agenda review, participants
- Updated task status since last meeting
- Pedagogical software online
- Workshop in Tromsø, end of February
- Documentation
- Corpus gathering
- Infrastructure
- Linguistics
- Name lexicon infrastructure
- Proofing tools
- Other
- Next meeting, closing
- Appendix - task lists for the next five days
Meeting setup
- Date: 28.1.2008
- Time: 09.30 Norw. time
- Place: Internet
- Tools: SubEthaEdit, iChat/Skype
Agenda
Cf. one of the following, depending on context:
- the upper bar of the SEE window (provided you use the JSPWiki syntax mode)
- the TOC in Forrest-rendered output, like HTML and PDF
Opening, agenda review, participants
Opened at 09:58.
Present: Børre, Lene, Maaren, Per-Eric, Sjur, Thomas, Tomi, Trond
Absent: none
Agenda accepted as is.
Updated task status since last meeting
Børre
- start to reorganise the documentation
- not done
- gather sma texts
- not done
- improve forrest stability with i18n, site look
- set up the Leopard Server features for collaborative support
- not done
- Hunspell lexicon conversion
- not done
- InDesign documentation
- not done
- investigate the NSIS installer
- not done
- release InDesign tools Jan. 30.
- not done
- fix bugs!
- other:
- worked on layouts for giellatekno.uit.no, and the coming oahpa.uit.no sites.
Lene
- Pedagogical project, running till 31.12.08: making grammatical games (VISL)
- VISL-games and quizzes are almost ready (ready for trying, some adjustments to
- dialogues: Made a simple technical model, beginning to write the dialogues
- held a course for teachers at one school to get feedback
- made user documentation: OAHPA!-portal
Maaren
- Put the list of possible sma corpus sources into a document
- not done
- update the Changes document
- not done
Per-Eric
- check some unusual and missing words from the last Olavi missing list
- Working on it
- keep the contact with Kurt Tore's family about his texts, send a new contract
- Sent a new contract; Kurt Tore's wife has a contact person, A. Kintel
- try to visit S T Sandstrøm personally as soon as possible, maybe this week
- Sent a new contract; she is now very positive about giving us all her texts
- try to find other authors who have smj texts digitally
- nothing done
- fix bugs!
- Worked some
Saara
- add new XSL/XML headers for proofing test docs
- Set up ways of adding meta-information for proofing correct corpus docs
- discuss more parallel texts
Sjur
- document Windows CD installation work-around
- unless we get feedback saying otherwise, the present documentation should be
- start to reorganise the documentation
- not done
- gather sma texts
- not done
- improve forrest stability with i18n, site look
- not done, but found an i18n issue on the G5/"internal risten.no"
- set up the Leopard Server features for collaborative support
- check the present sma sources
- name db/risten.no
- identified the issue with non-working browsers - locale mismatch
- improve hyphenation testing
- done, and several issues identified
- investigate the NSIS installer
- not done any more
- get hotel rooms in Snåsa
- not done yet, will do today
- make a first sma project plan
- not done
- publish corpus contracts and project infra as open-source on NoDaLi-sta
- add verb paradigm generation bug to Bugzilla
- done
- test that hyphenation is identical in InDesign and the command line tool
- done - they seem to be identical, which means we can trust the test results
- release InDesign tools Jan. 30.
- fix bugs!
- other:
- tested the automatic language identification in Word 2007, after user
- hyphenation and speller testing
Thomas
- look at test cases still not behaving properly
- worked some
- create hyphenation test data
- done
- release InDesign tools Jan. 30.
- oh, really? ("jaså")
- fix bugs!
- worked
Tomi
- Hunspell lexicon conversion
- not done
- document how compounding is controlled in the PLX conversion
- not done
- release InDesign tools Jan. 30.
- that date has not passed yet
- fix bugs!
- fixed some
Trond
- sme->smj lexicon conversion to build bilingual lexicon resources
- Reorganise documentation (with Børre and Sjur)
- Reorganised ped doc, otherwise not
- Gather sma texts (with Børre and Sjur)
- Not done
- Look at the sma source files (with Sjur)
- Not done
- Name lexicon project: Test editing xml files (when they are ready for it)
- No files yet
- Make a first sma project plan
- Looked at it myself, not in plenary
- fix bugs!
Pedagogical software online
We now have user documentation almost online (the technical one is part of
Børre has been working on setting up a Forrest based site for the user front
The next step is to establish a URL, say http://oahpa.no/ or
It should be online on Feb. xx, the slick URL and professional layout should be
TODO:
- Setting up the user documentation with an external address, and
- get an easy-to-remember URL (UiT/IT)
- More thorough skin, layout, ... (External person within the Ped team,
Workshop in Tromsø, end of February
Conference in Tromsø in week 9, February 28-29, on
One of the goals for the conference is to make proposals for grant support.
First draft ready in Snåsa.
TODO:
- Presentation of our work
- Basic tools (Sjur, Trond, Thomas)
- Applications (Lene, Sjur)
- Corpus infrastructure (Børre, Saara, Sjur)
- Overall infrastructure ("Makefile") (Sjur, Tomi)
- Plans for future work (Sjur, Trond)
- Relevance for other projects
- Standard written language texts (Trond)
- Existing written dialect texts (Lene, Trond)
- Existing dialect recordings (Lene)
- Turn the text into slides (samdoc08.tex into samdoc08-sem.tex) (Trond)
Documentation
TODO:
- start to reorganise the documentation (Børre, Sjur, Trond)
Corpus gathering
TODO:
- follow up on the smj texts from Kurt Tore (Per-Eric)
- the text discussions will go via Anders Kintel
- go visit Sigga Tuolja Sandstrøm (Per-Eric)
- no need to go there, Per-Eric called her and sent her a contract
- need to talk to a person who has scanned the texts, will get the texts from
- gather sma texts (Børre, Sjur, Trond)
- Put the list of possible corpus sources into a document
Infrastructure
TODO:
- add Jabber account in iChat (all)
- improve forrest stability with i18n, site look (Børre, Sjur, Tomi)
- set up the Leopard Server features for collaborative support - permanent chat
Linguistics
North Sámi
Hyphenation bugs still there, now properly documented by the improved test
Lule Sámi
Hyphenation: same as for sme.
TODO:
- sme->smj lexicon conversion to build bilingual lexicon resources, and
- Add the words when all words are ready.
South Sámi
TODO:
- check the present sources (Sjur, Trond)
Name lexicon infrastructure
The upcoming dictionaries are:
- kven: fkvnob, nobfkv
- smesmj
- (smjsme)
- smenob
The kven work should be visible. The smj should be reported this week, the smenob is part of the ped work and is interesting for the general audience.
Status quo: The dictionaries are shown online
Short-term goal: Have them work in risten GUI.
Decisions made in Tromsø can be found in this meeting memo.
TODO:
- fix i18n bug in risten.no/G5 (so they will work without the proper locale
- fix display in column 3 (Sjur)
- fix bugs in lexc2xml; add comments to the log element (Saara)
- finish first version of the editing (Sjur)
- test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)
- make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as
- convert propernoun-($lang)-lex.txt to a derived file from common xml files
- implement data synchronisation between risten.no and
- start to use the xml file as source file
- clean terms-sme.xml such that all names have the correct tag for their use
- merge placenames which are erroneously in different entries: e.g. Helsinki,
- publish the name lexicon on risten.no (Sjur)
- add missing parallel names for placenames (linguists)
- add informative links between first names like Niillas and Nils
Proofing tools
Hunspell
The %> marker does not survive into Hunspell to work as a boundary marker,
Priority list:
- debug the missing > marker
- add smj to the soup, make sure it works roughly as well as sme
- fix the remaining conversion bugs for sme
- return to smj, and fix whatever is left to fix
- integrate the derivations as separate "continuation lexicons"
TODO:
- Hunspell lexicon conversion (Tomi, Børre)
- debug %> problem (Tomi)
Testing
Spelling Error Markup
TODO:
- Set up ways of adding meta-information (source info, used in testing or not,
- test new and nested error markup (Sjur)
Automated testing
TODO:
- improve hyphenation testing (Sjur)
- done
- add verb paradigm generation bug to Bugzilla (Sjur)
- done
Lexicon conversion to the PLX format
Open issues based on test results:
sme
- 425 - roman number - will not be fixed in 1.0 release
- 426 - comp words from Divvun.no - guoktedássásaš accepted - still open
- 536 - speller accepts "impossible" compound-forms, geažideapmigárvu and
- 593 - missing words in beta2, still missing Nuppelohkái - not lexicalized
- 595 - prefix+name without hyphen (ovdaLot instead of ovda-Lot)
- 597 - does not recognize nubbelohki - not lexicalized
- 603 - suomabealdi, norggabealdi accepted
- 606 - speller accepts VUOHTA compound
- 611 - double hyphen sugg still accepted
- 613 - short gen. as second compound part
- 625 - word+footnote - possibly Polderland or MS, or a consequence of allowing
- 627 - prefix + hyphen does not get accepted
- 629 - a taking part in compounding without hyphen
- 631 - numbers starting with 0
- 633 - double hyphens accepted in Word, not by cmdline speller
- 634 - PropGen+hyph+PropGen
- 637 - nai(go) becomes -naj(go)
smj
- 482 - Nuorttalijguovlojn accepted again
- testcase changed, test PASSED
- 607 - acro + hyphen, NRKGA accepted - still OPEN
- 615 - actio and actor compounds - FIXED
- 616 - Bispadime-me-ráden - still OPEN
- 618 - dipht. simpl. - FIXED
- 629 - a taking part in compound - still OPEN
- 631 - number compounds starting with 0
- 634 - Prop gen + hyphen + Prop gen
TODO:
- look at test cases still not behaving properly (Thomas, Tomi)
- document how compounding is controlled in the PLX conversion (Tomi)
InDesign tools
The speller works in InDesign and InCopy. Lacks user defined lexicons, but
The new Sámi newspaper, Ávvir, is publishing its first edition on February
TODO:
- improve hyphenation testing (Sjur)
- done
- test that the hyphenation is identical in InDesign and the command line
- done
- test twolc # bug solution (Tomi, Trond, Sjur)
- fix double hyphen bugs (Tomi)
- new lexicons by Tuesday (Tomi)
- updated Polderland tools by Wednesday (Sjur)
- final changes and bug fixes by Thursday afternoon (Thomas, Sjur, Tomi)
- final lexicons by Friday morning (Tomi)
Hyphenators
We need more test data, to test hyphenation of different types of words.
compoundword com^pound#word
It should contain all possible word formation patterns and their correct
- compounds
- derivations
- names
- misspellings
- compounds with acros, numbers, names, etc.
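A minimal sketch of how test data in the "compoundword com^pound#word" format could be checked automatically. It assumes that ^ marks an ordinary hyphenation point and # a compound boundary, which the memo does not state explicitly. The function names are hypothetical and not part of the existing test setup.

```python
MARKERS = "^#"  # assumption: ^ = hyphenation point, # = compound boundary


def parse_test_line(line):
    """Split a line like 'compoundword com^pound#word' into (word, pattern)."""
    word, pattern = line.split()
    # Removing the markers from the pattern must give back the plain word.
    plain = "".join(ch for ch in pattern if ch not in MARKERS)
    if plain != word:
        raise ValueError(f"pattern {pattern!r} does not spell {word!r}")
    return word, pattern


def break_points(pattern):
    """Map each break offset in the plain word to the marker found there."""
    points = {}
    offset = 0
    for ch in pattern:
        if ch in MARKERS:
            points[offset] = ch
        else:
            offset += 1
    return points


def compare(expected, actual):
    """Return (missing, extra) break offsets of actual against expected.

    Only the positions are compared, not the marker type.
    """
    exp, act = break_points(expected), break_points(actual)
    return sorted(set(exp) - set(act)), sorted(set(act) - set(exp))


word, expected = parse_test_line("compoundword com^pound#word")
# Pretend the hyphenator missed the compound boundary.
missing, extra = compare(expected, "com^poundword")
print(word, missing, extra)
```

Reporting missing and extra break offsets separately makes it easier to see whether a hyphenator under-hyphenates or over-hyphenates a given word type (compounds, derivations, names, and so on).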
TODO:
- create hyphenation test data (Thomas)
- done
- also done: used it in testing, bugs discovered
Windows installer
TODO:
- investigate the NSIS installer (Børre, Sjur)
Releases
TODO:
- update the Changes document (Maaren)
- release InDesign tools Jan. 30. (Børre, Sjur, Thomas, Tomi)
- compile new lexicons (Tomi)
- test (all)
- document (Sjur)
- package and release (Sjur)
Other
South Sámi project startup meeting
- in Snåsa
- 11th - 15th of Feb, kick-off meeting Wednesday 13.
- Participants: SD (incl. Divvun), Nord-Trøndelag fylkeskommune, Snåsa kommune,
We extend the meeting on our part, to have this project's first gathering.
TODO:
- get hotel rooms (Sjur)
- make a first sma project plan (Sjur, Trond)
Corpus contracts + open source
TODO:
- publish corpus contracts and project infra as open-source on NoDaLi-sta
Bug fixing
When fixing bugs, record the version number containing the fix in the Bugzilla
83 open Divvun/Disamb bugs (45 of these 83 are speller-related bugs,
Next meeting, closing
The next meeting is 4.2.2008, 09:30 Norwegian time.
The meeting was closed at 11:28.
Appendix - task lists for the next five days
Børre
- start to reorganise the documentation
- gather sma texts
- improve forrest stability with i18n, site look
- set up the Leopard Server features for collaborative support
- Hunspell lexicon conversion
- InDesign documentation
- investigate the NSIS installer
- release InDesign tools Jan. 30.
- work on Tromsø Sami workshop paper
- fix bugs!
Lene
- Ped project
- work on Tromsø Sami workshop paper
Maaren
- Put the list of possible sma corpus sources into a document
- update the Changes document
Per-Eric
- check some unusual and missing words from the last Olavi missing list
- keep the contact with Kurt Tore's family about his texts.
- try to find other authors who have smj texts digitally
- fix bugs!
Saara
- add new XSL/XML headers for proofing test docs
- Set up ways of adding meta-information for proofing correct corpus docs
- discuss more parallel texts
Sjur
- start to reorganise the documentation
- gather sma texts
- improve forrest stability with i18n, site look
- set up the Leopard Server features for collaborative support
- check the present sma sources
- name db/risten.no
- investigate the NSIS installer
- get hotel rooms in Snåsa
- make a first sma project plan
- publish corpus contracts and project infra as open-source on NoDaLi-sta
- release InDesign tools Jan. 30.
- work on Tromsø Sami workshop paper
- updated Polderland tools by Wednesday
- final changes and bug fixes by Thursday afternoon
- fix bugs!
Thomas
- look at test cases still not behaving properly
- release InDesign tools Jan. 30.
- work on Tromsø Sami workshop paper
- final changes and bug fixes by Thursday afternoon
- fix bugs!
Tomi
- Hunspell lexicon conversion
- document how compounding is controlled in the PLX conversion
- release InDesign tools Jan. 30.
- work on Tromsø Sami workshop paper
- debug %> problem in Hunspell conversion
- fix double hyphen bugs
- new lexicons by Tuesday
- final changes and bug fixes by Thursday afternoon
- final lexicons by Friday morning
- fix bugs!
Trond
- Report the smesmj project
- Start working on the samdoc talk
- sme->smj lexicon conversion to build bilingual lexicon resources
- Reorganise documentation (with Børre and Sjur)
- Gather sma texts (with Børre and Sjur)
- Look at the sma source files (with Sjur)
- Name lexicon project: Test editing xml files (when they are ready for it)
- Make a first sma project plan
- work on Tromsø Sami workshop paper
- fix bugs!