Meeting_2008-03-12
Meeting setup
- Date: 12.3.2008
- Time: 09.30 Norw. time
- Place: Internet
- Tools: SubEthaEdit, iChat/Skype
Agenda
Cf. one of the following, depending on context:
- the upper bar of the SEE window (provided you use the JSPWiki syntax mode)
- the TOC in Forrest-rendered output, like HTML and PDF
Opening, agenda review, participants
Opened at 10:30.
Present: Børre, Per-Eric, Sjur, Thomas, Tomi
Absent: Maaren, Trond
Agenda accepted as is.
Updated task status since last meeting
Børre
- start to reorganise the documentation
- gather sma texts
- improve forrest stability with i18n, site look
- set up the Leopard Server features for collaborative support
- Hunspell lexicon conversion
- InDesign documentation
- investigate the NSIS installer
- give contract with blank fields to Per-Eric
- fix bugs!
Lene
- Ped project
Maaren
- Put the list of possible sma corpus sources into a document
- update the Changes document
Per-Eric
- keep the contact with Kurt Tore's family about his texts.
  - Nothing
- try to find other authors who have smj texts digitally, send contracts to them
  - Working on it, find some
- Work with a few things missing from the bible texts.
  - Worked and still working
- fix bugs!
  - Nothing done
Saara
- add new XSL/XML headers for proofing test docs
- Set up ways of adding meta-information for proofing correct corpus docs
- discuss more parallel texts
Sjur
- start to reorganise the documentation
  - nothing
- gather sma texts
  - presented the project in Trondheim, will get some text as a result
- improve forrest stability with i18n, site look
  - nothing
- set up the Leopard Server features for collaborative support
  - nothing
- name db/risten.no
  - nothing useful
- investigate the NSIS installer
  - nothing
- make a first sma project plan
  - nothing new
- publish corpus contracts and project infra as open-source on NoDaLi-sta
  - nothing
- call Julie Eira about Maaren's share of the Divvun project
  - done. Maaren can do as much as she finds time for, and as much as we need her
- prepare Trondheim presentation
  - done
- fix bugs!
Thomas
- look at test cases still not behaving properly
  - not much done
- fix bugs!
  - worked some
Tomi
- Hunspell lexicon conversion
  - not done
- document how compounding is controlled in the PLX conversion
  - not done
- fix double hyphen bugs
  - not done
- fix bugs!
  - not done
- other
  - worked with XML processing
  - initial work for making sfst transducer(s)
Trond
- Reorganise documentation (with Børre and Sjur)
  - Fiddling around myself, awaiting a plenary discussion.
- Gather sma texts (with Børre and Sjur)
  - Not done
- Name lexicon project: Test editing xml files (when they are ready for it)
  - Waiting
- Work on sma analyser and visl integration
  - Huge progress here. Started including the Errata sheet from Verbh, still
- fix bugs!
Pedagogical software online
TODO:
- get an easy-to-remember URL (UiT/IT)
- More thorough skin, layout, ... (External person within the Ped team,
Documentation
TODO:
- start to reorganise the documentation (Børre, Sjur, Trond)
Corpus gathering
The Trondheim visit and presentation have brought both smj and sma texts.
TODO:
- follow up on the smj texts from Kurt Tore (Per-Eric)
- get texts from Sigga Tuolja Sandstrøm, possibly through Olavi Korhonen
- other contacts: Nord-Salten avis, Børge Strandskog, Lena Davidsson daughter to
- gather sma texts (Børre, Sjur, Trond, Joseph)
  - Sjur got some important contacts in Trondheim
- Put the list of possible corpus sources into a document
- give contract with blank fields to Per-Eric (Børre)
Infrastructure
TODO:
- add Jabber account in iChat (all)
- improve forrest stability with i18n, site look (Børre, Sjur, Tomi)
- set up the Leopard Server features for collaborative support - permanent chat
Linguistics
North Sámi
Hyphenation bugs still there, now properly documented by the improved test
Lule Sámi
All missing lists now added. A few things missing from the bible texts.
Hyphenation: same as for sme.
TODO:
- sme->smj lexicon conversion to build bilingual lexicon resources, and
- Add the words when all words are ready.
South Sámi
Tomi has added even more verbs. The corrections to Verbh are now added
Name lexicon infrastructure
TODO:
- fix i18n bug in risten.no/G5 (so they will work without the proper locale
  - it works ok locally, set-up / config needs to be checked on the G5; probably
    - it works the same both locally and on the G5, relates to i18n setup in
- fix bugs in lexc2xml; add comments to the log element (Saara)
- finish first version of the editing (Sjur)
- test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)
- make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as
- convert propernoun-($lang)-lex.txt to a derived file from common xml files
- implement data synchronisation between risten.no and
- start to use the xml file as source file
- clean terms-sme.xml such that all names have the correct tag for their use
- merge placenames which are erroneously in different entries: e.g. Helsinki,
- publish the name lexicon on risten.no (Sjur)
- add missing parallel names for placenames (linguists)
- add informative links between first names like Niillas and Nils
Proofing tools
Hunspell
TODO:
- add smj to the soup, make sure it works roughly as well as sme
- fix the remaining conversion bugs for sme
- return to smj, and fix whatever is left to fix
- integrate the derivations as separate "continuation lexicons"
Testing
Spelling Error Markup
TODO:
- Set up ways of adding meta-information (source info, used in testing or not,
- test new and nested error markup (Sjur)
Speller bugs
Open issues based on test results:
sme
- 426 - comp words from Divvun.no - guoktedássásaš accepted - still open
- 435 - roman number - single letter numbers now recognised
  - we should pregenerate all numbers once and for all, and store them in a
- 595 - prefix+name without hyphen (ovdaLot instead of ovda-Lot)
- 600 - REGRESSION: gen+hyph compound sámi-dáru
- 603 - suomabealdi, norggabealdi accepted
- 606 - speller accepts VUOHTA compound
- 607 - acro + hyphen
  - NRKGA is acro + clitic accepted without colon - what is correct?
- 611 - double hyphen sugg still accepted
- 613 - short gen. as second compound part
- 619 - numerals and pronouns to NAMÁK and SASJ fails
- 627 - prefix + hyphen does not get accepted
- 629 - a taking part in compounding without hyphen
- 633 - double hyphens accepted in Word, not by cmdline speller
- 634 - PropGen+hyph+PropGen
- 641 - numeral+noun compounds
- 642 - noun/adj/proper + hyphen + ain
- 644 - cased numeral+numeral compound
- 646 - adverb + hyphen + noun
- 647 - numerals+NOUN
- 648 - unmotivated suggestions with numeral+noun
- 649 - name + adj compound without hyphen
- 654 - speller does not recognize ordinals on -nuppelogát
- 655 - pron + nai
- 658 - Suggestion saame
- 660 - abbr. not recognised
smj
- 435 - roman number - single letter numbers now recognised
  - we should pregenerate all numbers once and for all, and store them in a
- 595 - prefix+name without hyphen (tsåhkeLot instead of tsåhke-Lot)
- 600 - REGRESSION: gen+hyph compound sáme-dáro
- 607 - acro + hyphen
  - NRKGA is acro + clitic accepted without colon - what is correct?
- 616 - Bispadime-me-ráden - still OPEN, try to find an acro or abbr me
- 619 - numerals and pronouns to NAMÁK and SASJ fails - still OPEN
- 629 - a taking part in compound - still OPEN
- 634 - Prop gen + hyphen + Prop gen - still OPEN
- 641 - numeral+noun compounds - still OPEN
- 644 - cased numeral+numeral compound
- 647 - numerals+NOUN
- 648 - unmotivated suggestions with numeral+noun
- 649 - name + adj compound without hyphen
- 650 - noun prefix+name compound without hyphen
- 658 - Suggestion saame
TODO:
- look at test cases still not behaving properly (Thomas, Tomi)
- document how compounding is controlled in the PLX conversion (Tomi)
Hyphenator bugs
Open issues based on test results:
sme
- 468 - Márkomeanu -> Polderland - FIXED
- 548 - duostan -> Polderland - FIXED
- 549 - missing hyph at word boundary -> Polderland - FIXED
- 633 - extra hyphen inserted -> Divvun - FIXED
There are still some bugs found in the wordtypes test file. They should be added
TODO:
- add remaining hyphenation bugs to Bugzilla (Thomas)
smj
- 549 - missing hyph at word boundary -> Polderland - FIXED
- 633 - extra hyphen inserted -> Polderland - FIXED
- 636 - hyphen before last char -> Divvun
The wordtypes test file does contain another problem, but that one belongs to
TODO:
- lexicalise europarádeministarjuogos (Thomas)
- try to fix 636 (Thomas)
InDesign tools
We're waiting for an update from Polderland.
Windows installer
TODO:
- investigate the NSIS installer (Børre, Sjur)
Releases
TODO:
- update the Changes document (Maaren)
- documentation (Sjur)
- Norwegian translation received from Davvi Girji
Other
Corpus contracts + open source
TODO:
- publish corpus contracts and project infra as open-source on NoDaLi-sta
Easter vacation
Please report all real vacation days (as opposed to taking extra hours off)
Name | Period |
---|---|
Børre | 10.-16.3. (winter holidays) |
Sjur | 17.-24.3. |
Thomas | 17.-24.3. |
Jovsset | 17.-24.3. |
Maaren | 19.-24.3. |
Per-Eric | 25.-28.3. |
Tomi | nothing special, writing on the thesis |
Next meeting, closing
The next meeting is 25.3.2008.
The meeting was closed at 11:29.
Appendix - task lists for the next five days
Børre
- start to reorganise the documentation
- gather sma texts
- improve forrest stability with i18n, site look
- set up the Leopard Server features for collaborative support
- Hunspell lexicon conversion
- InDesign documentation
- investigate the NSIS installer
- give contract with blank fields to Per-Eric
- fix bugs!
Lene
- Ped project
Maaren
- Put the list of possible sma corpus sources into a document
- update the Changes document
Per-Eric
- keep the contact with Kurt Tore's family about his texts.
- try to find other authors who have smj texts digitally, send contracts to them
- Work with missing list from the bible texts.
- Keep the contact with Ulf-Stefan Winka who has many more smj texts to add.
Saara
- add new XSL/XML headers for proofing test docs
- Set up ways of adding meta-information for proofing correct corpus docs
- discuss more parallel texts
Sjur
- start to reorganise the documentation
- gather sma texts
- improve forrest stability with i18n, site look
- set up the Leopard Server features for collaborative support
- name db/risten.no
- investigate the NSIS installer
- make a first sma project plan
- publish corpus contracts and project infra as open-source on NoDaLi-sta
- fix bugs!
Thomas
- look at test cases still not behaving properly
- add remaining hyphenation bugs to Bugzilla
- lexicalise europarádeministarjuogos
- try to fix 636
- fix bugs!
Tomi
- Hunspell lexicon conversion
- document how compounding is controlled in the PLX conversion
- fix double hyphen bugs
- fix bugs!
Trond
- Reorganise documentation (with Børre and Sjur)
- Gather sma texts (with Børre and Sjur)
- Name lexicon project: Test editing xml files (when they are ready for it)
- Work on sma analyser and visl integration
- fix bugs!