Meeting_2009-10-12
Contents:
- Meeting setup
- Agenda
- Opening, agenda review, participants
- Updated task status since last meeting
- Oahpa!
- Corpus gathering
- Promoting Divvun
- Future plans, directions and ideas
- Infrastructure
- Linguistics
- Name lexicon/risten.no infrastructure
- Proofing tools
- Other
- Next meeting, closing
- Appendix - task lists for the next five days
Meeting setup
- Date: 12.10.2009
- Time: 09.30 Norw. time
- Place: Internet
- Tools: SubEthaEdit, iChat
Agenda
Cf. one of the following, depending on context:
- the upper bar of the SEE window (provided you use the JSPWiki syntax mode)
- the TOC in Forrest-rendered output, like HTML and PDF
Opening, agenda review, participants
Opened at 10: 18.
Present:
Absent: none
Agenda accepted as is.
Updated task status since last meeting
Børre
- static i18n divvun site
- still working
- still working
- order Snow Leopard for the XServe
- not done
- not done
- phase out the G5, use the XServe as our web server instead
- not done
- not done
- upgrade the XServe to Snow Leopard Server
- not done
- not done
- improve XSL script to transform leaflet Forrest XDocs to an OOo Draw document
- not done
- not done
-
gt/Makefile remake
- not done
- not done
- get translations of thank-you letter
- not done
- not done
-
fix bugs!
- other:
- Fixed contracts with Roald E Kristiansen for books published by
- Fixed a perl script for Trond
- Fetched Snow Leopards for the Tromsø workers, battery for Thomas and
- Fixed contracts with Roald E Kristiansen for books published by
Ciprian
- MT: test Autshumato Alpha version on Windows using Launcher
- todo
- todo
- MT: make Autshumato run on Mac
- ongoing
- ongoing
- make eXist running with the SLT-DB
- todo
- todo
- make a schema/dtd description of the lexC-file (experiment with
- todo
- todo
- transform sme-lexC files into XML format
- todo
- todo
- GT web:
- coloring output of the disambiguation
- todo
- todo
- add a tree visualizer for the dependency trees
- todo
- todo
- coloring output of the disambiguation
- Bodø Oahpa:
- generalize for semantic classes and pos
- todo
- todo
- implement a login mechanism
- todo
- todo
- implement a upload data mechanism
- todo
- todo
- setup the Bodø version on victorio
- todo
- todo
- generalize for semantic classes and pos
- Sandbox Oahpa:
- setup the Sandbox version on victorio
- setup the Sandbox version on victorio
- Running Oahpa:
- debug <only-pl/> and <no-leksa/> features (reported by Lene)
- done
- done
- add an Oahpa clock and date excercise (cf. Numra)
- todo
- todo
- Generate fin/xml/{nouns|verbs|adjectives}.xml, and implement the new Leksa
- todo
- todo
- debug <only-pl/> and <no-leksa/> features (reported by Lene)
- KomEngFin:
- compile only one version of dictionary and use css for user preferences
- todo
- todo
- generate the first StarDict version of komeng and komfin dictionary
- todo
- todo
- test the automatic sorting by Komi alphabet in xsl (as discussed
- todo
- todo
- compile only one version of dictionary and use css for user preferences
- SmeNob:
- the StarDict on Windows: check the XDXF-plugin
- done (another format is perhaps needed for StarDict on Windows)
- done (another format is perhaps needed for StarDict on Windows)
- try to reduce the dict-size on mac: experiment with xPointer, etc.
- todo
- todo
- incorporate the passives into the last version of the sme: nob
- todo
- todo
- the StarDict on Windows: check the XDXF-plugin
- SmaNobSwe:
- extend the smanobswe dictionary
- check all bug in the sma: nob-swe dictionary
- fix the bug with the string length in the popup window (sma: nob)
- todo
- todo
- extend the smanobswe dictionary
- SmjNob:
- start working at a Lule-Sami based on Bruce Moren's data
- ongoing
- ongoing
- start working at a Lule-Sami based on Bruce Moren's data
- SjdRus:
- continue the work at the Kildin-Russian dictionary, next internal deadline is
- ongoing
- ongoing
- continue the work at the Kildin-Russian dictionary, next internal deadline is
- Corpus:
- check the processing of new corpus documents (error logging during
- todo
- check the processing of new corpus documents (error logging during
Maja
- Prepare text´s about normaivity issue to SGL/SGM
- done, has to finnish the loanwords : -O
- done, has to finnish the loanwords : -O
- more work on sma adjectives
- done
- done
- contact Turi Aleksandersen about book signature follow-up from Joseph
- will do it this week
- will do it this week
- contact authors of sma aticles in 9-10 eds. of the yearbook
- to do
- to do
- get the south-sámi articles in "samisk skolehistorie" Bi
- work with missing lists- southsámi
- not done
Sjur
- @Barents: plan a meeting/seminar with potential cooperation partners
- @Barents: plan for The Real Thing
- @TTS: start public tender process
- make Leif Åge send out CD's to distribution points
- start Nordplus Sprog project
- send eXist log files to Ciprian
- finalise formal letter to Laila Mattsson-Magga with Maja
- Write a formal letter to Davvi girji about electronic dictionaries
- package and deliver sma proofing tools (with Tomi)
- make XSL script to transform leaflet Forrest XDocs to an OOo Drawer document
- continue to write a proofing API & implementation specification
- new meeting with Krister and Måns - the work is progressing
- new meeting with Krister and Måns - the work is progressing
- name db/risten.no
- follow-up on some Polderland-related bugs: 621, 630, 652
- support and maintenance contract for sme and smj, MS+Adobe tools
- Sámi languages as part of Norsk språkbank
- set up risten.no on eXist/XServe (as a beta version site)
- set up required infra for smenob on risten.no/XServe
- find and contact the correct person in SD, to get the manuscript for all Sámi
-
fix bugs!
- other things:
- major headache with the Nordic social security regulations solved - finally
- major headache with the Nordic social security regulations solved - finally
Thomas
- prepare text´s about normativity issue to SGL/SGM
- worked
- worked
-
sma adjectives
- worked
- worked
- Digitalize south saami books
- worked
- worked
- finish reformulating the proper noun grammar like the verbs
- not yet
- not yet
-
fix bugs!
- worked
Tomi
- package and deliver sma proofing tools (with Sjur)
- not done
- not done
- document how compounding is controlled in the PLX conversion
- not done
- not done
- fix double hyphen bugs
- not done
- not done
- fix PL hyphenator bugs
- not done
- not done
- fix PL and Hunspell conversion bugs
- worked
- worked
- infrastructure remake
- not done
- not done
-
sma umlaut / derivation work
- not done
- not done
- fix bugs!
Trond
- add missing Finnish translations in sme/xml/nouns.xml
- Not any more done. Actually awaiting Tomi on this issue. Tomi has done this.
- Still 13 nouns without translation...
- Not any more done. Actually awaiting Tomi on this issue. Tomi has done this.
- Mikogo for Lene and Linda
- Forgotten.
- Forgotten.
- Komi
- Completing the automaton to some state
- Quite a lot done here, frustrating debugging. Børre fixed the
- Completing the automaton to some state
-
sma proper noun grammar (with the rest of the sma gang)
- What happened to this one?
- What happened to this one?
- clock and date for Numra
- The code is ready for inclusion (modulo debugging)
- The code is ready for inclusion (modulo debugging)
- fix bugs!.
Oahpa!
Started on a sjd (Kildin Sámi) version of Leksa, potentially Numra.
Meeting memos can be found at
TODO
- Register oahpa.no (Trond)
- Add the 13 missing Finnish translations in sme/xml/nouns.xml, and
- Generate fin/xml/{nouns|verbs|adjectives}.xml, and implement the new Leksa
- clock and date for Numra (Trond, Ciprian).
Corpus gathering
Børre has contacted a text book author, signed contract Friady, will meet
TODO:
- Which books are digitized?
- These books should be OCR'ed/digitized: Anna Jacobsens books
- Its not digitized yet, should do it this week.
- use the corpus gathering document to keep track of each book
- These books should be OCR'ed/digitized: Anna Jacobsens books
- continue gathering sma corpus texts (Maja)
- get sma articles in Š-bláđđi
- the Gun Utsi book is almost there - one contract missing (Jovsset)
- will meet with the translator in July, and get the signature then
- check with Jovsset whether he succeeded (Maja)
- will meet with the translator in July, and get the signature then
- get sma articles in Š-bláđđi
- read formal letter to Laila Mattsson-Magga by David (Sjur)
- write formal letter to Davvi Girji (Sjur)
- send a copy of the signed contracts back to the authors, translators and
- find and contact the correct person in SD, to get the manuscript for all Sámi
- get the sma yearbooks from Saemien sïjhte ( Maja)
Promoting Divvun
TODO:
- make leaflet to inform about the project (Børre)
- add InDesign text (Sjur)
- make XSL script to transform Forrest XDocs to an OOo Drawer document
- add InDesign text (Sjur)
- distribute CD version through the library bus, the language centres and common
- make him send out CD's accordingly (Sjur)
- make him send out CD's accordingly (Sjur)
- update online download log statistics page (Børre)
Future plans, directions and ideas
See a separate document in plan/strat/5year.jspwiki.
Northern areas project
The project will be presented in Lovosero / Luujávr in November.
TODO:
- Attend a beginners' course in Russian (priority: the alphabet!) near you..
- plan a meeting/seminar with potential cooperation partners (Trond, Sjur)
- plan for The Real Thing (Trond, Sjur)
Infrastructure
To accomodate future enhancements in different directions (in rough order of
- test bench for all parts of our language technology efforts
- test bench enhanced, but not yet complet
- test bench enhanced, but not yet complet
- set up the Leopard Server features for collaborative support:
- permanent chat rooms
- stored (and indexed) chat transcripts of the chat rooms
- iCal server / group calendars
- wiki
- permanent chat rooms
- wiki? on G5 (is part of Leopard Server) or other web-based documentation
- improve Forrest stability and i18n support ( the divvun crashes)
-
Sjur has been working on better i18n and pdf rendering
-
Børre has some ideas for getting back to serving static html files
-
Sjur has been working on better i18n and pdf rendering
- reorganise the documentation:
- differ between target groups
- get better grouping
- decide what to write in forrest and what in wiki
- update/add missing parts
- differ between target groups
- migrate lexc lexicons to XML, splitting the task
- Name lexica (the Name project)
- Dictionaries (already in XML, task is to integrate them)
- At least migrate the lexc open POSes (Komi as a pilot case)
- Name lexica (the Name project)
- change the look of the documentation web
- use HFST as alternative to XFST
- corpus content moved to Max Planck repositories? Norsk språkbank?
- update infrastructure to allow content-restricted spellers for special target
TODO:
- phase out the G5, use the XServe as our web server instead (Børre)
- upgrade the XServe to Snow Leopard Server (Børre)
- ask Lene, Linda to install Mikogo (Trond)
- restructure and clean the script catalogue, using subdirs to categorise the
- see posting in the news group
- see posting in the news group
- infrastructure remake: ( Børre, Ciprian, Saara, Sjur, Tomi, Trond)
- make a test-all target that runs all tests we have (Ciprian, Sjur, Trond)
- delayed until we have restructured the make/build process
- delayed until we have restructured the make/build process
- define and document testing routines (Ciprian, Sjur, Trond)
- delayed until we have restructured the make/build process
Linguistics
North Sámi
(nothing new, see proofing bugs below)
Lule Sámi
(nothing new, see proofing bugs below)
South Sámi
TODO:
- Prepare text´s about normativity issue to SGL/SGM (Maja, Thomas)
- adjectives (Maja with Thomas, Trond, Sjur)
- two competing naming conventions of continuation lexicons
- One naming goes ATTRSUFF-PREDSUFF-STEMTYPE
- One follows the sme convention of naming key adjectives
- There are duplicate lexica
- One naming goes ATTRSUFF-PREDSUFF-STEMTYPE
- The comparative issue open here and there
- The ATTRSUFF-PREDSUFF-STEMTYPE lexica now go to EVENCOMP and
- The ATTRSUFF-PREDSUFF-STEMTYPE lexica now go to EVENCOMP and
- two competing naming conventions of continuation lexicons
Name lexicon/risten.no infrastructure
Tomi has played with couchdb as a replacement
Instead of building our own webforms and back-end update scripts, use XForms
From the meeting with the terminology and IT teams last week:
- no major rework on the present search interface now
- no work on the editing section; instead:
- add existing lists of sanctioned terminology as separate term entities
- add a dictionary if we can make one with sufficient quality
This means the following tasks:
- find already approved lists, in paper or electronic form (term team)
- convert paper lists to electronic lists (term team)
- convert lists to standard XML (Sjur, Tomi)
- add prepared lists to risten.no (Sjur, Tomi)
TODO:
- send eXist log files to Ciprian (Sjur)
- fix i18n bug in risten.no/G5 (so they will work without the proper locale
- fix bugs in lexc2xml; add comments to the log element (Saara)
- finish first version of the editing (Sjur)
- test editing of the xml files. If ok, then: ( Sjur, Thomas, Trond)
- make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob
- convert propernoun-($lang)-lex.txt to a derived file from common xml files
- implement data synchronisation between risten.no and
- start to use the xml file as source file
- clean terms-sme.xml such that all names have the correct tag for their use
- merge placenames which are errouneously in different entries: e.g. Helsinki,
- publish the name lexicon on risten.no (Sjur)
- add missing parallel names for placenames (linguists)
- add informative links between first names like Niillas and Nils
Dictionaries
We need a meeting on the Komi stuff. The whole conversion to lexc will be
Lemma selection on the dictionary for pronouns: Persons as lemma forms, yes,
- Person: Separate lemmas
- Number: Separate lemmas (?)
- Case: Not separate lemmas
After Komi & Kildin, Ciprian would like to return to sma with some help from
TODO:
- set up risten.no on eXist/XServe (as a beta version site) (Sjur)
- set up required infra for smenob on risten.no/XServe (Sjur)
- Continue the dictionary infrastructure discussion (Ciprian, Sjur, Trond)
- end user documentation (how to download and install) (Ciprian, Trond)
- Contact Davvi Girji about cooperation on electronic dictionaries
- developing the mobile phone version of smenob:
- http: //pfed.info/wksite/
- http: //pfed.info/wksite/
- Komi
- take out the doublets to a separate file (Ciprian)
- merge the doublets (Jaska, Trond)
- start looking at the komfin and komeng MacDicts (Ciprian)
- Completing the automaton to some state (Trond, Jaska, Paula)
- take out the doublets to a separate file (Ciprian)
Proofing tools
HFST-based proofing tools
TODO:
- continue to write an API specification, and an implementation specification
Hunspell
Børre has released beta7, with working clitics, negation verb and copula.
Testing
Spelling Error Markup
TODO:
- test new and nested error markup (Sjur)
- nesting still needs to be tested, depends on new ccat feature
Speller testing
TODO:
- test the error type selection feature in ccat (Sjur)
- package and deliver sma proofing tools (Sjur, Tomi)
- add South Sámi DLL's to the svn repository (Sjur)
Testing open-source Norwegian spellers
Sjur has invited the open-source group to test their spell-checker using our
We should go to their developer meetings, and present our work and how to work
Speller bugs
List of bugs returned from Polderland:
- 621
- 630
- 652
- 656
- 676
Tag reordering for abbreviations have caused a lot of problems:
smj: hr. hr. hr+ABBR+Acc cand.philol. cand.philol. cand.philol+ABBR+N+Acc Per Per Per+N+Prop+Mal+Sg+Attr sme: hr. hr. hr+N+ABBR+Acc Per Per Per+N+Prop+Mal+Sg+Attr
Open issues based on test results:
sme
- 399 - missing numerals (plural forms) - still OPEN
- 425 - X not recognised; single letters were left out - still OPEN
- 435 - roman numbers - inflection of single letter numbers
- we should pregenerate all numbers once and for all, and store them in a
- we should pregenerate all numbers once and for all, and store them in a
- 461 - REGRESSION: missing suggestion (sáhkki)
- 508 - REGRESSION: accepts smj entries (most likely abbreviation missing)
- 520 - REGRESSION: r9 and š9 not defined (abbr. missing)
- 595 - prefix+name without hyphen (ovdaLot instead of ovda-Lot) - still
- 603 - suomabealdi accepted - still OPEN
- 606 - compound-tags LEXICON VUOHTA - still OPEN
- 613 - short gen. as second compound part - still OPEN
- 619 - numerals and pronouns to NAMÁK and SASJ fails - vihttasoarttat
- 629 - a taking part in compounding without hyphen - still OPEN
- only open case has word A-finálaid compounded
- only open case has word A-finálaid compounded
- 647 - numerals+NOUN - still OPEN, open case has uppercase letters
- 648 - unmotivated suggestions with numeral+noun - still OPEN
- 661 - REGRESSION: abbr. not recognized
- 709 - sámedikkeválga accepted - OPEN
- 728 - vowel shortening GenCmp+Left-tagged - still OPEN
- 779 - caseforms of pronoun okatahat - still OPEN
- 785 - does not recognize alphabet-abbr+noun - OPEN
- 802 - NEW: multiword propernouns
- 803 - NEW: FINJU- words accepted single-handed
- 804 - NEW: guovttilogát, njealjilogát
- 805 - NEW: Nouns+acronyms
smj
- 435 - roman number - single letter numbers now recognised
- we should pre-generate all numbers once and for all, and store them in a
- please note that inflection of single letter numerals is fine in
- we should pre-generate all numbers once and for all, and store them in a
- 482 - polardutkamin not recognized - FIXED
- 496 - REGRESSION: unrecognised clitics
- 556 - non-existent word accepted - FIXED
- 594 - lågenanguoktáj not recognized - still OPEN
- 595 - REGRESSION: prefix+name as split comp without hyphen
- 596 - C-giellan is not accepted - still OPEN
- 600 - Gen+hyph compound - FIXED
- 627 - prefix + hyhpen does not get accepted - FIXED
- 647 - numerals+NOUN - still OPEN, open case has uppercase letters
- 648 - unmotivated suggestions with numeral+noun - still OPEN
- 650 - REGRESSION: noun prefix+name compound without hyphen
- 652 - UPPERCASE-typos only get acronym-suggestions - still OPEN
- 692 - numeral-variants - all but one fixed (gáktsalågenantjuotakta), but
- 744 - REGRESSION: numerals + clitic
- 803 - NEW: VINJU- words accepted single-handed
- 805 - NEW: Nouns+acronyms
TODO:
- document how compounding is controlled in the PLX conversion (Tomi)
Hyphenator bugs
Open issues based on test results :
sme
No known issues!
smj
- 670 - Hard hyphen replaced with soft hyphen: 10-biejvvásattja (the word is
TODO:
- fix PL hyphenator errors (Tomi)
- almost done - one smj bug left
Installer changes
TODO:
- test InDesign installer (Sjur)
User documentation
TODO:
- InDesign documentation (Sjur)
- Norwegian translation received from Davvi Girji
1.2 release
Content:
- several smj bug fixes
- lexicalisations
- InDesign Mac & Win
- new OOo beta
- improved installers, at least for Mac, preferably also for Windows
Other
Autumn planning
- divvun/sma -> all
- speller test project -> Sjur, Børre, Thomas, X
- speech synthesis -> Sjur, Trond, BA as a starter
- dictionaries: smanob, komfineng, fkvnob -> Ciprian, Sjur, Trond
- risten.no -> Ciprian, Sjur, Tomi
- Barents project follow-up meeting
- sme and smj proofing tools, next version -> Thomas, Tomi, Sjur
- HFST-based proofing tools -> Sjur
Text to speech
TODO:
- refine syntax / dependency rules (Biret Ánne)
- start public tender process (Sjur)
MT and CAT
TODO:
- make A-ITE compile and run on Windows (Ciprian)
Open source
The repository is properly closed/open now, and the availability of the source
TODO:
- announce the availability of our repository on relevant linguistic lists
Next meeting, closing
The next meeting is 17.10.2009, 9.30 Norwegian time.
Sjur is possibly away on Thursday & Friday this week.
The meeting was closed at 11: 38.
Appendix - task lists for the next five days
Boerre
- static i18n divvun site
- order Snow Leopard for the XServe
- phase out the G5, use the XServe as our web server instead
- upgrade the XServe to Snow Leopard Server
- improve XSL script to transform leaflet Forrest XDocs to an OOo Draw document
-
gt/Makefile remake
- get translations of thank-you letter
- fix bugs!
Ciprian
- MT: test Autshumato Alpha version on Windows using Launcher
- MT: make Autshumato run on Mac
- make eXist running with the SLT-DB
- make a schema/dtd description of the lexC-file (experiment with
- transform sme-lexC files into XML format
- GT web:
- coloring output of the disambiguation
- add a tree visualizer for the dependency trees
- coloring output of the disambiguation
- Bodø Oahpa:
- generalize for semantic classes and pos
- implement a login mechanism
- implement a upload data mechanism
- setup the Bodø version on victorio
- generalize for semantic classes and pos
- Sandbox Oahpa:
- setup the Sandbox version on victorio
- setup the Sandbox version on victorio
- Running Oahpa:
- add an Oahpa clock and date excercise (cf. Numra)
- Generate fin/xml/{nouns|verbs|adjectives}.xml, and implement the new Leksa
- add an Oahpa clock and date excercise (cf. Numra)
- Lovozero Oahpa:
- install MySQL and Django on Lene and/or Trond machine
- implement a simple Leksa version for
- install MySQL and Django on Lene and/or Trond machine
- KomEngFin:
- arrange a meeting with Trond and Jaska about the basic workflow with dictionary and automata development
- compile only one version of dictionary and use css for user preferences
- generate the first StarDict version of komeng and komfin dictionary
- test the automatic sorting by Komi alphabet in xsl (as discussed
- arrange a meeting with Trond and Jaska about the basic workflow with dictionary and automata development
- SmeNob:
- the StarDict on Windows: try the HTML-plugin
- try to reduce the dict-size on mac: experiment with xPointer, etc.
- incorporate the passives into the last version of the sme: nob
- the StarDict on Windows: try the HTML-plugin
- SmaNobSwe:
- extend the smanobswe dictionary
- check all bug in the sma: nob-swe dictionary
- fix the bug with the string length in the popup window (sma: nob)
- extend the smanobswe dictionary
- SmjNob:
- start working at a Lule-Sami based on Bruce Moren's data
- start working at a Lule-Sami based on Bruce Moren's data
- SjdRus:
- continue the work at the Kildin-Russian dictionary, next internal deadline is
- continue the work at the Kildin-Russian dictionary, next internal deadline is
- Corpus:
- check the processing of new corpus documents (error logging during conversion,
- check the processing of new corpus documents (error logging during conversion,
Maja
- Prepare text´s about normativity issue to SGL/SGM
- more work on sma adjectives
- contact Turi Aleksandersen about book signature follow-up from Joseph
- contact authors of sma aticles in 9-10 eds. of the yearbook Åarjel-saemieh
- work with missing lists- southsámi
Sjur
- @Barents: plan a meeting/seminar with potential cooperation partners
- @Barents: plan for The Real Thing
- @TTS: start public tender process
- make Leif Åge send out CD's to distribution points
- start Nordplus Sprog project
- send eXist log files to Ciprian
- finalise formal letter to Laila Mattsson-Magga with Maja
- Write a formal letter to Davvi girji about electronic dictionaries
- package and deliver sma proofing tools (with Tomi)
- make XSL script to transform leaflet Forrest XDocs to an OOo Drawer document
- continue to write a proofing API & implementation specification
- name db/risten.no
- follow-up on some Polderland-related bugs: 621, 630, 652
- support and maintenance contract for sme and smj, MS+Adobe tools
- Sámi languages as part of Norsk språkbank
- set up risten.no on eXist/XServe (as a beta version site)
- set up required infra for smenob on risten.no/XServe
- find and contact the correct person in SD, to get the manuscript for all Sámi
- fix bugs!
Thomas
- prepare text´s about normativity issue to SGL/SGM
-
sma adjectives
- Digitalize south saami books
- finish reformulating the proper noun grammar like the verbs
- fix bugs!
Tomi
- add missing Finnish translations in sme/xml/nouns.xml
- package and deliver sma proofing tools (with Sjur)
- document how compounding is controlled in the PLX conversion
- fix double hyphen bugs
- fix PL hyphenator bugs
- fix PL and Hunspell conversion bugs
- infrastructure remake
-
sma umlaut / derivation work
- fix bugs!
Trond
- add missing Finnish translations in sme/xml/nouns.xml
- Mikogo for Lene and Linda
- Komi
- Completing the automaton to some state
- Completing the automaton to some state
-
sma proper noun grammar (with the rest of the sma gang)
- clock and date for Numra
- fix bugs!.