Meeting_2009-03-02
Contents:
- Meeting setup
- Agenda
- Opening, agenda review, participants
- Updated task status since last meeting
- Pedagogical software online
- Corpus gathering
- Promoting Divvun
- Future plans, directions and ideas
- Infrastructure
- Linguistics
- Name lexicon/risten.no infrastructure
- Proofing tools
- Other
- Next meeting, closing
- Appendix - task lists for the next five days
Meeting setup
- Date: 2.3.2009
- Time: 09.30 Norw. time
- Place: Internet
- Tools: SubEthaEdit, iChat
Agenda
Cf. one of the following, depending on context:
- the upper bar of the SEE window (provided you use the JSPWiki syntax mode)
- the TOC in Forrest-rendered output, like HTML and PDF
Opening, agenda review, participants
Opened at 12:17.
Present:
Absent: Ciprian
Agenda accepted as is.
Updated task status since last meeting
Børre
- Implement new version of giellatekno front page without ToC
- not done
- improve OOo instructions for leaflet
- not done
- work on and debug compounds in Hunspell
- not done
- leaflet: add Linux info
- not done
- gt/Makefile remake
- meeting to go through the 1-page version Feb. 11, at 9:30 Norw. time
- done
- Set up the apache server for the risten.no beta on the linux box + Xserve.
- not done
- fix bugs!
Ciprian
- close the open SVN repository at requested paths
- todo
- improve processing of new corpus documents
- todo
- get the StarDict dictionary generation pipeline running on the Mac
- ongoing
- take care of the error logging during the conversion process
- todo
- look at the xml conversion quality
- todo
- make a list of (general and special) problems related with
- todo
- continue the search and testing of an appropriate tool for
- ongoing (this is urgently needed, judging from my latest experiences)
- build the intelligent sma:nob dict in mac-format
- already done, just some polishing work and a source bug check
- build the sami-week-version of sme:nob in all formats
- done (however, the UiT gang agreed on some extensions and improvements of the version)
- infrastructure remake - news discussion
- restarting this week
- end user documentation (how to download and install) - dictionaries
- done
- fix bugs!
- todo
David
- continue gathering sma corpus texts
- prepare a workshop on South Sámi orthography and grammar (David)
- prepare data for each topic (with Sjur and Trond)
- stay in touch with UmU about C-theses in South Sámi
- Translate light version contract to Swedish
- find a conference location for our sma seminar in Trondheim
- contact the Sámi radio about older manuscripts
- call Anne-Grethe Bientie and Bierna Bientie about bible texts
- call Š-bláđđi about South Sámi texts
- reupload the Dejpeladtje muvhth vätnoeh jih vuekie text, which is
- meeting to go through the 1-page version Feb. 11, at 9:30 Norw. time
- prepare to go to Røyrvik and promote Dåvvome
Jovsset
- get the list of verbs from the authors of Verbh.
- Find a suitable infrastructure for the distribution of the CD version.
- write formal letter to translator of Gåessie dah jeatjebh åerieminie
- leaflet: review the Windows pictures, and usage instructions for MS Office.
- Write installation instructions for Word 2003
- better Windows usage descriptions, with emphasis on the pitfalls
- work with missing lists
Per-Eric
- follow-up contract from Lena Davidsson
- Not done, tried to contact her, but no answer
- Call Gøran Matias Andersen about his smj texts
- I have called, but no answer from him yet
- fix bugs!
- Nothing done
Maja
- contact NRK Nord-Trøndelag about recordings in sma and broadcast
- new adjective meeting?
- send Fritz Jakobssen a contract
Saara
- map oahpa.uit.no to gtsvn.uit.no/oahpa/
- add new XSL/XML headers for proofing test docs
- Set up ways of adding meta-information for proofing correct corpus docs
Sjur
- name db/risten.no
- todo
- follow-up on some Polderland-related bugs: 621, 630, 652
- todo
- InDesign documentation
- todo
- continue the dictionary infrastructure discussion (with Ciprian, Trond)
- todo
- Contact Davvi Girji about cooperation on electronic dictionaries
- todo
- subcontractor work for sma proofing tools, MS and Adobe versions
- done, waiting for more documents before signing the contract
- support and maintenance contract for sme and smj, MS+Adobe tools
- waiting for answer
- Sámi languages as part of Norsk språkbank
- waiting for answer
- contact MS Nordic, and ask them to include the Divvun tools in the next Office
- waiting for answer
- leaflet: add link to OOo installation and usage instructions
- todo
- leaflet: add InDesign text
- todo
- meeting to go through the 1-page version Feb. 11, at 9:30 Norw. time
- done
- set up risten.no on eXist/XServe (as a beta version site)
- todo
- set up required infra for smenob on risten.no/XServe
- todo
- meeting Thursday 12.30 to go through the present installers
- still to do
- fix bugs!
Thomas
- look at test cases still not behaving properly
- prepare a presentation on derivational grammar for the sma workshop
- fix bugs!
Tomi
- document how compounding is controlled in the PLX conversion
- not done
- fix double hyphen bugs
- not done
- fix PL hyphenator bugs
- not done
- fix PL and Hunspell conversion bugs
- worked on
- infrastructure remake
- not done
- make a Mac installer for the InDesign Divvun tools
- done
- replace InstallShield with an open-source alternative
- not done
- meeting Thursday 12.30 to go through the present installers
- wasn't necessary
- fix bugs!
Trond
- Register oahpa.no
- end user documentation (how to download and install) - dictionaries
- fix bugs!
Pedagogical software online
As of Tuesday last week, oahpa has had about 17 000 hits!
Meeting memos can be found at
TODO:
- Register oahpa.no (Trond)
Corpus gathering
TODO:
- continue gathering sma corpus texts (David, Jovsset)
- older Sámi radio texts would be very welcome as well (David)
- Maja: NRK Nord-Trøndelag has a lot of recordings in sma, possibly
- call Fylkesmannen i Nordland about sma material (Maja)
- send Fritz Jakobssen a contract (Maja)
- Š-bláđđi has some articles in sma, we should ask for permission to use them
- follow up on the sma Bible translations (David)
- the Gun Utsi book is almost there - one contract missing (Jovsset)
- other contacts: Lena Davidsson, daughter to Lars-Matto Tuolja
- make a 1-page, light version of the contract, also in Swedish
- Bokmål and Northern Sami versions are committed at xtdoc/sd/src/documentation/content/xdocs/adm/legal/short-contract.(nb|se).xml
- meeting to go through the 1-page version Feb. 11, at 9:30 Norw. time
- discuss infra improvements for corpus rep administration (Børre, Ciprian)
- delayed till we are done with the gt/Makefile improvements
Promoting Divvun
We need more promotion of the Divvun tools. Making a movie demonstrating its
Next tour de Divvun probably in week 13. Per-Eric has talked to Sámij
TODO:
- make leaflet to inform about the project (Jovsset, David)
- better Windows usage descriptions, with emphasis on the pitfalls
- add link to OOo installation and usage instructions (Sjur)
- improve OOo instructions (Børre)
- add Linux info (Børre)
- add InDesign text (Sjur) - requires installer
- review the Windows pictures, and usage instructions for MS Office 2003
- text (Jovsset)
- distribute CD version through the library bus, the language centres and common
- contact MS Nordic, and ask them to include the Divvun tools in the next Office
- prepare to go to Røyrvik and promote Dåvvome (David)
- make a screencast of using our tools (Børre)
Future plans, directions and ideas
See a separate document in plan/strat/5year.jspwiki.
Infrastructure
To accommodate future enhancements in different directions (in rough order of
- test bench for all parts of our language technology efforts
- test bench enhanced, but not yet complete
- set up the Leopard Server features for collaborative support:
- permanent chat rooms
- stored (and indexed) chat transcripts of the chat rooms
- iCal server / group calendars
- wiki
- wiki? on G5 (is part of Leopard Server) or other web-based documentation
- improve Forrest stability and i18n support (the divvun crashes)
- Sjur has been working on better i18n and PDF rendering
- Børre has some ideas for getting back to serving static html files
- reorganise the documentation:
- differentiate between target groups
- get better grouping
- decide what to write in forrest and what in wiki
- update/add missing parts
- migrate lexc lexicons to XML, splitting the task
- Name lexica (the Name project)
- Dictionaries (already in XML, task is to integrate them)
- At least migrate the lexc open POSes (Komi as a pilot case)
- change the look of the documentation web
- sfst?
- replacement for xfst -> see omorf
- replacement for hunspell/open-source proofing tools
- investigate the NSIS installer
- corpus content moved to Max Planck repositories? Norsk språkbank?
- update infrastructure to allow content-restricted spellers for special target
SVN issues:
- http access not yet available (https only at the moment, but it works as
- read access to the whole repo is working, BUT:
- gt/*/polderland should be protected
- plan should be protected
- certain users need to be on the UiT VPN to be able to commit (bug #705)
- UTF-8 problem in svn commit mails: the commit log text is garbled
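Garbled commit log text in mails is often caused by a message body that is sent without a declared character set. The following is a minimal sketch in Python of building a commit mail with an explicit UTF-8 charset. It is not the hook script in use on our server; the function name, the addresses and the subject layout are assumptions made for illustration only.

```python
from email.message import EmailMessage


def build_commit_mail(sender, recipient, revision, log_text):
    """Build a commit notification whose body is declared as UTF-8.

    Hypothetical helper: names and subject layout are illustrative only.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    # Use only the first log line in the subject; headers cannot contain newlines.
    first_line = log_text.splitlines()[0] if log_text else ""
    msg["Subject"] = f"r{revision}: {first_line}"
    # Declaring the charset lets mail clients decode non-ASCII letters correctly.
    msg.set_content(log_text, charset="utf-8")
    return msg


mail = build_commit_mail(
    "svn@example.org", "commits@example.org", 705, "fix lågenanguoktáj"
)
```

The resulting message can be serialised with `mail.as_bytes()` and handed to whatever mail transport the hook uses.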
TODO:
- infrastructure remake: (Børre, Ciprian, Saara, Sjur, Tomi, Trond)
- make a test-all target that runs all tests we have (Ciprian, Sjur, Trond)
- delayed until we have restructured the make/build process
- define and document testing routines (Ciprian, Sjur, Trond)
- delayed until we have restructured the make/build process
- follow-up migration to svn:
- close the open SVN repository at requested paths (Ciprian)
- completely closed at the moment, until we have solved the path-based control
- prepare and discuss with external users: Jack Ruether (Trond)
- test iCal Server (on G5) (Børre)
- remove TOC from the giellatekno home page by using dispatcher (Trond)
- get a headset with microphone for Maja (Børre)
Linguistics
North Sámi
(nothing new, see proofing bugs below)
Lule Sámi
(nothing new, see proofing bugs below)
South Sámi
TODO:
- step two for adjectives (David, Maja, Trond, Sjur)
- we'll try to find a time this week as well
- finish reformulating the proper noun grammar like the verbs (Sjur, Trond)
- Missing list (focus on closed class words and addition on missing nouns,
- David made a new missing list from an OCR-ed book
- workshop on South Sámi orthography and grammar (David)
- done
- make a document for the SGL/SGM, sma section (David with
- finish the umlaut / derivation work (Thomas, Sjur, Tomi)
- adjectives (Maja with David, Thomas, Trond, Sjur)
- placenames (Jovsset)
- abbreviations (David)
Name lexicon/risten.no infrastructure
Instead of building our own webforms and back-end update scripts, use XForms
From the meeting with the terminology and IT teams last week:
- no major rework on the present search interface now
- no work on the editing section; instead:
- add existing lists of sanctioned terminology as separate term entities
- add a dictionary if we can make one with sufficient quality
This means the following tasks:
- find already approved lists, in paper or electronic form (term team)
- convert paper lists to electronic lists (term team)
- convert lists to standard XML (Sjur, Tomi)
- add prepared lists to risten.no (Sjur, Tomi)
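The conversion step in the task list above could be sketched roughly as follows, in Python. This is only an illustration: the notes do not say what the standard XML format looks like, so the element names (`terms`, `entry`, `term`), the `lang` attribute and the tab-separated input format are all assumptions.

```python
import xml.etree.ElementTree as ET


def term_list_to_xml(lines, src_lang="sme", trg_lang="nob"):
    """Convert tab-separated term pairs into a simple XML tree.

    Hypothetical format: one <entry> per line, one <term> per language.
    """
    root = ET.Element("terms")
    for line in lines:
        line = line.strip()
        # Skip empty lines and comment lines in the electronic list.
        if not line or line.startswith("#"):
            continue
        source, _, target = line.partition("\t")
        entry = ET.SubElement(root, "entry")
        ET.SubElement(entry, "term", {"lang": src_lang}).text = source.strip()
        # A line without a translation still yields an entry for the source term.
        if target.strip():
            ET.SubElement(entry, "term", {"lang": trg_lang}).text = target.strip()
    return root


tree = term_list_to_xml(["giella\tspråk", "# comment", "sátni\tord"])
xml_text = ET.tostring(tree, encoding="unicode")
```

Keeping the conversion as a small function makes it easy to run the same code over every list the term team delivers, whatever the final element names turn out to be.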
TODO:
- fix i18n bug in risten.no/G5 (so they will work without the proper locale
- fix bugs in lexc2xml; add comments to the log element (Saara)
- finish first version of the editing (Sjur)
- test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)
- make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob
- convert propernoun-($lang)-lex.txt to a derived file from common xml files
- implement data synchronisation between risten.no and
- start to use the xml file as source file
- clean terms-sme.xml such that all names have the correct tag for their use
- merge placenames which are erroneously in different entries: e.g. Helsinki,
- publish the name lexicon on risten.no (Sjur)
- add missing parallel names for placenames (linguists)
- add informative links between first names like Niillas and Nils
Dictionaries
Ciprian is working towards our first sma-nob/swe dictionary. He has also had
TODO:
- Set up the apache server for the risten.no beta on the linux box + Xserve.
- set up risten.no on eXist/XServe (as a beta version site) (Sjur)
- not done
- set up required infra for smenob on risten.no/XServe (Sjur)
- waiting for the above
- Continue the dictionary infrastructure discussion (Ciprian, Sjur, Trond)
- end user documentation (how to download and install) (Ciprian, Trond)
- started, not complete
- Contact Davvi Girji about cooperation on electronic dictionaries
Proofing tools
Hunspell
The version we have at the moment (beta6) will be our last one for a while.
OpenXSpell
As a middle-man between Enchant and MacOS X speller API, this is still
Testing
Spelling Error Markup
TODO:
- Set up ways of adding meta-information (source info, used in testing or not,
- test new and nested error markup (Sjur)
- nesting still needs to be tested, depends on new ccat feature
Speller testing
TODO:
- test the error type selection feature in ccat (Sjur)
Testing open-source Norwegian spellers
Sjur has invited the open-source group to test their spell-checker using our
We should go to their developer meetings, and present our work and how to work
Speller bugs
List of bugs returned from Polderland:
- 621
- 630
- 652
- 656
- 676
Open issues based on test results:
sme
- 399 - REGRESSION: missing numerals (plural forms)
- 425 - X not recognized; single letters were left out - OPEN
- 435 - roman numbers - inflection of single letter numbers
- we should pregenerate all numbers once and for all, and store them in a
- 581 - consonant doubling - seems FIXED
- 595 - prefix+name without hyphen (ovdaLot instead of ovda-Lot) - still
- 603 - suomabealdi accepted - still OPEN
- 606 - compound-tags LEXICON VUOHTA - OPEN
- 613 - short gen. as second compound part - still OPEN
- 619 - numerals and pronouns to NAMÁK and SASJ fails - vihttasoarttat
- 629 - a taking part in compounding without hyphen - still OPEN
- only open case has word A-finálaid compounded
- 642 - noun/adj/proper + hyphen + ain - still OPEN
- 647 - numerals+NOUN - still OPEN, open case has uppercase letters
- 648 - unmotivated suggestions with numeral+noun - still OPEN
- 728 - vowel shortening GenCmp+Left-tagged - still OPEN
- 779 - caseforms of pronoun okatahat
smj
- 435 - roman number - single letter numbers now recognised
- we should pregenerate all numbers once and for all, and store them in a
- please note that inflection of single letter numerals is fine in
- 594 - REGRESSION: lågenanguoktáj not recognized
- 595 - prefix+name as split comp without hyphen - still OPEN
- 596 - C-giellan is not accepted - OPEN
- 647 - numerals+NOUN - still OPEN, open case has uppercase letters
- 648 - unmotivated suggestions with numeral+noun - still OPEN
- 650 - noun prefix+name compound without hyphen - still OPEN
- 652 - UPPERCASE-typos only get acronym-suggestions - still OPEN
- 692 - numeral-variants - all but one fixed (gáktsalågenantjuotakta), but
TODO:
- document how compounding is controlled in the PLX conversion (Tomi)
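Bug 435 above suggests pregenerating all numbers once and for all. A minimal sketch of what that could look like for Roman numerals, in Python, follows. The upper limit of 3999 and the plain list used as storage are assumptions; the notes do not say which range is needed or where the result should be stored, and inflected forms are not covered here.

```python
# Value/symbol pairs in descending order, including the subtractive forms.
ROMAN_PAIRS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(number):
    """Return the Roman numeral for an integer between 1 and 3999."""
    if not 1 <= number <= 3999:
        raise ValueError("number must be between 1 and 3999")
    parts = []
    for value, symbol in ROMAN_PAIRS:
        count, number = divmod(number, value)
        parts.append(symbol * count)
    return "".join(parts)


def pregenerate_roman(limit=3999):
    """Generate all Roman numerals from 1 up to and including limit."""
    return [to_roman(n) for n in range(1, limit + 1)]


all_numerals = pregenerate_roman()
# The single letter numerals are the ones bug 435 is concerned with.
single_letter = [numeral for numeral in all_numerals if len(numeral) == 1]
```

A list generated once like this could then be written to whatever lexicon file the speller conversion reads, instead of being recomputed for each build.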
Hyphenator bugs
Open issues based on test results:
sme
- 468 - Márkomeanu - still OPEN
- 547 - hyphen in front of vowel: Lotnolasealáhusas - still OPEN
- 548 - mid syllable hyphenation: Háliidivččen - still OPEN
- 549 - division without hyph: Váccedettiin - still OPEN
smj
- 547 - hyphen in front of vowel: Jienastimnjuolgadusá and Orgánajs -
- 670 - Hard hyphen replaced with soft hyphen: 10-biejvvásattja (the word is
TODO:
- fix PL hyphenator errors (Tomi)
Installer changes
TODO:
- make a Mac installer for the InDesign Divvun tools (Tomi)
- first version done
- replace InstallShield with an open-source alternative (Tomi)
- test InDesign installer (Sjur)
- meeting Thursday 9:30 to go through the present installers (Sjur, Tomi)
User documentation
TODO:
- InDesign documentation (Sjur)
- Norwegian translation received from Davvi Girji
1.2 release
Content:
- several smj bug fixes
- lexicalisations
- InDesign Mac & Win
- new OOo beta
- improved installers, at least for Mac, preferably also for Windows
Other
Winter holidays
Who | When |
---|---|
Sjur | 16-20.2. |
David | 19-20.2. |
Børre | nothing |
Maja | nothing |
Thomas | nothing |
Trond | not holiday, but going to Greenland 16.2.-10.3. |
Tomi | nothing |
Jovsset | nothing |
Schools in Troms have their holiday in the second week of March.
Text to speech
The TTS meeting with Antti was held (without Maja and Thomas). Topics:
- what kind of feedback is needed from us? Quite a lot of linguistics
TNC, rikstermbanken and Sámi terms
RTB will be released:
- Date: Thursday 19 March 2009
- Time: 14.00–15.00 (followed by mingling until 16.00)
- Place: Rosenbads konferenscenter, Drottninggatan 1, Stockholm
Available at: http://www.rikstermbanken.se
Corpus contracts + open source
Postponed until the svn repository is fully functional (it is too open now).
Other issues
- Maja would like to have a bigger monitor -> coming soon (it is ordered)
- nothing new yet - Børre will call and check
- Done. Installed and in use
Next meeting, closing
The next meeting is 9.3.2009, 9.30 Norwegian time.
The meeting was closed at 13:08.
Appendix - task lists for the next five days
Børre
- Implement new version of giellatekno front page without ToC
- improve OOo instructions for leaflet
- leaflet: add Linux info
- gt/Makefile remake
- Set up the apache server for the risten.no beta on the linux box + Xserve.
- make a screencast of using our tools
- get a headset with microphone for Maja
- make remake meeting Wednesday 13.00 with Ciprian, Sjur, Tomi, Trond
- fix bugs!
Ciprian
- close the open SVN repository at requested paths
- improve processing of new corpus documents
- get the StarDict dictionary generation pipeline running on the Mac
- take care of the error logging during the conversion process
- look at the xml conversion quality
- make a list of (general and special) problems related with
- continue the search and testing of an appropriate tool for
- build the sami-week-version of sme:nob in all formats
- make remake meeting Wednesday 13.00 with Børre, Sjur, Tomi, Trond
- fix bugs!
David
- continue gathering sma corpus texts
- stay in touch with UmU about C-theses in South Sámi
- Translate light version contract to Swedish
- contact the Sámi radio about older manuscripts
- call Anne-Grethe Bientie and Bierna Bientie about bible texts
- call Š-bláđđi about South Sámi texts
- reupload the Dejpeladtje muvhth vätnoeh jih vuekie text, which is
- prepare to go to Røyrvik and promote Dåvvome
- sma abbreviations
- make a document for the SGL/SGM, sma section
- sma adjectives
Jovsset
- get the list of verbs from the authors of Verbh.
- Find a suitable infrastructure for the distribution of the CD version.
- write formal letter to translator of Gåessie dah jeatjebh åerieminie
- leaflet: review the Windows pictures, and usage instructions for MS Office.
- Write installation instructions for Word 2003
- better Windows usage descriptions, with emphasis on the pitfalls
- work with missing lists
- sma placenames
Per-Eric
- follow-up contract from Lena Davidsson
- Call Gøran Matias Andersen about his smj texts
- work with missing lists
- fix bugs!
Maja
- contact NRK Nord-Trøndelag about recordings in sma and broadcast
- more work on sma adjectives
- send Fritz Jakobssen a contract
Saara
- map oahpa.uit.no to gtsvn.uit.no/oahpa/
- add new XSL/XML headers for proofing test docs
- Set up ways of adding meta-information for proofing correct corpus docs
Sjur
- name db/risten.no
- follow-up on some Polderland-related bugs: 621, 630, 652
- InDesign documentation
- continue the dictionary infrastructure discussion (with Ciprian, Trond)
- Contact Davvi Girji about cooperation on electronic dictionaries
- subcontractor work for sma proofing tools
- support and maintenance contract for sme and smj, MS+Adobe tools
- Sámi languages as part of Norsk språkbank
- contact MS Nordic, and ask them to include the Divvun tools in the next Office
- leaflet: add link to OOo installation and usage instructions
- leaflet: add InDesign text
- set up risten.no on eXist/XServe (as a beta version site)
- set up required infra for smenob on risten.no/XServe
- make remake meeting Wednesday 13.00 with Børre, Ciprian, Tomi, Trond
- meeting Thursday 9.30 to go through the present installers
- sma umlaut / derivation work
- fix bugs!
Thomas
- sma umlaut / derivation work
- sma adjectives
- fix bugs!
Tomi
- document how compounding is controlled in the PLX conversion
- fix double hyphen bugs
- fix PL hyphenator bugs
- fix PL and Hunspell conversion bugs
- infrastructure remake
- replace InstallShield with an open-source alternative
- meeting Thursday 09.30 to go through the present installers
- sma umlaut / derivation work
- make remake meeting Wednesday 13.00 with Børre, Ciprian, Sjur, Trond
- fix bugs!
Trond
- Register oahpa.no
- end user documentation (how to download and install) - dictionaries
- make remake meeting Wednesday 13.00 with Børre, Ciprian, Sjur, Tomi
- fix bugs!