Meeting_2005-08-15

Meeting setup

  • Date: 15.08.2005
  • Time: 10.00 Norw. time
  • Place: Wherever we are : -)
  • Tools: Phone, iChat, SubEthaEdit

Agenda

  1. Opening, agenda review
  2. Reviewing the task list from a week ago
  3. Documentation - divvun.no
  4. Corpus gathering
  5. Corpus infrastructure
  6. Linguistics
  7. Speller infrastructure
  8. Other issues
  9. Summary, task lists
  10. Closing

1. Opening, agenda review, participants

Opened at 10: 20.

Present: Tomi, Sjur, Børre, Maaren, Thomas, Trond

Main secretary: Sjur

2. Reviewing the task list from the last meeting

Børre

  • Add crontab specification for the cvs update/export script Tomi made
    • Not done
  • investigate Forrest war + Tomcat issue
    • Working on it. Thor-Øyvind
  • reopen the jspwiki + UTF-8 issue
    • Not done
  • Contact Svenska bibelsällskapet
    • Not done
  • Add issue to forrest issue tracker about utf-8 ihtml documents.
    • Check for existing bug reports, it might already be in there
      • Didn´t find any, but didn´t add an issue.
  • Continue forrest i18n research
    • Sometimes abides the =locale=se/smj/fi in the url (the document gets correctly selected). The menus and tabs are translated to the language that the browser asks for.
      • Sometimes not. Sjur has added an issue to their issue tracker.
  • discuss with Anders Kintel about whether we could get the Sámi-Norwegian dictionary ahead of acceptance
    • wait till Thomas is back from vacation
  • add the Oslo contract to cvs, and begin writing a contract draft
    • Added it. If we are to use the Helsinki model, we should probably use the Helsinki contract as our draft.
      • What is the best document format, considering that the lawyer is probably using MS Word?
  • Follow up on CVS mailing:
    • check that all received forwarded mail
    • set up Thomas and Maaren when they're back from vacation
      • Now that they are back, I´ll have a look at it.

Sjur

  • risten.no bugs and fixes
    • continued
  • complete the action summary after our half-year evaluation
  • follow up on:
    • voice group-chat not working to Sámediggi
    • Maaren has problems with SubEthaEdit (can't connect)
  • To the board:
    • write proposal for permanent maintenance organisation
    • write draft specification for the outsourced tasks
    • write half-yearly project report with progress and bugdet status
    • write agenda
    • Deadline for the board tasks: 3 weeks ahead of meeting (date not decided, but in September)
      • some work
  • add Divvun to the UNESCO Hall-of-Fame list
    • Not done
  • Other tasks done:
    • monthly report
    • meeting with Trond to prepare Helsinki Gathering
    • some i18n issues with Forrest

Tomi

  • Aspell: Create the affix file, work on automatic conversion from LEXC to affix file
    • IMO, automatic conversion is not plausible, doing manual affix file
  • investigate Aspell/MySpell/OOo issues
    • Not really (only OOo (including NeoOffice?) & Mozilla products using MySpell)
  • three-part compounding
    • Not done
  • decapitalisation of proper nouns when (if?) compounded, and when derived (the oslolaš issue)
    • Not done
  • corpus infrastructure: dtd location (both public and internal)
    • Not done
  • corpus infrastructure: file and dir organisation
    • Not done
  • Follow up on CVS mailing:
    • check that all received forwarded mail
    • set up Thomas and Maaren when they're back from vacation

On vacation last week

Trond

  • Participated in the meeting with Sjur in Helsinki
  • On vacation, issues from last week repeated at the end of the memo

Maaren

  • On vacation, issues from last week repeated at the end of the memo

Thomas

  • On vacation, issues from last week repeated at the end of the memo

3. Documentation - divvun.no

We are waiting for the Linux server (will solve our UTF-8 and automatic update issues).

The giellatekno site is converted to the Forrest format, and is in principle ready for publication. Trond has to read through it and do some QA/editing before being released.

The standing invitation remains: Document what you do, when you do it.

4. Corpus gathering

The Oslo contract is added. We now need to work out our own suggestion for a contract. Børre, Trond and Sjur will have a meeting Tuesday 16.8. at 10.00 Norwegian Time to work on that draft. Sjur sends out an e-mail with details before the meeting.

5. Corpus infrastructure

Do documentation.

Naming conventions and directory structure

Principle: incoming directory tree

  1. according to collector (having an incoming basket)
  2. according to..

permanent directory tree according to author institution, Perhaps according to text genre, and thereafter author institution:

The discussion is moved to the discussion list, eventually to a separate meeting. But we'll start (or rather continue) in the newsgroup.

6. Linguistics

North Sámi

  • The missing list should be reworked and cleaned.
  • Sámi place names in Norway should updated, contact Jonny Andersen at Statens Kartverk.

Lule Sámi

When do we get the lexicon from Anders Kintel?

7. Speller infrastructure

aSpell

Write documentation here as well.

Is working, see above. Needs further optimisation before it is useful, cf the size discussion. Size target: At least below 10 Mb, wish: 6 Mb or less. It started at 580Mb, and now it is around 380 MB. Declining, that is.

Tomi is working on the affix file, and by that reducing the size of the speller file.

The affix file will so-to-speak recreate the inflection of our LEXC lexicon in the Aspell format (similar to MySpell?), in a format that could be called one-level (there is no concept of two levels with regular mappings between them in Aspell, or any of the *spell engines, for that matter).

Useful links:

MS Office spellers

To be specified in the outsourcing tech-spec. What about direct integration with MS Office? See this page from MS.

Discussion left for the future, and when we have the offers to compare with.

OpenOffice.org

Questions:

1) Is it possible to use Aspell with OOo?

No. From the site:

Background
Lingucomponent was started by Kevin Hendricks to integrate various open source spell
checkers into the OpenOffice.org build. With a little prodding from Kevin Atkinson,
the author of Pspell, a new spell checker called MySpell was written in C++ that supported
affix compression, based on Ispell. This code has been given to Pspell and will eventually
make it into a future Pspell release after it has been integrated.

Later note: Pspell is dead and Aspell is its replacement. I think Aspell is found at
aspell.sourceforge.net. There is no OOo interface to Aspell (no way to use Aspell with OOo).
kh.

MySpell, which builds on almost any system, is used to support spell checking in
OpenOffice.org.

Links to possibly useful stuff:

2) Are the Aspell source files compatible with MySpell?

To be investigated. Goal: to simplify maintenance as much as possible, ie to have as few source and target formats as possible.

The Debian Finnish aspell page seems to point to source files that are common to both iSpell, Aspell and Myspell. Follow the link near the bottom of the page named developer information for aspell-fi.

http://hunspell.sourceforge.net/ is supposed to become the new OpenOffice.org spellchecker

8. Other

Helsinki gathering

Program draft available at /admin/physical_meetings/helsinki-2005-08.html

To bring from Kautokeino:

  • headphones
  • battery pack for Sjur
  • registration money for FSMNLP

Are the hotel rooms reserved? Probably not. Maaren will reserve rooms ( Arthur or Helka).

Flights should be booked individually, with the bill sent to Sámediggi.

Børre in Kautokeino

First week there about a week after the Helsinki gathering. Then every third week by default.

Technical issues

  • The mac os / perl bug (at least Trond has it):
    • utf8 "\xC4" does not map to Unicode at /Users/trond/gt/script/preprocess line 82. This msg did not show up in 10.3 (perl 5.8.1), but does so in 10.4 (perl 5.8.6). It is probably a perl - OS mismatch.
  • 25 open bugs - too much! Have a look at what you can fix.

9. Summary, task list

Børre

  • Add crontab specification for the cvs update/export script Tomi made
    • Adjust it so it will work with the new server
  • reopen the jspwiki + UTF-8 issue
  • Add issue to forrest issue tracker about utf-8 ihtml documents.
  • Contact Svenska bibelsällskapet
  • discuss with Anders Kintel about whether we could get the Sámi-Norwegian dictionary ahead of acceptance
  • Begin writing a contract draft, together with Trond and Sjur. Coordinate with Kimmo Koskenniemi
  • Follow up on CVS mailing:
    • check that Thomas and Maaren receive forwarded mail from cochice
    • set up Thomas and Maaren
  • Investigate compatibility between MySpell and Aspell source files
  • Book Helsinki flight via Via : -)
  • Help Trond with Saara's computer
  • meeting about corpus contract
  • Finish giellatekno forrest transition.

Maaren

  • The missing list, both the overall missing list from our xml corpus, and a file-for-file review, in order to get different terminology
  • check whether work is checked in
  • Go through the missing list from risten.no
  • book rooms in Helsinki
  • remember headphones, money and batteries for Helsinki

Sjur

  • risten.no bugs and fixes
  • complete the action summary after our half-year evaluation
  • follow up on:
    • voice group-chat not working to Sámediggi
    • Maaren has problems with SubEthaEdit (can't connect)
  • To the board:
    • write proposal for permanent maintenance organisation
    • write draft specification for the outsourced tasks
    • write half-yearly project report with progress and bugdet status
    • write agenda
    • Deadline for the board tasks: 3 weeks ahead of meeting (date not decided, but in September)
  • add Divvun to the UNESCO Hall-of-Fame list
  • Plan the Helsinki meeting
  • meeting about corpus contract
  • project planning with Trond

Thomas

  • work on Lule Sami verbs
  • contact Jonny Andersen about place names

Tomi

  • Aspell: Create the affix file, work on automatic conversion from LEXC to affix file
  • investigate Aspell/MySpell/OOo issues
  • three-part compounding
  • decapitalisation of proper nouns when (if?) compounded, and when derived (the oslolaš issue)
  • corpus infrastructure: dtd location (both public and internal)
  • corpus infrastructure: file and dir organisation
  • Follow up on CVS mailing:
    • check that all received forwarded mail
    • set up Thomas and Maaren when they're back from vacation

Trond

  • Work on the bug list (Lule Sámi).
  • Work on compounds (three-part + oslolaš, with Tomi)
  • Work on the corpus interface (with Lars)
  • Corpus infrastructure: dtd location
  • Get the new version of the New Testament
  • Check Hans-Ragnars names.
  • Look at the Sámi names issue
  • Plan the Helsinki meeting
  • New coworker
  • meeting about corpus contract
  • check the new giellatekno site
  • project planning with Sjur

10. Next meeting, closing

22.08.2005 10: 00

Closed at 12: 55