Meeting setup

  • Date: 13.06.2005
  • Time: 10.00 Norw. time
  • Place: Wherever we are : -)
  • Tools: Phone, iChat, SubEthaEdit


  1. Opening, agenda review
  2. Reviewing the task list from a week ago
  3. Documentation -
  4. Corpus gathering
  5. Corpus infrastructure
  6. Linguistics
  7. Term db
  8. Other issues
  9. Summary, task lists
  10. Closing

1. Opening, agenda review, participants

Opened at 10.00. Agenda accepted as is.

Present: Børre, Maaren, Sjur, Thomas, Tomi, Trond

Main secretary: Tomi

2. Reviewing the task list from the last meeting


  • Contact Kevin Scannell of an Crúbadán fame, ask if we can use his north sami material.
    • We can get his material as either a word list, or a sentence-wise corpus
    • He wanted to continue the cooperation (initiated last year, it seems)
  • Contact Roy Dragseth about cvs-mailing
    • He will do it this week
  • Work on
    • Waiting for Tomcat to be upgraded
    • Sysadm back from vacation today
  • Continue http: // conversion to forrest ...
    • Not a lot of work has been done, but an alpha version is running.
  • Contact Svenska bibelsällskapet
    • Nothing done
  • Add issue to forrest issue tracker about utf-8 ihtml documents.
    • Check for existing bug reports, it might already be in there
      • Nothing done
  • Continue forrest i18n research
    • Sometimes abides the =locale=se/smj/fi in the url (the document gets translated). The menus and tabs are translated to the language that the browser asks for.
    • Will contact che-che who is actively pursuing i18n in forrest.


  • continue missing list on tuesday and thursday this week
  • find and prepare issues for the language board meeting okei
  • translate Helsinki contract into rough Norwegian; send it to Sjur, then lawyer
    • Asked Elli-Kirsti Nystad to do the translation and she has promised to translate it into Norwegian


  • Check the translation of the Helsinki contract with Maaren; forward the task to Trond if not yet assigned
    • n/a
  • Ask Anne-Britt to update the contract with the University (we are getting 3, not 2, persons there from the fall)
    • not yet done
  • write the LIST option to our test tools, as requested in the discussion group.
    • not done
  • complete the action summary after our half-year evaluation
    • not done
  • fix bugs
    • Took most of the week, many fixes, 9 open.
  • contact Kimmo Koskenniemi about kwic-snt and UTF-8 bug
    • not done. We all miss it.
  • voice group-chat not working to Sámediggi
    • Leif Åge has been on vacation
  • add i18n bug to Forrest issue tracker
    • not done
  • add i18n menu as feature request to Forrest issue tracker
    • not done


  • Begin to prepare issues to giellalávdegotti čoahkkin
    • Prepared issues and sent draft to Maaren
  • Present issues in dicussion forum
    • Done
  • Finish the work on verb valency
    • Finished - Congratulations!!!
  • Start to look at Lule Sámi
    • Started


  • Start looking at the smj infrastructure, turn it into utf-8, look at whether updates and improvements done for sme should be done for smj as well, in order to get a uniform infrastructure.
    • Done
  • update catxml to remove sections in unwanted language(s)
    • Some problems getting the removal to work
  • Look into shortening of three-part compounds, middle part
    • nothing done
  • decapitalisation of proper nouns when (if?) compounded, and when derived (the oslolaš issue)
    • nothing done


  • translate Helsinki contract into rough Norwegian; send it to Sjur, then to lawyer
    • Totally forgotten. Did I ever receive it?
      • No, I decided to wait till Maaren was back, which was good, since she had already given the task to someone else
    • Worked on linguistics, on bugs, to a certain extent on documentation.


  • The release.
  • The Bugzilla bugs, and a quip or two?
  • continue the discussion on the language policy of in the newsgroup, esp. re the open questions:
    • How much in 5 languages?
    • Size of general public info: Trond: only a few paragraphs - about the project, and the language policy of the site. More specifically, it should include:
      • what the divvun project is
      • why it is made, by whom, on whose decision
      • what we make, (why we do not make sma)
      • how we make it
      • how, when, and for whom it eventually will be used, and on what terms and price
      • an explanation of our lg policy, and a good set of links.
    • should news be in all languages? What about RSS feeds from Forrest?

3. Documentation -

Deadline 22.6.

Børre will be the main responsible for getting it all in place till the deadline.

Things required before the opening:

  1. Press release text, conforming to what we actually release (in 3 languages? More?)
  2. front pages up and running, with a nice look and interlinking in place, as site-generated pages.
  3. front page and general info in all languages
  4. general outline of the project, with deliveries and main timetable
  5. front pages up and running, with a nice look and interlinking in place, as site-generated pages.
  6. The actual content of the documentation, our common child, should be gone through with the forthcoming public release in mind.

Requirements for the opening:

  • i18n in place? What if we can't get it to work? Subtabs for each language. The submenus must follow the language of the subtabs.
  • frontpage contains the points outlined at the last meeting. Cf. this list:
    • Information in 5 languages (sme, smj, nob, fin, eng)
    • Size of general public info: only a few paragraphs. It should include:
      • what the divvun project is
      • why it is made, by whom, on whose decision
      • organization (project board, part of GIO, with UiT, what else?)
      • how, when, and for whom it eventually will be used, and on what terms and price
      • what we make
      • why we do not make sma (background on separate page, in nob and sme)
      • timetable (separate page, all langs.)
      • how we make it (separate page, all langs)
      • an explanation of our lg policy (separate page, all)
      • a good set of links, incl our own newsfeed (RSS)

The separate pages:

  • The intro page (all lgs)
  • sma (in nob and sme)
  • timetable (all langs.)
  • how we make it (all langs)
  • an explanation of our lg policy (all lgs)


  • all docs, orig lang: Wednesday, 15.6.
  • translations: Friday, 17.6.
  • Alpha site (Tomcat running, present version of the site): Wednesday, 15.6.
  • Beta site (all docs in place): Monday, 20.6.
  • QA, checking links, i18n, content etc.: Tuesday, 21.6.
  • Opening: Wednesday, 22.6.


  • The intro page: Sjur x5
  • sma omission background: Trond x2
  • timetable: Sjur x5
  • how we make it: Børre x5
  • an explanation of our lg policy: Sjur x5


The files will be written in nob, and made available via cvs, in xtdocs/sd/.../xdocs/

  • eng: Sjur, others?
  • sme: Børre (Maaren proofreads)
  • smj: Thomas, proofread by e.g. Kåre Tjikkom
  • fin: Maaren


Børre will make document stubs in the correct places, with the correct filenames. Today! (right after the meeting:-).

4. Corpus gathering

The license contract is being translated. Maaren will check today whether it is finished.

5. Corpus infrastructure

catxml: some problems with section extraction/deletion. Tomi will continue the work on it.

We need routines for handling other formats than Microsoft Word:

  • Clean text
  • html
  • pdf, if possible
  • ideosyncratic formats

6. Linguistics

We should postpone the substantial issues to after the divvun release. The bug list should still be followed.

7. Term db /

Preparing the release, solving bugs.

8. Other issues

Bugzilla for sme and smj parsers.

When we set up Bugzilla, we didn't think of the fact that there would be two languages in this project, sme and smj, and more in the future. Now, smj bugs must be included. The question is, should we have smj and sme as top components ("products" in the Bugzilla terminology), or as bottom options in the decision tree ("component (of product)", in the Bugzilla terminology)?

Here comes an overview of status quo, with the sme/smj relevance indicated.

  • Corpus
    • smj
    • sme
  • Disambiguation
    • sme disambiguation rules
    • smj disambiguation rules
  • Documentation
    • no point in distinguishing between sme and smj here (?), as the lang catalogue is a small part of the documentation anyway; most of the doc catalogue is lg independent.
  • Hyphenation
    • Language division postponed
  • Infrastructure
    • lg-independent
  • Lexicon
    • sme lexicon
    • smj lexicon
  • Morphophonology
    • sme grammar
    • smj grammar
  • Preprocessor
    • The preprocess file is language independent. The files abbr.txt are language dependent.
    • N/A
  • Spell checking
    • We postpone the inner structure of this one, and the question on whether it should be one product or not.
  • Test
    • Largely language independent.
  • Tools (Programs other than the analysers and the preprocessors)
    • Language independent.

Suggestion: bottom level (components). Agreed.

The sme/smj issue will be handled as indicated, by splitting products according to language when required, i.e. mainly in the linguistic products.

Language(s) of RSS Newsfeed

Forrest can generate RSS news for us, which we will use to broadcast project events. What should the language be? What should the content be, ie the type of events that we publish this way?

  • Language
    • Interested persons and institutions: e.g. Audun Lona, project board, Skuvlalinux, all beta testers
    • If monolingual: nob or sme? If multilingual
      • sme for linguistic issues, sme and nob for other issues
  • Content Type
    • Major updates to the docs
    • test releases (Alpha, Beta, etc.)
    • ...

To be decided by next meeting at latest. We (Børre/Sjur) need to find out what is possible in Forrest.

9. Summary, task list


  • The release.
    • authoring, translations, techn. setup


  • Coordinate the release and the tasks ahead of the release.
  • Tomcat/Divvun setup
  • Authoring "How we make it"
  • i18n issues in Forrest
  • Translate Divvun docs to English (with Sjur)


    • Proofread sme translation
  • Linguistics
    • The catch 20 list for the lg board
    • in CVS under doc/lang/sme/norm_issues.xml


  • Author pages for the opening
  • Fix bugs, prepare that opening
  • Translate Divvun docs to English (with Børre)


  • Work with lule sami
  • buy books - dictonaries and grammars to Romssa kántuvra
  • Write documentation
  • smj, to be proofread later by e.g. Kåre Tjikkom


  • Common Makefile for all the languages
  • catxml language removal
  • Look into shortening of three-part compounds, middle part
  • decapitalisation of proper nouns when (if?) compounded, and when derived (the oslolaš issue)


  • release
    • Documentation: Work with Børre on the disamb documentation.
    • Write the sma omission text, perhaps look at the translation to English
    • Technical documentation: Read through, look at omissions
  • Linguistic work
    • Back up Thomas on smj.
    • Make a normativity-issues.xml in the doc/lang/sm{ej}/ directories.

10. Next meeting, closing

20.06.2005 10.00

Closed at 12.08