Meeting setup

  • Date: 27.06.2005
  • Time: 09.00 Norw. time
  • Place: Wherever we are : -)
  • Tools: Phone, iChat, SubEthaEdit


  1. Opening, agenda review
  2. Reviewing the task list from a week ago
  3. Documentation -
  4. Corpus gathering
  5. Corpus infrastructure
  6. Linguistics
  8. Other issues
    1. Vacation planning
  9. Summary, task lists
  10. Closing

1. Opening, agenda review, participants

Opened at 09.30. Agenda accepted as is.

Present: Maaren, Sjur, Thomas, Trond, Tomi

Absent: Børre

Main secretary: Sjur

2. Reviewing the task list from the last meeting


  • The release - until Wednesday: -)


  • Coordinate the release and the tasks ahead of the release.
    • Done
  • Vacation thereafter.
    • "Doing"


    • Done
  • in Tysfjord till Thursday, will work on the project Thursday and Friday
    • the press represented by 2, Sámeradio + Nordsalten Avis
      • the presentation and opening went well
      • Karen Monica Paulsen presented
      • ... as well as Rolf Olsen
      • Berit Karen Paulsen demonstrated the web site
      • Lule Sámi people very happy to finally see Lule Sámi words included in the term db
    • other items covered later in the meeting


  • Author press release for for the opening
    • Done
  • bugs and fixes
    • Done 17, 18 open
  • Other things done:
    • updated cvs repo for to the released version
    • tagged cvs repos for and with the release tag DIVVUN-NO_1_0_RELEASE
    • Wrote monthly report (see your local forrest after cvs up)


  • translate the rest of sme version of sma document (partly translated)
    • done
  • translate press release into Northern and Lule sami
    • done
  • bug Kåre Tjihkkom with the last QA on smj translations
    • Kåre hasn´t yet sent timetable
  • Work with Lule Sami
    • worked
  • call Pål Hivand about when he's going to send out the press release(s)
    • done


  • Common Makefile for all the languages
    • still problems
  • catxml default language issue
    • haven't looked at it
  • three-part compounding
    • haven't succeeded to get it working
  • decapitalisation of proper nouns when (if?) compounded, and when derived (the oslolaš issue)
    • haven't done
  • corrected word2xml script


  • release
    • Mostly worked on this release. Translated common documents and updated some of the documentation.
  • sma-background completion
    • Done.
  • Translate the Hki legal text into Norwegian
    • Done
  • Fix the reported Lule Sámi bugs before Tysfjord.
    • Only partly done. What still does not work is a certain rule that is meant to take care of optional vowel lowering in the Px paradigm, the issue at hand is the bahaviour of the twolc => operator.
    • Apart from that I have looked at the corpus interface, and at corpus translation routines.
    • Then I had Wednesday off to sit in an exam commision.

3. Documentation -

Opening - short evaluation

Not mentioned much in Tysfjord, but the address was given: "go there and have a look". We delivered what we planned, with the exception of Forrest i18n (because of Forrest).

What's next

  • cvs up in place, for nightly updates of the online documentation.
    • using a special, read-only account, for security reasons (the update script will probably have to have the password written in clear)
    • Tomi will have a look, using the cgi-update scripts for the Giellatekno site as a template, and talk to Thor Øivind on this issue.
  • i18n:
    • wait for Børre

4. Corpus gathering

Anders Kintel is now ready with the dictionary manuscripts, and handed it over to the Sámi Lanuage board and GIO by Harrieth Aira and Julie Eira (OAO) at the language board meeting in Tysfjord. The manuscripts will be checked by dthe September meeting.

Kåre Tjihkkom and Kaia Kalstad will now go over the Norwegian - Lule Sámi part, and the dictionaries will be up to control for the September meeting of the lg council. Afterwards, we may get a copy. Factually, we could get it already now, since the Sámi-Norwegian part is already checked. The dictionary is written in Filemaker Pro, and can be exported to other structured formats. Børre will be back at some point next week, and may then ask for a copy.

Thomas has talked to Hans Ragnar Mahtisen about parallel placenames in his Sámi Atlas. These are names of foreign countries. Hans Ragnar positive.

Hans Ragnar already gave us the Sámi placenames, and they are included already, but not as parallel names. Thus, if we want the Tyskland > Duiskka correction function, we need the names.

Thomas will continue to discuss, and emphasize that we also would like to have the parallel names (Tyskland as well as Duiskka).

Trond will get a new version of the NT.

5. Corpus infrastructure

Storing the corpus.dtd and similar files. Now in gt/dtd/

CVS  cwb  doc  dtd  script  sma  sme  smi  smj  smn  sms  tmp  www

The structure of gt and the way to handle dtd must be discussed and decided upon. The discussion is moved to the newsgroup.

6. Linguistics


Nothing done last week. The missing list contains a lot of compounds - compounding isn't working anymore (it worked earlier?). A lot of noise words are Lule and South Sámi words - we need proper language identification as part of the corpus infrastructure, to make use Tomi's language extraction feature of catxml to only get the relevant language.

54 jïh 47 åarjelsaemien 20 Sámedigge 20 Saemiedigkie 20 julevsáme 17 Sámelága 15 finihtta 14 mij 14 dajvesne 13 julevsámegiel 12 aaj 12 • 11 sæjhta 11 dejtie 10 vuolitásahus 10 gïelem 10 ájggu 9 vihkeles

Bugzilla: The -heddjiid issue should be looked into after the giellalávdegoddi has had their meeting. The -eabbo vs. -eabbu issue could be looked into now, because -eabbo has to be implemented anyway.


  • Thomas working with adjectives.
  • Trond working with bugs.


Opening - short evaluation

Opening went well, with the exception of the "sticky" Lule Sámi text.

What's next

  • more bug fixing
  • a couple of critical features still needs to be added

8. Other issues

Language(s) of RSS Newsfeed

It is working (needs a separate input module), so the question is only how it should be used Open for discussion (when Børre is back).

Vacation planning

  • Børre: 20.06-04.07
  • Maaren: 11.07-14.08
  • Thomas: 11.07-14.08
  • Tomi: next week on vacation, 4.7-10.7
  • Trond: 4.7-10.7 vacation. 1.8.-14.8 vacation. In between: More or less vacation (In Finland, working whenever possible).
  • Sjur: probably 25.07-08.08.

9. Summary, task list

This week's task list is a summary of all undone tasks from before the openings.


  • The release - until Wednesday: -)


  • Vacation
  • Continue http: // conversion to forrest ...
  • Contact Svenska bibelsällskapet
  • Add issue to forrest issue tracker about utf-8 ihtml documents.
    • Check for existing bug reports, it might already be in there
  • Continue forrest i18n research
    • Sometimes abides the =locale=se/smj/fi in the url (the document gets correctly selected). The menus and tabs are translated to the language that the browser asks for.
  • add RSS newsfeed support (check with Sjur/
  • discuss with Anders Kintel about whether we could get the Sámi-Norwegian dictionary ahead of acceptance


  • The missing list, both the overall missing list from our xml corpus, and a file-for-file review, in order to get different terminology
  • Go through the missing list from


  • Ask Anne-Britt to update the contract with the University (we are getting 3, not 2, persons there from the fall)
  • bugs and fixes
  • Give Maaren (or: add to the gt/sme/corp corpus) a list of the words
  • write the LIST option to our test tools, as requested in the discussion group
  • complete the action summary after our half-year evaluation
  • contact Kimmo Koskenniemi about kwic-snt and UTF-8 bug
  • voice group-chat not working to Sámediggi
  • Maaren has problems with SubEthaEdit (can't connect)
  • add i18n bug to Forrest issue tracker
  • add i18n menu as feature request to Forrest issue tracker


  • wait for proofreading of the last QA on smj translations
  • write to Hans Ragnar Mahtisen about the parallel placenames
  • work with Lule Sami adjectives


  • Common Makefile for all the languages
  • catxml default language issue
  • three-part compounding
  • decapitalisation of proper nouns when (if?) compounded, and when derived (the oslolaš issue)
  • cvs update script for divvun documentation
  • corpus infrastructure: dtd location (both public and internal)


  • Work on the bug list (Lule Sámi).
  • Work on compounds (both regular and three-part, with Tomi)
  • Work on the corpus interface (with Lars)
  • Corpus infrastructure: dtd location
  • Get the new version of the New Testament
  • Get back to Xerox on the commercial license.
  • Check Hans-Ragnars names.
  • Give Thomas a telephone, and a crash-course in setting up phone meetings

10. Next meeting, closing

04.07.2005 10: 00

Closed at 11.00