Meeting_2005-05-10

Meeting setup

  • Date: 10.05.2005
  • Time: 10.00 Norwegian time
  • Place: Wherever we are : -)
  • Tools: Phone, iChat, SubEthaEdit

Agenda

  1. Opening, agenda review
  2. Reviewing the task list from a week ago
  3. Documentation - divvun.no
  4. Corpus gathering
  5. Corpus infrastructure
  6. Linguistics
  7. Term db
  8. Other issues
    1. SEE License number
    2. Sjur going to Joensuu
    3. Next week + gathering
  9. Summary, task lists
  10. Closing

1. Opening, agenda review, participants

Opened at 10.06. Agenda accepted as is.

Present: Maaren, Sjur, Thomas, Tomi, Trond, Børre

Main secretary: Tomi

2. Reviewing the task list from the last meeting

  • Thomas, Maaren, Børre: Translate divvun front page
    • Børre has made template pages and Thomas has done his part of the work
  • Trond, Tomi, Børre, Sjur: Continue with corpus infrastructure
    • Not much has happened since our breakthrough.
    • Scripts for word > xml (docbook), xml > gtxml (db to our xml)
    • Template for manual additions and script for feeding xml to pp are still missing
  • Trond:
    • Will find out who can help us with corpus agreement contracts in Oslo when Ruth Vadtvedt Fjeld is away.
    • Work with Børre on the giellatekno pages
      • Discussion + some work
  • Sjur:
    • Call Kimmo about license text & e-mail from Børre
      • Tried last week, no contact, will try again this week, and have a meeting on Friday if possible
      • Also discuss kwicsnt & UTF-8 problem
    • More termdb work
      • Editing part is completed
    • Contact Anne Britt about divvun.no and the Språkstyremøte.
      • She suggested that the Sámi Parliament Council send out a press release rather than Språkstyret
  • Børre:
    • Ask Kåre Tjikkom, Harriet Aira, Karin Tuolja, Susanna Kuoljok-Angeus and Samuel Gælok about Lule Sámi text.
      • Will do.
    • Contact Leif Åge about adding Trond to the divvun e-mail alias.
      • Will do.
    • Prepare the war file for Tomcat deployment next week
      • Done. Thor Øivind Johansen will contact Sjur for details about tomcat and forrest.
    • Work with Trond on the giellatekno pages
      • Agreed on what to include, work continues this week
    • Finish the work on the Termdb
      • Done. Modified the skin, converted South Sámi grammar to xml (from Word files)
    • Contact Skolelinux about their webcrawler, also Knut Hofland (ask Trond/Sjur)
      • Will do.
  • Tomi:
    • Decide the directory structure for corpus originals
      • Proposal: Keep the structure as we get it from the donor
    • Script for antiword and UTF-8 fix processing
      • Done. Some more UTF-8 fixes are probably still coming.
    • Template for xsl manual conversion script
      • Not started yet.
    • Script for processing the corpus for preprocessor
      • Not done yet; should be a straightforward task.
  • Thomas: continue with verbs and ask Teemu Leskinen for Finnish parallel names.
    • Reached verb nr 7000, still 5000 verbs left
    • Teemu Leskinen has sent files with Finnish parallel names in place, but the Finnish names are not coupled with the Sámi names
  • Maaren:
    • Worked on proper nouns, corrected the classification of 1747 names...
    • Have a look at the closed classes, especially the indefinite pronouns.
      • TBD later
    • Work more on lexical issues.
      • Done, will continue.

3. Documentation - divvun.no

Thor Øivind Johansen will contact Sjur about Forrest integration on the server/Tomcat.

More content as we progress.

4. Corpus gathering

No new responses, but the relevant persons at HF/UiO have been identified and will be contacted. See above.

5. Corpus infrastructure

Major parts are in place. Still missing: gtxml to preprocess, the template for manual additions, and the script for processing the corpus for the preprocessor.

6. Linguistics

  • Could we use transitivity as a tag for disambiguation? Yes.
  • What about removing adverbs that are straightforwardly generated from adjectives, like +A+Adv? Perhaps by doing it

7. Term db

Progressing towards internal/Sámediggi opening.

8. Other issues

8.1 SEE License number

Received and entered.

8.2 Sjur going to Joensuu

Closing meeting for the Nordic Programme for Language technology. Sjur will talk about the SALETEK network. Will be away Wednesday and Thursday, perhaps Friday next week (May 18.-20.).

8.3 Next week + gathering

Next week: No meeting due to holidays and Sjur going to Joensuu

Gathering:

  • Programme: what to do
    • Programming content:
      • Change to Mac OS X 10.4. Open issues with bash (upgrade to 3.00), readline, etc.
      • Forrest and i18n
      • Make sure video and voice conferencing is working through the Sámediggi firewall
    • Linguistic content:
      • Julevsáme: Go through the transducer, plan future work (Thomas, Trond, ...)
      • Hard issues: Oslo>oslolas1, Kárás1*joh*las1, case agreement within numerals
    • Project/organisation:
      • Evaluate first half year
      • Discuss issues/improvements
  • Practical things:
    • Where to stay? Villmarkssenteret
    • Who shall order rooms? Maaren can do it
      • Sjur: Tuesday - Saturday
      • Trond: Tuesday - Friday
      • Tomi: Tuesday - Saturday
      • Børre: Tuesday - Saturday
    • Schedule: Wednesday, Thursday and Friday, full days (Trond to leave somewhat earlier on Friday, and Maaren about 11.00).
    • Reserve the meeting room
      • Maaren has already done it a long time ago! Excellent!

We will put up the programme and other details on a separate page (we = Børre: -)

8.4 cvs-commit mailing to the team members

CVS sends out an e-mail to all members of a list each time something is committed (to a certain module like gt).

All commits are e-mailed to all team members in both projects. Use e-mail filtering to keep the inbox uncluttered.

Børre is setting it up (talk to Roy Dragseth), cvs watch command.

CVS documentation at: file:///Developer/Documentation/DeveloperTools/cvs/cvs_toc.html

9. Summary, task list

TODO:

  • Børre:
    • Contact Univ. of Oslo on the contract issue.
    • Finish the divvun alias
    • Contact people about Lule Sámi texts
    • Contact skolelinux and Knut Hofland about webcrawler.
    • Here are Knut's thoughts on the issue in 1999: http://gandalf.aksis.uib.no/knut/mons/kh-avis.htm
    • Finish the setup of divvun.no
    • Look at how forrest handles i18n, try to get work done on that.
    • Set up programme and other details about the gathering in Guovdageaidnu 25th-27th of May
    • Contact Roy Dragseth about cvs-commit mailing
    • Finish the setup of http://giellatekno.uit.no/
  • Tomi:
    • Template for manual xsl conversion script
    • Script for processing the corpus for preprocessor
    • Install "Tiger" (10.4), document all issues, together with steps to resolve them
    • Look at the twolc part of how to handle shortening of the middle part in three part compounds. Discuss the data / linguistic side with the linguists.
    • Look at decapitalisation of proper nouns when compounded or derived to a general noun
  • Sjur:
    • Meet with Kimmo:
      • Discuss text licensing, and get a copy of the Helsinki licensing model/text
      • Bring up the kwic-snt issue with Kimmo and make it UTF-8 compatible. At present each byte counts as one, instead of each character counting as one. What is needed is a counting mechanism that counts only the bytes that start a character (bit patterns 0xxxxxxx, 110xxxxx, 1110xxxx and 11110xxx) and ignores continuation bytes starting in 10.
    • Still some work on the Termdb
    • Write presentation for the Joensuu meeting
    • Decide upon the public opening of divvun.no with Anne Britt
    • Discuss with Thor Øivind Johansen about Forrest/divvun/Tomcat. Tel: +47 7764 6741
    • Ask Leif Åge to check the opening/forwarding of the port numbers needed for iChat voice and video conferencing
  • Maaren:
    • continue with proper nouns, work on the Bugzilla bug list
    • reserve rooms at Villmarkssenteret
    • Translate divvun.no front page to Finnish
  • Thomas:
    • continue with verb valency
    • Reply to Teemu Leskinen, thanking for the material but asking that we get a new version, now with the Finnish names coupled with the Sámi names, if at all possible.
  • Trond:
    • Work with the bug list, the lexicon, disambiguation. Prepare Lule Sámi etc. for the meeting.
  • All:
    • Have a look at the bug list in Bugzilla (mainly Trond's bugs, but give me a hand, will you?) http://giellatekno.uit.no/bugzilla/
    • Prepare the Guovdageaidnu meeting
    • Order tickets to the Guovdageaidnu meeting
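
As a note on the kwic-snt item in Sjur's list: the counting rule described there can be sketched as below. This is only a minimal Python illustration of the idea, not kwic-snt code; the function name count_utf8_chars is hypothetical, and the sketch assumes the input is valid UTF-8.

```python
def count_utf8_chars(data: bytes) -> int:
    """Count characters in UTF-8 data by skipping continuation bytes.

    Hypothetical helper for illustration only; assumes valid UTF-8 input.
    """
    # A continuation byte has the bit pattern 10xxxxxx,
    # i.e. its top two bits are 1 and 0: (byte & 0xC0) == 0x80.
    # Every other byte starts a new character, so we count those.
    return sum(1 for byte in data if (byte & 0xC0) != 0x80)


if __name__ == "__main__":
    sample = "S\u00e1mi".encode("utf-8")  # "Sámi" with a precomposed á
    print(len(sample))               # number of bytes
    print(count_utf8_chars(sample))  # number of characters
```

This counts Unicode code points, so a letter written as a base character plus a combining accent would count as two. Whether that matters for kwic-snt depends on how the corpus text is normalised.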

10. Next meeting, closing

23.05.2005 10.00

Closed at 12.00