Meeting_2005-05-02
Meeting setup
- Date: 02.05.2005
- Time: 10.00 Norwegian time
- Place: Wherever we are : -)
- Tools: Phone, iChat, SubEthaEdit
Agenda
- Opening, agenda review
- Reviewing the task list from a week ago
- Documentation - divvun.no
- Corpus gathering
- Corpus infrastructure
- Linguistics
- Term db
- Other issues
- Summary, task lists
- Closing
1. Opening, agenda review, participants
Opened at 10.10. Agenda accepted as is.
Present: Maaren, Sjur, Thomas, Tomi, Trond, Børre
Main secretary: Trond (Sjur)
2. Reviewing the task list from the last meeting
- All: report vacation plans
- Trond, Børre, Sjur and Tomi
  - A session on the corpus format.
    - We had several sessions on the corpus format, and actually came up with
- Børre
  - Contact Kimmo Koskenniemi and Ruth Vatvedt Fjeld about contracts
    - Sent an e-mail, no response so far
    - Trond: will call around
  - Fix the link problems in the documentation.
    - Only a handful of link problems left
  - Further discussion with web server administrator about the divvun.no site.
    - Monday/Tuesday next week he will do the necessary work
  - Some minor things with termdb.
    - Done some of it, some left
  - Review giellatekno.no site suggestion with Trond.
    - Did some, found problems with UTF-8 .ihtml files.
- Tomi
  - continue corpus discussion
    - Did so, and updated the conversion script from docbook to our internal format
    - corpus location and corpus processing
- Maaren: try to work on the missing list. Anything else?
  - Maaren has had courses and seminars at the Sámi Parliament the whole week.
- Thomas: work with verb transitivity
  - Has been working as usual on verb transitivity (which is a lot of work).
- Sjur:
  - ... has been working with corpus format and processing.
  - The editing tool of the terminology db is almost completed. The technical
  - No work on the text license contract or on divvun.no, awaiting further progress
3. Documentation - divvun.no
The humfak server will be set up for divvun.no this week, so that the pentalingual
Todo: Translate the current welcome page into five languages, and add some text as well.
We will add more information on the plans and the progress of the project.
Official opening by Språkstyret in mid-June. Todo: Ask Anne Britt to put it on the
North Sámi as default language. There should be a note explaining that English is
Giellatekno pages will be changed from parallel bilingual text on the same pages
4. Corpus gathering
One problem is that we have received no answer from Oslo or Helsinki. We need
Northern Sámi texts:
Lule Sámi texts:
We need a web crawler for Sámi text. TODO: Contact Knut Hofland in Bergen on this
5. Corpus infrastructure
We have arrived, or rather are arriving, at the following:
sme/
  /orig/<donor>/<year>/file.doc           <- dump, __must__ be write protected
  /int/as-in-orig/file.db.xml             <- docbook format
  /int/as-in-orig/file.xsl                <- file-specific scripts, under version control
  /gt/publisher OR author/year/file.xml   <- xmlpreprocess ... | lookup ...
We should put some effort into the donor directory. Donor could be ordered according
We need: The texts
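To make the proposed layout concrete, the following is a minimal sketch of how the three stages (orig, int, gt) could be mapped to paths. It is an illustration only, not a decided implementation. The function names are hypothetical, and it assumes that "as-in-orig" means the int/ tree mirrors the <donor>/<year> subdirectories of orig/, which the minutes do not state explicitly.

```python
from pathlib import PurePosixPath

# Language directory from the layout above (North Sámi).
CORPUS_ROOT = PurePosixPath("sme")


def original_path(donor: str, year: int, filename: str) -> PurePosixPath:
    """Write-protected dump: orig/<donor>/<year>/<file>."""
    return CORPUS_ROOT / "orig" / donor / str(year) / filename


def docbook_path(donor: str, year: int, filename: str) -> PurePosixPath:
    """DocBook version under int/, ASSUMING int/ mirrors the orig/ subpath."""
    stem = PurePosixPath(filename).stem
    return CORPUS_ROOT / "int" / donor / str(year) / f"{stem}.db.xml"


def xsl_path(donor: str, year: int, filename: str) -> PurePosixPath:
    """File-specific XSL script, stored next to the DocBook file (same assumption)."""
    stem = PurePosixPath(filename).stem
    return CORPUS_ROOT / "int" / donor / str(year) / f"{stem}.xsl"


def final_path(publisher_or_author: str, year: int, filename: str) -> PurePosixPath:
    """Internal-format file: gt/<publisher OR author>/<year>/<file>.xml."""
    stem = PurePosixPath(filename).stem
    return CORPUS_ROOT / "gt" / publisher_or_author / str(year) / f"{stem}.xml"


if __name__ == "__main__":
    # "some-donor" and "some-publisher" are placeholder names.
    print(original_path("some-donor", 2005, "file.doc"))
    print(docbook_path("some-donor", 2005, "file.doc"))
    print(xsl_path("some-donor", 2005, "file.doc"))
    print(final_path("some-publisher", 2005, "file.doc"))
```

Keeping the path rules in one place like this would let the conversion scripts share a single definition of the directory structure once it is decided.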
6. Linguistics
Gone through 5200+1500 verbs, more than 13 000 all in all.
Maaren tries to work on the missing list
Issues:
- General issue: How should we prioritise between work on names, n-v-a cluster,
- Detailed view:
- Lexical coverage
- Sámi place names (later, when Hønefoss have been transposing more names)
- Names:
- The LONDON/BERN (-is/-as) issue.
- Distinguishing different name types? (Person, place, ...).
- Group parallel names (Guovdageaidnu / Kautokeino(s) / Koutokeino = Koutokeinossa,
- 3-part compounds: sáme giel oahpahus. Needed: Linguistic description + lexc
- east-west differences
- indefinite pronouns (closed-sme-lex.txt file)
- Compounding
A priority policy is to first look at the closed classes.
Linguistic priority list:
- finish the verb transitivity
- closed POS
- compounds
- derivation
- completing the lexicon
- names
7. Term db
Issues left:
- i18n
- l10n
- styling
- graphics
- translations
- more...
New deadline for internal opening: May 13th
8. Other issues
University project:
Will need 1-2 new linguists, for at least one and a half years.
Friday May 6:
- Sjur: taking day off
- Maaren: me too
9. Summary, task list
TODO:
- Thomas, Maaren, Børre: Translate divvun front page
- Trond, Tomi, Børre, Sjur: Continue with corpus infrastructure
- Trond:
  - Will find out who can help us with corpus agreement contracts
  - Work with Børre on the giellatekno
- Sjur:
  - Call Kimmo about license text & e-mail from Børre
  - More termdb work
  - Contact Anne Britt about divvun.no and the Språkstyremøte.
- Børre:
  - Ask Kåre Tjikkom, Harriet Aira, Karin Tuolja, Susanna Kuoljok-Angeus and
  - Contact Leif Åge about adding Trond to the divvun e-mail alias.
  - Prepare the WAR file for Tomcat deployment next week
  - Work with Trond on the giellatekno pages
  - Finish the work on the Termdb
  - Contact Skolelinux about their webcrawler, also Knut Hofland (ask Trond/Sjur)
- Tomi:
  - Decide the directory structure for corpus originals
  - Script for antiword and UTF-8 fix processing
  - Template for xsl manual conversion script
  - Script for processing the corpus for preprocessor
- Thomas: continue with verbs and ask Teemu Leskinen for Finnish parallel names.
- Maaren:
  - Have a look at the closed classes, especially the indefinite pronouns.
  - Work more on lexical issues.
10. Next meeting, closing
09.05.2005 10.00
Closed at 12.23.