Meeting_2005-05-10
Meeting setup
- Date: 10.05.2005
- Time: 10.00 Norwegian time
- Place: Wherever we are : -)
- Tools: Phone, iChat, SubEthaEdit
Agenda
- Opening, agenda review
- Reviewing the task list from a week ago
- Documentation - divvun.no
- Corpus gathering
- Corpus infrastructure
- Linguistics
- Term db
- Other issues
- SEE License number
- Sjur going to Joensuu
- Next week + gathering
- Summary, task lists
- Closing
1. Opening, agenda review, participants
Opened at 10.06. Agenda accepted as is.
Present: Maaren, Sjur, Thomas, Tomi, Trond, Børre
Main secretary: Tomi
2. Reviewing the task list from the last meeting
-
Thomas, Maaren, Børre: Translate divvun front page
- Børre has made template pages and Thomas has done his part of the work
-
Trond, Tomi, Børre, Sjur: Continue with corpus infrastructure
- Not much has happened since our breakthrough.
- Scripts for word > xml (docbook), xml > gtxml (db to our xml)
- Template for manual additions and script for feeding xml to pp are still
missing
-
Trond:
- Will find out who can help us with corpus agreement contracts
in Oslo when Ruth Vadtvedt Fjeld is away.
- The Bokmål part of Leksikografisk institutt at HF in Oslo hired two persons
to collect texts, cf. a presentation at http://www.hf.uio.no/iln/for-ansatte/hf-aktuelt/ny-korpus.html
- Research assistant Cecilie Hauglund collects texts; programmer
Elisabeth Lien does the programming and processes them further.
- Work with Børre on the giellatekno pages
- Discussion + some work
-
Sjur:
- Call Kimmo about license text & e-mail from Børre
- Tried last week, no contact, will try again this week, and have a meeting on
Friday if possible
- Also discuss kwicsnt & UTF-8 problem
- More termdb work
- Editing part is completed
- Contact Anne Britt about divvun.no and the Språkstyremøte.
- She suggested that the Sámi Parliament Council send out a press release
rather than Språkstyret
-
Børre:
- Ask Kåre Tjikkom, Harriet Aira, Karin Tuolja, Susanna Kuoljok-Angeus and
Samuel Gælok about Lule Sámi text.
- Will do.
- Contact Leif Åge about adding Trond to the divvun e-mail alias.
- Will do.
- Prepare the war file for Tomcat deployment next week
- Done. Thor Øivind Johansen will contact Sjur for details about tomcat
and forrest.
- Work with Trond on the giellatekno pages
- Agreed on what to include, work continues this week
- Finish the work on the Termdb
- Done. Modified the skin, converted South Sámi grammar to xml (from Word
files)
- Contact Skolelinux about their webcrawler, also Knut Hofland
(ask Trond/Sjur)
- Will do.
-
Tomi:
- Decide the directory structure for corpus originals
- Proposal: Keep the structure as we get it from the donor
- Script for antiword and UTF-8 fix processing
- Done. Some further UTF-8 fixes are probably still coming.
- Template for xsl manual conversion script
- Not started yet.
- Script for processing the corpus for preprocessor
- Not done yet; should be a straightforward task.
-
Thomas: continue with verbs and ask Teemu Leskinen for Finnish parallel
names.
- Reached verb nr 7000, still 5000 verbs left
- Teemu Leskinen has sent files with Finnish parallel names in place, but the
Finnish parallel names are not coupled with the Sámi names
-
Maaren:
- Worked on proper nouns, corrected the classification of 1747 names...
- Have a look at the closed classes, especially the indefinite pronouns.
- TBD later
- Work more on lexical issues.
- Done, will continue.
3. Documentation - divvun.no
Thor Øivind Johansen will contact Sjur about Forrest integration on the
More content as we progress.
4. Corpus gathering
No new responses, but the relevant persons at HF/UiO have been identified, and will be
5. Corpus infrastructure
Major parts in place. Still missing: gtxml to preprocess, and template for
6. Linguistics
- Could we use transitivity as a tag for disambiguation? Yes.
- What about removing adverbs that are straightforwardly generated from adjectives,
like +A+Adv? Perhaps by doing it
7. Term db
Progressing towards internal/Sámediggi opening.
8. Other issues
8.1 SEE License number
Received and entered.
8.2 Sjur going to Joensuu
Closing meeting for the Nordic Programme for Language technology. Sjur will talk
8.3 Next week + gathering
Next week: No meeting due to holidays and Sjur going to Joensuu
Gathering:
- Programme: what to do
- Programming content:
- Change to Mac OS X 10.4. Open issues with bash (upgrade to 3.00), readline, etc.
- Forrest and i18n
- Make sure video and voice conferencing is working through the Sámediggi firewall
- Linguistic content:
- Julevsáme: Go through the transducer, plan future work (Thomas, Trond, ...)
- Hard issues: Oslo > oslolaš, Káráš*joh*laš, case agreement within numerals
- Project/organisation:
- Evaluate first half year
- Discuss issues/improvements
- Practical things:
- Where to stay? Villmarkssenteret
- Who shall order rooms? Maaren can do it
- Sjur: Tuesday - Saturday
- Trond: Tuesday - Friday
- Tomi: Tuesday - Saturday
- Børre: Tuesday - Saturday
- Schedule: Wednesday, Thursday, Friday, full days (Trond to leave
somewhat earlier on Friday) and Maaren about 11.00.
- Reserve the meeting room
- Maaren has already done it a long time ago! Excellent!
We will put up the programme and other details on a separate page (we = Børre: -)
8.4 cvs-commit mailing to the team members
CVS sends out an e-mail to all members of a list each time something is committed
All commits are e-mailed to all team members in both projects. Use e-mail
Børre is setting it up (talk to Roy Dragseth), cvs watch command.
CVS documentation at:
9. Summary, task list
TODO:
-
Børre:
- Contact Univ. of Oslo on the contract issue.
- Finish the divvun alias
- Contact people about Lule Sámi texts
- Contact skolelinux and Knut Hofland about webcrawler.
- Here are Knut's thoughts on the issue in 1999:
http://gandalf.aksis.uib.no/knut/mons/kh-avis.htm
- Finish the setup of divvun.no
- Look at how forrest handles i18n, try to get work done on that.
- Set up programme and other details about the gathering in Guovdageaidnu
25th-27th of May
- Contact Roy Dragseth about cvs-commit mailing
- Finish the setup of http://giellatekno.uit.no/
-
Tomi:
- Template for manual xsl conversion script
- Script for processing the corpus for preprocessor
- Install "Tiger" (10.4), document all issues, together with steps to resolve
them
- Look at the twolc part of how to handle shortening of the middle part in
three part compounds. Discuss the data / linguistic side with the linguists.
- Look at decapitalisation of proper nouns when compounded or derived to a
general noun
-
Sjur:
- Meet with Kimmo:
- Discuss text licensing, and get a copy of the Helsinki licensing model/text
- Bring up the kwic-snt issue with Kimmo, make it UTF-8 compatible (now
each byte counts as one, instead of each character counting as one; what is
needed is a counting mechanism that counts only bytes starting in 0 or 11,
and ignores continuation bytes, which start in 10).
- Still some work on the Termdb
- Write presentation for the Joensuu meeting
- Decide upon the public opening of divvun.no with Anne Britt
- Discuss with Thor Øivind Johansen about Forrest/divvun/Tomcat.
Tel: +47 7764 6741
- Ask Leif Åge to check the opening/forwarding of the port numbers needed for
iChat voice and video conferencing
-
Maaren:
- continue with proper nouns, work on the Bugzilla bug list
- reserve rooms at Villmarkssenteret
- Translate divvun.no front page to Finnish
-
Thomas:
- continue with verb valency
- Reply to Teemu Leskinen, thanking for the material but asking that we get a
new version, now with the Finnish names coupled with the Sámi names, if at all possible.
-
Trond:
- Work with the bug list, the lexicon, disambiguation. Prepare Lule Sámi etc.
for the meeting.
-
All:
- Have a look at the bug list in Bugzilla (mainly Trond's bugs, but give
me a hand, will you?) http://giellatekno.uit.no/bugzilla/
- Prepare the Guovdageaidnu meeting
- Order tickets to the Guovdageaidnu meeting
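Note on the kwic-snt item in Sjur's list: the counting mechanism asked for there can be sketched roughly as below. This is only an illustration in Python, not kwic-snt's actual code; the function name count_utf8_chars is made up for the example, and it assumes the input is valid UTF-8. It counts every byte except continuation bytes (bit pattern 10xxxxxx), so each character is counted once at its first byte.

```python
def count_utf8_chars(data: bytes) -> int:
    """Count characters (code points) in UTF-8 data.

    Hypothetical helper for illustration only; assumes valid UTF-8 input.
    """
    # A continuation byte looks like 10xxxxxx, i.e. (byte & 0xC0) == 0x80.
    # Every other byte (0xxxxxxx or 11xxxxxx) starts a new character.
    return sum(1 for byte in data if (byte & 0xC0) != 0x80)


# "Kárášjohka", written with escapes so the encoding is unambiguous
sample = "K\u00e1r\u00e1\u0161johka".encode("utf-8")
print(len(sample))               # byte count: 13
print(count_utf8_chars(sample))  # character count: 10
```

Counting this way gives code points, so a letter written as base character plus combining accent would still count as two.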
10. Next meeting, closing
23.05.2005 10.00
Closed at 12.00

