Meeting_2005-05-30
Meeting setup
- Date: 30.05.2005
- Time: 10.00 Norw. time
- Place: Wherever we are : -)
- Tools: Phone, iChat, SubEthaEdit
Agenda
- Opening, agenda review
- Reviewing the task list from a week ago
- Documentation - divvun.no
- Corpus gathering
- Corpus infrastructure
- Linguistics
- Term db
- Other issues
- Summary, task lists
- Closing
1. Opening, agenda review, participants
Opened at 10.30. Agenda accepted as is.
Present: Sjur, Thomas, Tomi, Trond, Børre
Away: Maaren
Main secretary: Sjur
2. Reviewing the task list from the last meeting
-
Børre:
- Finish the divvun e-mail alias
- Leif Åge has added Trond
- Trond will test it himself.
- Leif Åge has added Trond
- Contact people about Lule Sámí texts
- Nothing done
- Nothing done
- Contact skolelinux and Knut Hofland about webcrawler.
- Here is Knut's thoughts on the issue in 1999:
- Nothing done
- Nothing done
- Finish the setup of divvun.no
- Børre has now got an account on the server, and will try today to set up Tomcat
- Børre has now got an account on the server, and will try today to set up Tomcat
- Look at how forrest handles i18n, try to get work done on that.
- Done some brief work
- Børre, Tomi and Sjur did quite some work in Guovdageaidnu, found some problems
- Done some brief work
- Contact Roy Dragseth about cvs-commit mailing
- Sent an e-mail, phoned him a few days later. He is away untill the 26th of May.
- Børre will contact him today, or cooperate with Sjur or Tomi today to try it out.
- Trond will ask Roy to give Børre su(per) power.
- Sent an e-mail, phoned him a few days later. He is away untill the 26th of May.
- Finish the setup of http://giellatekno.uit.no/
- Some pages are ready
- Encoding problems with the form pages (analyzer, generator)
- Not yet evaluated what should be included in the new site or not
- Some pages are ready
- Finish the divvun e-mail alias
-
Tomi:
- Install "Tiger" (10.4), document all issues, together with steps to resolve
- Postponed to Kautokeino
- Installation went fine, all have Tiger now
- Postponed to Kautokeino
- Look at the twolc part of how to handle shortening of the middle part in
- Nothing done, even in Guovdageaidnu
- Nothing done, even in Guovdageaidnu
- Look at decapitalisation of proper nouns when compounded or derived to a
- Nothing done, as above.
- Install "Tiger" (10.4), document all issues, together with steps to resolve
-
Sjur:
- Discuss with Kimmo:
- Bring up the kwic-snt issue with Kimmo, make it utf-8 compatible (now,
- Bring up the kwic-snt issue with Kimmo, make it utf-8 compatible (now,
- Still some work on the Termdb
- Fix bugs that are coming up
- Fix bugs that are coming up
- Decide upon the public opening of divvun.no with Anne Britt
- Giellaráđđi will send out a press release together with the opening of risten.no
- Giellaráđđi will send out a press release together with the opening of risten.no
- Ask Leif Åge to check the opening/forwarding of ports number needed for
- We believed it was working (one step forward: single voice and video chats are
- We believed it was working (one step forward: single voice and video chats are
- Discuss with Kimmo:
-
Maaren:
- continue with proper nouns, work on the Bugzilla bug list
- Translate divvun.no front page to Finnish
- done?
- continue with proper nouns, work on the Bugzilla bug list
-
Thomas:
- continue with verb valency
- continue with verb valency
-
Trond:
- Work with the bug list, the lexicon, disambiguation. Prepare Lule Sámi etc.
- Made a plan for Lule Sámi, did not go into the linguistic details, will wait
- Made a plan for Lule Sámi, did not go into the linguistic details, will wait
- Work with the bug list, the lexicon, disambiguation. Prepare Lule Sámi etc.
-
All:
- Have a look at the bug list in Bugzilla (mainly Trond's bugs, but give
- We are now down at 14 open bugs.
- Have a look at the bug list in Bugzilla (mainly Trond's bugs, but give
3. Documentation - divvun.no
- Public opening will be 22.6.2005
- To be done before that:
- Technical issues:
- make sure Tomcat is working
- make sure cvs updating is working
- have i18n in place
- make sure Tomcat is working
- Content of the common (internal) documentation
- Go through the linguistic documentation, tidy up as many loose ends as possible.
- Ditto for the technical parts
- Go through the linguistic documentation, tidy up as many loose ends as possible.
- User/public oriented documentation
- Plan how much there should be, what we should include
- Write it, translate it.
- Overall project plan - should this be translated?
- Official budget - translation?
- More user oriented documentation: how to give feedback, deliver texts, more?
- Plan how much there should be, what we should include
- Language policy
- Our site needs a language policy: What content should be in what languages?
- Our language policy should be documented on the home page.
- Project goal: keep the costs and workload down, but the information value up
- Language policy of the disamb site: Everything in eng and sme, except the sma and smj
- We continue the discussion of the language policy in the newsgroup
- Our site needs a language policy: What content should be in what languages?
- Moral, method, whatever re documentation:
- Document what you do, when you do it.
- Technical issues:
4. Corpus gathering
Børre has been in contact with Áššu, and they will provide us with both correct and uncorrect
Børre and Sjur discussed the legal infrastructure on the way back home on Friday. Among other
Samediggi ---------- Data central (collector) 2 (UiT) | \ | 1 | 4\ |3 | \ | Author/ User / researcher Publisher (end user) (text owners)
The Oslo model: Compared to the above, the collector and the data central is one and
Other differences:
- Oslo:
- remove some parts of the texts, to make it impossible to duplicate whole works
- a simpler, one-page contract, less "legaleese" to understand
- remove some parts of the texts, to make it impossible to duplicate whole works
- Helsinki:
- license whole texts
- a more thorough license text, much more "legaleese", but also a clearer division
- easier to regulate the use of the texts
- license whole texts
We will go for a variant of the Helsinki model.
Actions:
- we have received contract 1 in Finnish, Sjur will scan it and give it to
- the rough translation should be given to the Sámediggi lawyer, who will reformulate
- do we need to have official translations of the contract in all relevant languages?
- Sjur will get copies of 2-4 from Kimmo Koskenniemi or CSC
- licenses 2 and 3 can wait, 1 is important now for the text collecting
5. Corpus infrastructure
We did some testing in Guovdageaidnu - using the corpus infra is quite easy, some points of
Feature requests can be sent to newsgroup. Possible changes to be evaluated:
- include titles as default?
- include lists as default?
- remove text in other languages than the one you are working on? Should be
Seen from a linguistic part of view, we need as much text as possible. For some
6. Linguistics
Normative questions to be clarified:
- eastern / western forms to be included / excluded from the norm as in the proofing tools
- what separator to use between an abbreviation and its case endings when inflected:
- confirmation on who/where/when if so (and a reference to be incl. in our docu); or
- an official decision from a normative body
- Thomas will check the situation
- Thomas will check the situation
- confirmation on who/where/when if so (and a reference to be incl. in our docu); or
- more issues?
- Vocabulary: Are there words that should not be accepted by the speller as Sámi?
- Loan words: Are there procedures for how to spell loan words (sjarmeret, pensjonista)
- Foreign place names: how should they be spelled
- Kárášjoht nisu. Truncated 3-part compounds written as two words. Can they be
- Cf. the Bugzilla base..
- Vocabulary: Are there words that should not be accepted by the speller as Sámi?
- We should bring all outstanding issues to the giellaráđđi meeting in June. Maaren
- The deadline for adding issues to the Giellaráđđi meeting is next Tuesday, so we
7. Term db
Bug fixing, cosmetic tuning.
8. Other issues
8.1 Cgi-bin scripts
Pekka tried to use our form-based tools, which didn't work, and the feedback address
8.2 Trond about the faculty:
- The faculty will give us 2 places in a reading-room, with Macs and net access,
- There were no new office, but they will re-furniture one of the existing ones,
- To be able to get more offices, Trond might need to get outside the faculty
- The contract with the University will have to be reformulated/renegotiated to
- Most likely a new office will be in the same or a nearby building
9. Summary, task list
Børre:
- Disamb documentation with Trond.
- Contact Thor Øivind
- Add corpus texts that are unproblematic from a juridical point of view
- Contact people about Lule Sámí texts
- Contact skolelinux and Knut Hofland about webcrawler.
- Finish the setup of divvun.no
- Contact Roy Dragseth about cvs-commit mailing
- continue Forrest i18n
- continue http://giellatekno.uit.no, encoding problems
Maaren:
- continue missing list
- find and prepare issues for the language board meeting
- translate Helsinki contract into rough Norwegian; send it to Sjur,
Sjur:
- Scan and OCR the Helsinki contract; mail it to Maaren
- Ask Anne-Britt to update the contract with the University (we are getting 3, not
- write the LIST option to our test tools, as requested in the discussion group.
- complete the action summary after our half-year evaluation
- fix risten.no bugs
- contact Kimmo Koskenniemi about kwic-snt and UTF-8 bug
- voice group-chat not working to Sámediggi - should be fixed
Thomas:
- Find issues to giellaráđđi čoahkkin
- help Maaren prepare those same issues
- Continue with the verbs
Tomi:
- Start looking at the smj infrastructure, turn it into utf-8, look at whether updates
- update catxml to remove unwanted language
- Translate the texts in the sme/corp/correct/ directory to utf-8.
- Work on the nonrec transducer: Get the nonrec files out of src/ and into int/, get the
- Look into shortening of three-part compounds, middle part
- decapitalisation of proper nouns when (if?) compounded, and when derived
Trond:
- Disamb documentation with Børre.
- Systematize the linguistic issues ahead of the Language Board meeting
- Add cgi-bin errors to Bugzilla.
- continue the office disc. with the faculty
- Linguistic work.
- computational linguistics, with Tomi.
All:
- The www.divvun.no release.
- The Bugzilla bugs, and a quip or two?
- continue the discussion on the language policy of divvun.no in the newsgroup
10. Next meeting, closing
06.06.2005 10.00
Closed at 12.52