Meeting_2005-09-19
Meeting setup
- Date: 19.09.2005
- Time: 10:00 Norwegian time
- Place: Wherever we are :-)
- Tools: iChat, SubEthaEdit
Agenda
- Opening, agenda review
- Reviewing the task list from two weeks ago
- Documentation - divvun.no
- Corpus gathering
- Corpus infrastructure
- Linguistics
- Speller infrastructure
- Other issues
- Summary, task lists
- Closing
1. Opening, agenda review, participants
Opened at 10:03.
Present: Børre, Maaren, Saara, Sjur, Thomas, Tomi
Absent: Trond
Main secretary: Sjur
Agenda accepted as is.
2. Reviewing the task list from the last meeting
Børre
- Finish crontab specification for the cvs update/export script Tomi made
- still working
- reopen the jspwiki + UTF-8 issue
- Done
- Add issue to forrest issue tracker about utf-8 ihtml documents.
- Done
- Contact Svenska bibelsällskapet
- Done, but haven't managed to get contact with Olavi Korhonen who has the
- discuss with Anders Kintel about possible cooperation
- Not done
- Follow up on CVS mailing:
- set up Maaren
- Will do it this week, as I am in Kautokeino
- Meet up with Trond about directory structure
- Not done
- Contact oahpahusossodat and the rest of the SD about texts
- Will do it this week
- Fixing the machine for the new coworker
- Done
- Document the corpus infrastructure
- Not done
- Read through the Helsinki contracts (new translations)
- Not done
- Reorganise the directory structure
- Not done
- Continue converting text from input format to our xml
- Not done
Maaren
- The missing list, both the overall missing list from our xml corpus, and a
- shall do it next week
- shall get mainly through the missing list from risten.no this week
- working hard this week also
- Start working on grammatical issues with Thomas and Trond
- not done
- Work on the name project with Trond and Sjur
- not done
- Start looking at normativity issues
- not done yet
- Work on the numerals project with Trond
- not done yet
Saara
- Look at the corpus infrastructure issue
- Working on this
- Look at the corpus interface issue with Lars
- Not done
- Convert texts from .doc to .xml, to get a grasp of our corpus format
- Not done
- Have a look at the pdf-to-xml issue (known problem: Keep the Sámi
- Testing with some tools, e.g. pdftohtml, pdftotext. The Sámi
Sjur
- risten.no bugs and fixes
- nothing done this week
- complete the action summary after our half-year evaluation
- follow up on:
- voice group-chat not working across the Sámediggi firewall
- Now awaiting cost evaluation from the IT guys (Geir Kaaby et al)
- To the board:
- write draft specification for the outsourced tasks
- looking for software to integrate the specifications with our regular
- write half-yearly project report with progress and budget status
- continued, not finished
- Deadline for the board tasks: 3 weeks ahead of the meeting (the meeting is
- project planning with Trond
- not done, but have been looking for software, see below
- Work on the name project with Trond and Maaren
- nothing done
- Prepare for a Lule Sámi meeting with Árran
- nothing done
- Follow up on place names from Norge Digitalt
- waiting for further response from Øystein Johannessen
- Read through the Helsinki contracts (new translations)
- browsed through them
- added them to the menus, and rewrote the legal section text
- Talk to Bitte about the Lule Sámi lexicon
- Bitte has been travelling
- Evaluate SFST as speller (and analyzer) lexicon
- After project board meeting
- Other tasks:
- looked into project management software - there's a new page for it
- also looked for software to help with writing and following requirements
Thomas
- work on Lule Sámi compounding and derivation
- worked hard and still working
- Look at Linguistic bugs with Trond.
- solved some, some others are left
- Prepare for a Lule Sámi meeting with Árran
Tomi
- Aspell: Continue working on the affix file & aspell
- Contact aspell author (UTF-8 thing)
- Not done
- three-part compounding
- Not done
- corpus infrastructure: dtd location (both public and internal)
- Not done
- corpus infrastructure: file and dir organisation
- Not done
- Document aspell and corpus infrastructure
- Done, partly
- Add html-to-xml conversion to corpus infra
- Not done
Trond
- ( Trond will be absent at next week's meeting, or perhaps
- Work on the bug list.
- Work on compounds (three-part, with Tomi)
- Work on the corpus interface (with Lars and Saara)
- Work on the name agreement with "Norge digitalt" with Thomas
- Look at the linguistic aspects of the speller clitics, with
- Get the new version of the New Testament
- Introduce the new coworker to the work routines
- project planning with Sjur
- Work on the name project with Maaren and Sjur
- Prepare for a Lule Sámi meeting with Árran
- Work on the numerals project with Maaren
3. Documentation
Documentation tasks:
- Add documentation on our corpus infrastructure and our corpus work in general
- add/update Aspell documentation (Tomi)
- finish divvun2web script (Børre)
- as always: document what you're doing :-) (all)
- divvun.no is turning white from time to time. Needs to be checked (Børre)
- This was probably too little memory allocated to java (the default is
4. Corpus gathering
See notes from the 12.9. meeting
Tasks:
- read through Trond's translations (Børre, Sjur)
- e-mail Kimmo Koskenniemi about the missing fourth contract, and about
- send the license text to lawyers
- add a background document explaining the model
5. Corpus infrastructure
Naming conventions and directory structure
See notes from the 12.9. meeting
Corpus conversion
Pdf to XML
Extraction priority list
- retain correct Sámi characters
- retain word and sentence order
- retain paragraph order
- retain structure
- paragraphs
- titles, headers
- metadata (author, year, etc.)
- lists
- tables
Problems found so far using open-source tools:
- text gets correctly out, also regarding encoding
- paragraphs are correctly ordered, but not separated (i.e. one long paragraph)
- no structure
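A quick way to check the first priority (correct Sámi characters) on text that has already been extracted could look like the sketch below. It is only an illustration, not one of the tools tested: the function name, the list of North Sámi letters checked, and the idea of flagging U+FFFD, C1 control characters and "Ã" as signs of encoding damage are all assumptions.

```python
from collections import Counter

# Hypothetical helper, not part of the project's scripts.
# North Sámi letters outside ASCII (assumed list, lower and upper case).
SAMI_LETTERS = "áčđŋšŧžÁČĐŊŠŦŽ"


def is_suspicious(ch):
    """True for characters that usually signal encoding damage.

    U+FFFD is the replacement character; U+0080..U+009F (C1 controls)
    and U+00C3 typically appear when UTF-8 bytes were read as Latin-1.
    """
    return ch == "\ufffd" or ch == "\u00c3" or "\u0080" <= ch <= "\u009f"


def check_extraction(text):
    """Return (counts of Sámi letters, counts of suspicious characters)."""
    counts = Counter(text)
    sami = {ch: counts[ch] for ch in SAMI_LETTERS if counts[ch]}
    suspicious = {ch: n for ch, n in counts.items() if is_suspicious(ch)}
    return sami, suspicious
```

A text with many suspicious characters and few or no Sámi letters is a hint that the extraction tool, or a later step, got the encoding wrong.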
HTML to XML
- we already have some tools according to Saara
- this is anyway easy, as HTML provides us with the structure we need
- what is needed is a transformation to our XML, + adding the metadata as usual
- it can wait at least a week or two (after pdf conversion is mostly done)
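The transformation described above could be sketched roughly as follows, using only the Python standard library. The target element names (document, header, body, p with a type attribute) are placeholders invented for this example and are not our actual DTD; the metadata is passed in by hand, as we do for other formats.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class HtmlToCorpusXml(HTMLParser):
    """Map a few HTML block elements to a hypothetical corpus XML."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self, metadata):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("document")
        header = ET.SubElement(self.root, "header")
        # Metadata keys become element names (assumed to be valid XML names).
        for key, value in metadata.items():
            ET.SubElement(header, key).text = value
        self.body = ET.SubElement(self.root, "body")
        self.current = None  # element currently receiving text

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.current = ET.SubElement(self.body, "p", type="title")
        elif tag == "p":
            self.current = ET.SubElement(self.body, "p", type="text")
        elif tag == "li":
            self.current = ET.SubElement(self.body, "p", type="listitem")

    def handle_endtag(self, tag):
        if tag in self.HEADINGS or tag in ("p", "li"):
            self.current = None

    def handle_data(self, data):
        # Inline markup (b, i, a, ...) is dropped; its text is kept.
        if self.current is not None:
            self.current.text = (self.current.text or "") + data


def html_to_xml(html, metadata):
    """Convert an HTML string to the hypothetical XML, returned as a string."""
    parser = HtmlToCorpusXml(metadata)
    parser.feed(html)
    parser.close()
    return ET.tostring(parser.root, encoding="unicode")
```

Tables and nested lists are ignored in this sketch; they would need their own mapping once the real format is settled.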
6. Linguistics
Test
Name lexicon
See notes from the 12.9. meeting
North Sámi
- three-part compounds issue still open
- Trond, Maaren, Sjur will look into this in Guovdageaidnu
- Johnny Andersen has written a letter to us on the treatment of Sámi place
- Sjur has written an e-mail to the UFD contact person,
- normativity issues:
- the Giellalávdegoddi meeting is in October sometime
Lule Sámi
- we need a lexicon
- compounding and derivation
- Thomas has finished with deverbals, now working with denominals
- most likely the same three-part compound problem in Lule Sámi as well
- it is possible that even the first stem shortens the same way
- Suffix boundary symbol has not been added, we are not sure whether we should
Numerals
- An empirical overview
- Numeral generation
- Numeral inflection
- Numerals as parts of compounds
- A clear concept of how we want to treat them
- Tagging
- A treatment
7. Speller infrastructure
Aspell
Write documentation here as well.
The munch-list is working, and the affix file is improving. See 15.8. meeting memo for more.
See 12.9. meeting memo for
8. Other
Technical issues
- The Mac OS / Perl bug (at least Trond and Sjur have it):
- utf8 "\xC4" does not map to Unicode at /Users/trond/gt/script/preprocess line
- Another example:
- : "\x{00c3}" does not map to utf8 at ../script/preprocess line 113, <> chunk
- Sjur has an unresolved Backspace + UTF-8 issue
- This issue is now fixed! (I believe I had to specify a locale that was
- CVS mailing to Maaren:
- it is working, but she receives two copies of every message. I don't know
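On the Perl "does not map to Unicode" messages above: one common cause of this kind of error, offered here as a guess and not a diagnosis, is input that is read as UTF-8 but contains bytes in another encoding (or a truncated multi-byte sequence). The byte 0xC4 is "Ä" in Latin-1, and it is also the first byte of "č" in UTF-8, so a lone 0xC4 is not valid UTF-8. The small Python sketch below (the function name is made up for the example) shows the same situation outside Perl.

```python
def diagnose(raw: bytes) -> str:
    """Report whether a byte string is valid UTF-8, and where it first fails."""
    try:
        raw.decode("utf-8")
        return "valid UTF-8"
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that could not be decoded.
        return f"invalid UTF-8 at byte offset {err.start}: 0x{raw[err.start]:02X}"


# "č" encoded as UTF-8 is the two bytes C4 8D and decodes fine.
print(diagnose("č".encode("utf-8")))
# "Ä" encoded as Latin-1 is the single byte C4, which is not valid UTF-8.
print(diagnose("Ä".encode("latin-1")))
```

Running a check like this on the input files to the preprocess script might show whether some of them are still in an 8-bit encoding.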
Bug fixing
- 13 open bugs (and 24 risten.no bugs) - it seems Sjur could use some help
- some bugs are not real, but just not closed yet even though they are fixed.
- Børre to ask Thor Øyvind to configure Bugzilla to send e-mail
Memo and meeting practice update
From now on, next week's memo frame with task lists etc. will be made available
9. Summary, task list
Børre
- Finish crontab specification for the cvs update/export script Tomi made
- Contact Svenska bibelsällskapet / Olavi Korhonen
- discuss with Anders Kintel about possible cooperation
- Follow up on CVS mailing:
- Have a look at why Maaren and Thomas get two copies of every samicvs
- Contact oahpahusossodat and the rest of the SD about texts
- Document the corpus infrastructure
- Read through the Helsinki contracts (new translations)
- Reorganise the directory structure
- Continue converting text from input format to our xml
Maaren
- The missing list, both the overall missing list from our xml corpus, and a
- shall do it next week
- shall get mainly through the missing list from risten.no this week
- working with risten.no this week also
- Start working on grammatical issues with Thomas and Trond
- shall do it this week or next week?
- Work on the name project with Trond and Sjur
- OK, OK
- Start looking at normativity issues
- shall do it this week
- Work on the numerals project with Trond
- shall contact Trond
Saara
- Look at the corpus infrastructure issue
- Look at the corpus interface issue with Lars
- Convert texts from .doc to .xml, to get a grasp of our corpus format
- Have a look at the pdf-to-xml issue
- use the priority list earlier in the memo as guidance
Sjur
- risten.no bugs and fixes
- complete the action summary after our half-year evaluation
- follow up on:
- voice group-chat not working to Sámediggi
- Now awaiting cost evaluation from the IT guys (Geir Kaaby et al)
- To the board:
- write draft specification for the outsourced tasks
- write half-yearly project report with progress and budget status
- Deadline for the board tasks: 3 weeks ahead of the meeting (the meeting is
- project planning with Trond
- Work on the name project with Trond and Maaren
- Prepare for a Lule Sámi meeting with Árran
- Follow up on place names from Norge Digitalt
- Read through the Helsinki contracts (new translations)
- Talk to Bitte about the Lule Sámi lexicon
- Evaluate SFST as speller (and analyzer) lexicon
- prepare for the Guovdageaidnu meeting:
- name lexicon
- three-part compounds
- e-mail Kimmo Koskenniemi about contract issues
Thomas
- work on Lule Sámi compounding and derivation
- Look at Linguistic bugs with Trond.
- Prepare for a Lule Sámi meeting with Árran
Tomi
- Aspell: Continue working on the affix file & aspell
- Contact aspell author (UTF-8 thing)
- three-part compounding
- corpus infrastructure: dtd location (both public and internal)
- corpus infrastructure: file and dir organisation
- Document aspell and corpus infrastructure
- Add html-to-xml conversion to corpus infra
Trond
- ( Trond will be absent at next week's meeting, or perhaps
- Work on the bug list.
- Work on compounds (three-part, with Tomi)
- Work on the corpus interface (with Lars and Saara)
- Work on the name agreement with "Norge digitalt" with Thomas
- Look at the linguistic aspects of the speller clitics, with
- Get the new version of the New Testament
- Introduce the new coworker to the work routines
- project planning with Sjur
- Work on the name project with Maaren and Sjur
- Prepare for a Lule Sámi meeting with Árran
- Work on the numerals project with Maaren
- Prepare for three-part compounds meeting in Guovdageaidnu
10. Next meeting, closing
26.09.2005 10:00
Closed at 11:10