Meeting_2005-09-05
Contents:
- Meeting setup
- Agenda
- 1. Opening, agenda review, participants
- 2. Reviewing the task list from the last meeting
- 3. Summing up last week (Meeting + conference)
- 4. Documentation
- 5. Corpus gathering
- 6. Corpus infrastructure
- 7. Linguistics
- 8. Speller infrastructure
- 9. Other
- 10. Summary, task list
- 11. Next meeting, closing
Meeting setup
- Date: 05.09.2005
- Time: 10.00 Norw. time
- Place: Wherever we are :-)
- Tools: iChat, SubEthaEdit
Agenda
- Opening, agenda review
- Reviewing the task list from two weeks ago
- Summing up last week (Meeting + conference)
- Documentation - divvun.no
- Corpus gathering
- Corpus infrastructure
- Linguistics
- Speller infrastructure
- Other issues
- Summary, task lists
- Closing
1. Opening, agenda review, participants
Opened at 10:12.
Present: Børre, Saara (half an hour), Sjur, Thomas, Tomi, Trond
Absent: Maaren
Main secretary: Børre
Agenda accepted with new point 3.
2. Reviewing the task list from the last meeting
Børre
- Add crontab specification for the cvs update/export script Tomi made
- Adjust it so it will work with the new server
- Working on it, almost done
- reopen the jspwiki + UTF-8 issue
- Not done
- Add issue to forrest issue tracker about utf-8 ihtml documents.
- Not done
- Contact Svenska bibelsällskapet
- Not done
- discuss with Anders Kintel about possible cooperation
- Not done, been in Helsinki
- Continue writing a contract draft, together with Trond and Sjur.
- Got new documents
- Follow up on CVS mailing:
- check that Thomas and Maaren receive forwarded mail from cochice
- set up Thomas and Maaren
- Thomas is set up, not sure about Maaren
- Investigate compatibility between MySpell and Aspell source files
- Debian has common source files for these.
- Finish giellatekno forrest transition.
- Done, deployed.
- Other tasks:
- Has looked at sfst, begun reading "THE book" (about the Xerox tools)
Maaren
- The missing list, both the overall missing list from our xml corpus,
- Done?
- Go through the missing list from risten.no
- Partially done
- remember headphones, 2*money and batteries for Helsinki
- Done
Sjur
- risten.no bugs and fixes
- Nothing during the last two weeks
- complete the action summary after our half-year evaluation
- follow up on:
- voice group-chat not working to Sámediggi
- This won't work without a new or upgraded Firewall
- Maaren has problems with SubEthaEdit (can't connect)
- Maaren's computer updated and fixed, the problem might be solved,
- To the board:
- write proposal for permanent maintenance organisation
- write draft specification for the outsourced tasks
- write half-yearly project report with progress and budget status
- write agenda
- Deadline for the board tasks: 3 weeks ahead of meeting (meeting is
- add Divvun to the UNESCO Hall-of-Fame list
- Not done
- Plan the Helsinki meeting
- Done
- meeting about corpus contract
- Done
- project planning with Trond
- Not done
Thomas
- work on Lule Sámi compounding and derivation, check closed POSes
- checked closed POSes, started with derivation
Tomi
- Aspell: Continue working on the affix file
- Work has been done there
- investigate Aspell/MySpell/OOo issues
- MySpell uses (almost) the same kind of wordlist & affix files as aspell
- OOo is turning over to hunspell, which has a much more powerful and
- three-part compounding
- Not done
- decapitalisation of proper nouns when (if?) compounded, and when derived
- This is working, but should be added to makefile and CVS
- corpus infrastructure: dtd location (both public and internal)
- Discussed in Helsinki
- corpus infrastructure: file and dir organisation
- Discussed in Helsinki
Trond
- Work on the bug list (Lule Sámi).
- No work on Lule Sámi last week
- Work on compounds (three-part + oslolaš, with Tomi)
- The oslolaš issue was solved in the Helsinki conference
- Three-part compounds still open
- Work on the corpus interface (with Lars)
- Not done.
- Corpus infrastructure: dtd location
- Made a principled decision in Helsinki, not implemented.
- Get the new version of the New Testament
- Got an html version in Helsinki from a colleague(!), still haven't got
- Check Hans-Ragnar's names.
- Not done.
- Look at the Sámi names issue
- Not done. But the CD is on my desk, so we will look into it here in Tromsø.
- Plan the Helsinki meeting
- Done
- New coworker
- Formalities start now.
- meeting about corpus contract
- We had a meeting, the work continues.
- check the new giellatekno site
- Some checking done.
- project planning with Sjur
- Not done.
3. Summing up last week (Meeting + conference)
- Meetings: Our meetings went well.
- Demo: We should have had a prepared text with collected authentic spelling
- The conference was too advanced from time to time (especially the mathematical
- We met the community and discussed, that was very positive. E.g. discussion
4. Documentation
The giellatekno.uit.no site is deployed. Børre has read and fixed some of the
The standing invitation remains: Document what you do, when you do it.
Especially document Aspell. The Makefile, how to build, what changes have been
5. Corpus gathering
Børre, Trond and Sjur had their meeting, and the Helsinki contract is quite good
Paths forward: We have a contract suggestion. Sjur and Børre should start the
How to proceed:
- Get the contract suggestion ready
- Translated part 1 ok, part 2 and 3 missing. Done this week
- Get part 4 from Kimmo, and translate it
- Contact our lawyers, at SD and UIT (today, tomorrow).
- When the Norwegian version of the contracts is ready, make
- Approach the text owners (see ordered list below)
Independent of the contract work
- Bible: The New Testament (Trond)
- Bureaucratic text:
- Sámi Parliament (Børre)
- Sámi Oahpahusráđđi (Børre)
- KRD (Børre, check whether we miss texts (discuss with Trond))
- the Sámi municipalities (Børre)
- Textbooks
- To the extent that text can be got directly from SO.
After the contracts are ready
Sjur and Børre should probably take a Tour-de-Sápmi, and meet with the
The tour should be planned, not in this meeting, but before the contracts
- Commercially published texts
- Author organisations' meetings
- Key authors one by one
- (list of author names) Kerttu Vuolab, Kirsi Paltto,
- Iđut and key authors there (Børre)
- Davvi Girji and key authors there
- Newspaper text:
- Sámi Instituhtta's (for the old archive of Min Áigi and Áššu)
- Áššu has been making a CD since the end of May, there should be a pile
- Min Áigi
List of texts with lower priority (to be gathered when the above list is
- the Sámi municipalities,
- Authors with smaller production
- Textbooks
6. Corpus infrastructure
Do documentation.
Naming conventions and directory structure
We have a decision from Helsinki:
- have the same directory structure in all three levels, and we also decided
- Path forward: Tomi and Trond to implement the directory structure
7. Linguistics
North Sámi
- New place names received, should be added to our lexicons
- three-part compounds issue still open
- Johnny Andersen has written a letter to us on the treatment of Sámi place
Lule Sámi
We do not know when we get the lexicon from Anders Kintel. We need a meeting
Status quo on our parser:
- All the major POSes have been covered
- closed POSes have been checked
- compounding and derivation
- Thomas has begun with the deverbals
8. Speller infrastructure
Aspell
Write documentation here as well.
Munch-list is working, and the affix file is improving. See previous meeting memo.
Issues:
- The phonetic file should be systematically looked into.
- Check that it works
- Add more correspondences on an impressionistic basis
- Start work on collecting systematic spelling errors:
- Our in-house file typos.txt
- The soon-to-arrive error texts from newspapers
- The holes in the affix list should be mended
- We should, at some point, evaluate whether this is The Correct Approach to
- Affix file UTF-8 problem should be checked and reported.
- Then there is the UTF-8 root (or whatever) problem: The work-around using
- The clitics issue: Today we have a manually created affix file in order to
- We must create subcomponents under the Speller
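For the item above on collecting systematic spelling errors, a minimal Python sketch of how recurring error patterns could be counted. The format of typos.txt is not described in this memo, so the sketch assumes one `misspelling<TAB>correction` pair per line; the function name `error_correspondences` and the sample pairs are made up for illustration and are not taken from our files.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Iterable


def error_correspondences(lines: Iterable[str]) -> Counter:
    """Count (wrong fragment, right fragment) pairs over typo/correction lines.

    Assumed input format (hypothetical): "misspelling<TAB>correction".
    Lines without a tab are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line or "\t" not in line:
            continue
        wrong, right = line.split("\t", 1)
        matcher = SequenceMatcher(None, wrong, right, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag != "equal":
                # e.g. ("a", "á") for a missing accent, ("", "l") for a dropped letter
                counts[(wrong[i1:i2], right[j1:j2])] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample pairs, only to show the shape of the output.
    sample = ["sami\tsámi", "Sami\tSámi", "giela\tgiella"]
    for (wrong, right), n in error_correspondences(sample).most_common():
        print(f"{wrong!r} -> {right!r}: {n}")
```

The most frequent pairs could serve as candidates when adding correspondences to the phonetic file, but whether they belong there still needs a linguistic check.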
MS Office spellers
Nothing new, see
OpenOffice.org
From last meeting:
The conversion from aspell to myspell will work trivially as soon as the myspell
Issue left open.
Hunspell
Hunspell is presently already working with OOo, and is a much better speller
Issue left open.
Other engines
Børre and Sjur had a long discussion with the author of the SFST library/tool
9. Other
Technical issues
- Fixing the machine for the new coworker
- The Mac OS / Perl bug (at least Trond and Sjur have it):
- utf8 "\xC4" does not map to Unicode at /Users/trond/gt/script/preprocess line 82.
- Sjur has an unresolved Backspace + UTF-8 issue
- 29 open bugs - too many! Have a look at what you can fix.
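On the Perl message above: a "does not map to Unicode" warning for "\xC4" usually means the input contains a byte 0xC4 that is not part of a valid UTF-8 sequence, which is what a Latin-1 encoded "Ä" looks like when read as UTF-8. Whether that is the cause here has not been checked. A minimal Python sketch for locating such bytes in a file follows; the function name `find_invalid_utf8` is made up for illustration.

```python
from typing import List, Tuple


def find_invalid_utf8(data: bytes) -> List[Tuple[int, int]]:
    """Return (offset, byte value) for every byte that stops UTF-8 decoding."""
    problems: List[Tuple[int, int]] = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the remainder decodes cleanly
        except UnicodeDecodeError as err:
            bad = pos + err.start  # absolute offset of the offending byte
            problems.append((bad, data[bad]))
            pos = bad + 1  # skip it and keep scanning
    return problems


if __name__ == "__main__":
    # Valid UTF-8 text followed by a Latin-1 encoded "Ä" (byte 0xC4).
    mixed = "Sámi ".encode("utf-8") + b"\xc4iti"
    for offset, value in find_invalid_utf8(mixed):
        print(f"offset {offset}: byte 0x{value:02X}")
```

Running this over the text that `preprocess` was reading when it failed would show whether the file mixes encodings or whether the problem lies elsewhere.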
10. Summary, task list
Børre
- Finish crontab specification for the cvs update/export script Tomi made
- reopen the jspwiki + UTF-8 issue
- Add issue to forrest issue tracker about utf-8 ihtml documents.
- Contact Svenska bibelsällskapet
- discuss with Anders Kintel about possible cooperation
- Follow up on CVS mailing:
- set up Maaren
- Meet up with Trond about directory structure
- Contact oahpahusossodat and the rest of the SD about texts
- Fixing the machine for the new coworker
Maaren
- The missing list, both the overall missing list from our xml corpus, and a
- Go through the missing list from risten.no
- Start working on grammatical issues with Thomas and Trond.
Saara
- Get acquainted with the project status quo
- Look at the corpus infrastructure issue
- Look at the corpus interface issue with Lars
Sjur
- risten.no bugs and fixes
- complete the action summary after our half-year evaluation
- follow up on:
- voice group-chat not working to Sámediggi
- Maaren has problems with SubEthaEdit (can't connect)
- To the board:
- write proposal for permanent maintenance organisation
- write draft specification for the outsourced tasks
- write half-yearly project report with progress and budget status
- write agenda
- Deadline for the board tasks: 3 weeks ahead of the meeting (the meeting is
- project planning with Trond
Thomas
- work on Lule Sámi compounding and derivation
- Look at linguistic bugs with Trond.
- Work on the name agreement with "Norge digitalt" with Trond
Tomi
- Aspell: Continue working on the affix file
- three-part compounding
- Add downcasing to makefile and CVS
- corpus infrastructure: dtd location (both public and internal)
- corpus infrastructure: file and dir organisation
Trond
- Work on the bug list (Lule Sámi).
- Work on compounds (three-part, with Tomi)
- Work on the corpus interface (with Lars)
- Corpus infrastructure: dtd location
- Work on the name agreement with "Norge digitalt" with Thomas
- Look at the linguistic aspects of the speller clitics, with
- Get the new version of the New Testament
- Check Hans-Ragnar's names.
- New coworker
- translate contract
- check the new giellatekno site
- project planning with Sjur
11. Next meeting, closing
12.09.2005 10:00
Closed at 12:56