Meeting_2005-08-08
Meeting setup
- Date: 08.08.2005
- Time: 10.00 Norw. time
- Place: Wherever we are : -)
- Tools: Phone, iChat, SubEthaEdit
Agenda
- Opening, agenda review
- Reviewing the task list from a week ago
- Documentation - divvun.no
- Corpus gathering
- Corpus infrastructure
- Speller infrastructure
- Other issues
- Summary, task lists
- Closing
1. Opening, agenda review, participants
Opened at 10: 08.
Present: Tomi, Sjur, Børre
Absent: Maaren, Thomas, Trond
Main secretary: Børre
2. Reviewing the task list from the last meeting
Tomi
- three-part compounding
- Nothing done
- Nothing done
- decapitalisation of proper nouns when (if?) compounded, and when derived
- Got a tip from Trond about a Xerox example, but did not have a look at
- Got a tip from Trond about a Xerox example, but did not have a look at
- corpus infrastructure: dtd location (both public and internal)
- Not done
- Not done
- corpus infrastructure: file and dir organisation
- Nothing done
- Nothing done
- Aspell: A simple wordlist based speller
- Working!!
- ... but it's rather big:
- word list (plain text) is 587 Mb, and after munch-list option it is 433 Mb
- word list (plain text) is 587 Mb, and after munch-list option it is 433 Mb
- It is rather slow, because it has no affix-file, and the wordlist is huge.
- Tomi will try to make an affix list, based on our lexicon files
- New aspell in cochise!!
- You can use it temporarily from /home/tomi/bin/aspell and
- The new aspell should be installed in a regular location, replacing the old
- You can use it temporarily from /home/tomi/bin/aspell and
- Working!!
- Follow up on CVS mailing:
- check that all received forwarded mail
- Børre and Sjur is receiving mail as intended now
- Børre is suggesting that a cvs diff is attached to the commit mails, with
- Børre and Sjur is receiving mail as intended now
- set up Thomas and Maaren when they're back from vacation
- They're still on vacation
- check that all received forwarded mail
On vacation last week
Trond
- On vacation, issues from last week repeated at the end of the memo
Maaren
- On vacation, issues from last week repeated at the end of the memo
Thomas
- On vacation, issues from last week repeated at the end of the memo
Sjur
- On vacation, issues from last week repeated at the end of the memo
Børre
- On vacation, issues from last week repeated at the end of the memo
3. Documentation - divvun.no
Forrest war crashes Tomcat
We have the same problem with risten.no. This blocks the addition of new
Steps to fix this bug (add):
- Take a backup of the present war file (it can be found within the Tomcat webapps/ dir)
- Make sure TWICE that you have that backup in a safe place
- Test both 5.0.29 and 5.5.9 versions of Tomcat, to see if there is any difference
- try to identify when the bug was introduced (somewhere around the middle of June)
- Report the findings to the Forrest user list
Automated update script
Set up the cron job on cochise. Permissions are set correct now, divvun.no can be
4. Corpus gathering
The correct directory for corpus legal things is adm/legal/. The Oslo example should
Then we'll have to write our own contract, adapted to our own needs, and get it rephrased
THEN we can start to gather texts: -)
5. Corpus infrastructure
Do documentation.
6. Speller infrastructure
aSpell
Is working, see above. Needs further optimisation before it is useful, cf the size
Write documentation here as well.
MS Office spellers
To be specified in the outsourcing tech-spec. What
OpenOffice.org
Questions:
1) Is it possible to use Aspell with OOo?
No. From the site:
Background Lingucomponent was started by Kevin Hendricks to integrate various open source spell checkers into the OpenOffice.org build. With a little prodding from Kevin Atkinson, the author of Pspell, a new spell checker called MySpell was written in C++ that supported affix compression, based on Ispell. This code has been given to Pspell and will eventually make it into a future Pspell release after it has been integrated. Later note: Pspell is dead and Aspell is its replacement. I think Aspell is found at aspell.sourceforge.net. There is no OOo interface to Aspell (no way to use Aspell with OOo). kh. MySpell, which builds on almost any system, is used to support spell checking in OpenOffice.org.
Links to possibly useful stuff:
2) Is the Aspell source files compatible with MySpell?
To be investigated. Goal: to simplify
7. Other
8. Summary, task list
Børre
- Add crontab specification for the cvs update/export script Tomi made
- investigate Forrest war + Tomcat issue
- reopen the jspwiki + UTF-8 issue
- Contact Svenska bibelsällskapet
- Add issue to forrest issue tracker about utf-8 ihtml documents.
- Check for existing bug reports, it might already be in there
- Check for existing bug reports, it might already be in there
- Continue forrest i18n research
- Sometimes abides the =locale=se/smj/fi in the url (the document gets
- A lot is up for i18n within Forrest. Suggestion: we wait and see a while.
- A lot is up for i18n within Forrest. Suggestion: we wait and see a while.
- Sometimes abides the =locale=se/smj/fi in the url (the document gets
- discuss with Anders Kintel about whether we could get the Sámi-Norwegian dictionary
- wait till Thomas is back from vacation
- wait till Thomas is back from vacation
- add the Oslo contract to cvs, and begin writing a contract draft
- Follow up on CVS mailing:
- check that all received forwarded mail
- set up Thomas and Maaren when they're back from vacation
- check that all received forwarded mail
Sjur
- risten.no bugs and fixes
- complete the action summary after our half-year evaluation
- follow up on:
- voice group-chat not working to Sámediggi
- Maaren has problems with SubEthaEdit (can't connect)
- voice group-chat not working to Sámediggi
- To the board:
- write proposal for permanent maintenance organisation
- write draft specification for the outsourced tasks
- write half-yearly project report with progress and bugdet status
- write agenda
- Deadline for the board tasks: 3 weeks ahead of meeting (date not decided, but
- write proposal for permanent maintenance organisation
- add Divvun to the UNESCO Hall-of-Fame list
Tomi
- Aspell: Create the affix file, work on automatic conversion from LEXC to affix file
- investigate Aspell/MySpell/OOo issues
- three-part compounding
- decapitalisation of proper nouns when (if?) compounded, and when derived
- corpus infrastructure: dtd location (both public and internal)
- corpus infrastructure: file and dir organisation
- Follow up on CVS mailing:
- check that all received forwarded mail
- set up Thomas and Maaren when they're back from vacation
- check that all received forwarded mail
On vacation
Tasks are transferred to the following weeks until they return from vacation, to
Maaren
- The missing list, both the overall missing list from our xml corpus, and a file-for-file
- Go through the missing list from risten.no
- send new letter to SGL about Lule Sámi possessives
Thomas
- write new letter to SGL about Lule Sámi possessives
- work with Lule Sami substantives
Trond
- Work on the bug list (Lule Sámi).
- Work on compounds (three-part + oslolaš, with Tomi)
- Work on the corpus interface (with Lars)
- Corpus infrastructure: dtd location
- Get the new version of the New Testament
- Check Hans-Ragnars names.
8. Next meeting, closing
15.08.2005 10: 00
Closed at 12: 08