Meeting_2005-02-28
Meeting setup
- Date: 28.02.2005
- Time: 09.30 Norw. time
- Place: Wherever we are: -)
- Tools: Phone, iChat, SubEthaEdit
Agenda
- Opening, agenda review
- Reviewing the task list from a week ago (see below)
- Documentation:
- are we ready to remove the old HTML docs?
- is the Wiki format for meeting memos ok?
That is, can it be used during the meeting?
- are we ready to remove the old HTML docs?
- Corpus gathering
- Corpus infrastructure
- Linguistics
- Language technology:
- compilation problems on cochise
- compilation problems on cochise
- Term db
- Other issues:
- Pargas seminar: who is going?
- Pargas seminar: who is going?
- Summary, task lists
- Closing
Last week's task list:
-
Børre: suggest port numbers in the documentations
- Continue .profile discussion in news, conclusion this week
-
Børre: mpage & UTF-8
-
Sjur: Post a description on how to patch XXE for Alt key support
-
Børre: contact Anne-Britt today, and send the letter.
-
Børre and Trond: update emacs and xml-mode, check automatic version-stamp
in cvs $id$? -
Sjur: Send e-mail to Trond regarding binary files on the news group
-
Børre: wiki support in forrest
-
Trond: invite Lars N
-
Børre + Trond: set up computers, check forrest in cochise
-
Børre: divvun.no
- Tomi/Børre: do some work for Sjur on the termdb
1. Opening, agenda review
2. Reviewing the task list from a week ago (see below)
-
Børre: suggest port numbers in the documentations:
in the documentation now (under infra/forrest), will set up a separate main tab How-To - Continue .profile discussion in news, conclusion this week
we move from .profile to .bash_profile, docu to be updated -
Børre: mpage & UTF-8
nothing happened last week - file a bug in Bugzilla, with the solution postponed to Tiger (OS 10.4) -
Sjur: Post a description on how to patch XXE for Alt key support
posted a description in news, ok -
Børre: contact Anne-Britt today, and send the letter.
Berit Karen has asked Børre to check the address list, letter was sent last week -
Børre and Trond: update emacs and xml-mode, check automatic version-stamp
in cvs $id$? it seems to work for Sjur, more info needed -
Sjur: Send e-mail to Trond regarding binary files on the news group
e-mail sent, Trond forwarded it, but the issue is still open. Trond will follow up the issue. -
Børre: wiki support in forrest
done, except the UTF-8 problem -
Trond: invite Lars N
Trond has sent e-mail with invitation, Lars has answered and will participate. -
Børre + Trond: set up computers, check forrest in cochise
issue is still open, Trond has problems from home and other locations -
Børre: divvun.no
IT staff were in Jokkmokk last week, nothing has happened -
Tomi/Børre: do some work for Sjur on the termdb
Tomi was home with fever last week, Børre has looked at it
3. Documentation:
- Are we ready to remove the old HTML docs?
- Not until forrest works on cochise, no.
- Marit is fine with emacs in cochise, Børre will add info about XML
mode in emacs to the docs, and also install in cochise
- Not until forrest works on cochise, no.
- is the Wiki format for meeting memos ok?
That is, can it be used during the meeting? Yes.
4. Corpus gathering
Thomas: Mikael Svonni has promised to give us his lexicon/dictionary. Asbjörg Skåden
Thomas and Trond have discussed a letter to the Swedish ministry of agriculture,
5. Corpus infrastructure
- dir/file structure
- some of the meta information
- processing of incoming texts
- For .doc input: antiword via DocBook to xml is a strong candidate
- For .pdf we have no good solution yet
- For .html input: Not looked at yet, but html > xml should be easy.
- For .txt: Here we are interested in using Lars Nygård's txt structure guesser.
- The encoding issue: We have identified between 5 and 10 different formats,
and need more work on the conversion scripts (Børre has started working on such scripts for antiword). - Other formats: we don't know which yet
- For .doc input: antiword via DocBook to xml is a strong candidate
- thus, preferred formats are so far (in descending order): .doc, *.html, *.txt, *.pdf
- target format (suggestions in newsgroup, this discussion is still open)
6. Linguistics
String categories still to cover:
- html addresses, number formats, number-letter combinations "*CG2" as opposed
to "SGML" "CG" (all uppercase is now covered for max 5 char strings, thus "SGML", but not "sgml") - These strings can be covered by regular expressions in lexc number generator,
acronym generator. - For HTML addresses, something like:
http://[a-z]+.[a-z0-9]+.{com,org,etc.}
- Trond files bug reports.
7. Language technology
- lexicon compiles fine on the Mac, but not in cochise
- Trond has written a letter to Ken and Lauri (the authors of the Xerox tools),
and will file it as an internal bug report as well.
8. Term db
9. Other issues:
- who is going?
- Sjur and Børre go, one more (Tomi or Thomas)
- Sjur and Børre go, one more (Tomi or Thomas)
- topics of the seminar:
- User, developer and normative body perspective.
10. Summary, task lists
-
Sjur, Børre, Tomi: terminology database
-
Børre: will set up a separate main tab How-To
-
All: we move from .profile to .bash_profile, Børre: docu to be updated
-
Børre: mpage & UTF-8 - file a bug in Bugzilla, with the solution postponed
to Tiger (OS 10.4) -
Børre:
- update emacs and xml-mode: Børre has a solution, will document
- check automatic version-stamp in cvs $id$? As above.
- update emacs and xml-mode: Børre has a solution, will document
-
Trond: will follow up the issue with binary postings in news
-
Børre: wiki support in forrest works, will follow up on the UTF-8 problem
-
Børre + Trond: set up computers, check forrest in cochise
issue is still open, Trond has problems from home and other locations -
Børre: divvun.no -
to be continued now that the IT staff is back from Jokkmokk
11. Closing
Closed at 10.57

