Divvun/Disamb Seminar in Tromsø

  • project meeting
    • date: 23 (after lunch) to 26 January; Friday is a travelling day.
    • place: Tromsø
  • Participants
    • Monday, Tuesday, Wednesday: Børre, Ilona, Linda, Saara, Sjur, Tomi, Trond
    • Thursday: Børre, Ilona, Linda, Sjur, Tomi, Trond
  • Practical arrangements:
    • Room(s): The meeting is in room 4475, with room 4402 as the alternate room. Both rooms are on the same floor as our offices.
      • We have one projector, and may be able to hire another one.
      • 4475 has internet but no whiteboard; 4402 has a whiteboard but no internet.
      • TODO: Get hold of a flip chart for 4475.
      • TODO: Lunch only for 7 persons
      • TODO: Machine for Ilona at 10.00.
    • Lunch will be in Sjampanjekantina, 11.30, Monday to Thursday.
    • Coffee breaks are not arranged; we may order coffee from the canteen or buy it ourselves.
    • hotel = Thon Polar Hotel, Grønnegata 45 (Saara, Ilona, Linda, Sjur)

Suggested content that's still not scheduled:

  • Programmers
    • Work distribution (in bug fixing, documentation, maintenance etc.)
      • Not done
    • Parallel corpora and the corpus infrastructure
      • Discussions and clarifications; a trip to Oslo was planned, but no concrete work was done.
    • xml2lexc implementation plan
      • Not done (much done earlier, though)
    • file-specific xsl scripts for corpus conversions, the what, how and who issues (t: Saara/Tomi; p: Trond, Børre, Sjur)
      • Saara demoed her work, with eager support from Tomi and Børre, so we now have some insight into what to do and how. Our xsl scripts cannot do much heavy work, though; they remain on the meta level (modifying the header rather than the body).
  • Linguists
    • Two linguists were missing, so only some linguistics was discussed.
    • Get together and discuss common issues
      • Done
    • Approaches to working with the lexicon
      • We are wiser now. (or are we?)
    • Tags
      • The discussion with the Saami dept was all too short, but at least it clarified our own thoughts. They promised to look at the tags and their usefulness. The indefinite pronoun question remains open.
    • Substandard forms
      • Not done.
    • The numeral projects
      • Progress made:
      • For the Arabic part, num.fst is now integrated into sme.fst. It needs refinement, though.
      • For the linguistic part, we did some fieldwork with an innocent informant (Tomi).
    • Twol issues
      • Not touched.
    • Unix etc. courses.
      • Held a 45 min Unix course, which was useful. There are still more hints to pick up, but it is fine that they are presented in palatable portions. Conclusion: have a Unix session at the next meeting as well, or perhaps Unix course sessions online, next week etc.
  • All
    • how to correctly write Forrest documentation
      • site.xml (what is it, how do we add to it?)
      • site-frag.xml (what's the difference?)
    • testing - how-to, improvements
    • code style guidelines



  • a mixture of training/discussions and practical work, to avoid monotony
  • most training in the beginning, as preparations for later work
    • The schedule never recovered from the changing number of participants (-2, +1, -1, etc.), but we nevertheless had a good working meeting.

Monday 11.30-16

The meeting starts with lunch in Sjampanjekantina at 11.30.

Common 12-13.30 (1,5 h)

  • Presentation, kick-off 15 min
  • Project updates 15 min (2*7 min)
  • Project milestones 15 min (2*7 min)
  • Cooperation evaluation (practical/daily cooperation) 30 min
    • project management
    • programmer work
    • linguistic work
    • anything else?
  • break 15 min

13.30-14.45 (split in two groups)

  • Trond, Ilona, Linda : Unix
  • Sjur, Tomi, Børre, Saara: XQuery intro

15-16 (all)

Software updates/maintenance:

  • XXE + config updates
    • Done
  • OS X (all on 10.4.4)
    • Done
  • bash / localisation status quo
    • The many different setups meant that we never reached a conclusion. Sjur has a working setup …
  • Xerox tools: We must negotiate a new version with Xerox.
    • We have the full version for creating the full wordlists, but only on cochise
    • the new Mac is faster and has more memory
  • forrest? Probably not
  • SEE, Unison, CVL, Marratech
    • Done
  • CodeTek VirtualDesktop setup/config (?)
    • Done

Tuesday 8.30-11.30

  • How to use the xml corpora (Tomi, Børre, Saara => all - 1h)
    • Done; Saara and Børre gave a good introduction.

(Computational) linguistics 9.40-11.30

  • keep in mind
    • 3-part compound treatment
  • Numeral session

We split into 3 groups

  1. Techni group: Børre, Sjur (updates)
  2. Arabic group: Saara, Linda
  3. Alphab group: Tomi, Trond, Ilona
  • Arabic
    • regx: regex for numeric expressions, dates, quantifiers, ... 12. 3. 2003
  • Alphabetic
    • ling: what are the linguistic facts?
    • comp: tags, lexc treatment
    • regx: regex for numeric expressions, dates, quantifiers, ... 12. 3. 2003
      • Much was done, but the implementation is still half open (Arabic) and fully open (linguistic). Saara said she would have a look at dates with words (months).
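
As an illustration of the regx item above, here is a minimal sketch of a regular expression for numeric dates of the form 12. 3. 2003. It is written in Python and is not the project's actual implementation; the names DATE_RE and find_dates are hypothetical, and the assumed format (day, month, four-digit year, separated by full stops with an optional space) is taken only from the example date in the notes.

```python
import re

# Hypothetical pattern (an assumption, not the project's regex):
# day and month as 1-2 digits, a four-digit year, each separated by a
# full stop and an optional single whitespace character.
DATE_RE = re.compile(r"\b(\d{1,2})\.\s?(\d{1,2})\.\s?(\d{4})\b")


def find_dates(text):
    """Return (day, month, year) integer tuples for plausible dates in text."""
    dates = []
    for match in DATE_RE.finditer(text):
        day, month, year = (int(group) for group in match.groups())
        # Coarse range check only; it does not validate month lengths.
        if 1 <= day <= 31 and 1 <= month <= 12:
            dates.append((day, month, year))
    return dates
```

Calling find_dates on a sentence containing 12. 3. 2003 would return the tuple (12, 3, 2003). Dates written with month names, which Saara offered to look at, would need a separate word list and are not covered here.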

Tuesday 12-16

  • Programmers: Sjur, Børre, Tomi, Saara, (Irene)
    • XML specification
      • Irene <===
        • Check whether and how we cover all of the Kven needs: done
        • Discuss the multimedia things: done
  • Linguists: Ilona, Linda, Trond.
    • Note: We should consider including someone from the Saami department.
      • Done. (Kjell Kemi and Steinar Nilsen)
    • Tagsets and POS in Saami (Evaluating and fixing (?) the tagset both for the parser and for the disambiguator)
      • We only presented our problems and our analyser to the Saami dept staff.
    • SGL decisions:
      • overview, any consequences for our work?
      • No input from SGL.

Wednesday 8.30-11.30

  • Programmers: Sjur, Børre, Tomi, Saara
    • XML specification
      • Irene <===
        • Check whether and how we cover all of the Kven needs
        • Discuss the multimedia things

XQuery/eXist/proper names continued

Wednesday 12-16

  • Disamb
    • Reviewing issues from our first meeting:
    • Schedules, internal dependencies


Sjur and Børre in meeting - all day?

Thursday 8.30-11.30

Thursday 12-16

To be allocated

Groups:

  • ling: Ilona, Linda
  • tech: Saara, Tomi, Sjur, Børre, Trond
  • twol: Sjur, Trond


Routines for addition to lexicon, cooperation wrt. new corpus texts coming in

  • Maaren, Ilona, Børre, Trond, (Linda, Thomas)
    • Ilona and Linda have discussed cooperation here.

eXist/XQuery/XSL/webapp/ names tasks (17,5h/2d2,5h):

  • XQuery people: primary: Saara, Tomi, Sjur (=STS); optional: Børre, Trond (=BT)
  • XQuery/XSLT intro (1,5h) (STS + BT)
  • intro to the architecture (1,5h) (STS + BT)
    • server setup
    • webapp organisation:
      • frames
      • search vs edit
    • how the search works presently
    • how the editing works presently
      • Done.
  • track down the simultaneous update bug (3h)
    • DONE (phew) and reported to the mailing list (read and enjoy)
  • enhance security (1,5h) (https? Yes, but how to? password check) (STSB)
  • refactor the code to split collection-specific code from general (both edit and search) (3h+)
    • Work began
  • make a first version of proper noun infra (6h)
    • search
    • edit
      • Not done
  • synchronisation with CVS (1,5h):
    • download files
    • sorting (identical and fixed sort order)
      • Not done.
  • setup access to the windows server (0,5h)
    • Not done
  • refactor to stored procedures (everything contained within eXist)? (discuss, 0,5h)
    • will make it easier to upgrade the webapp (no need to log on to the server machine, just use the eXist Java client)
    • will make the webapp mostly self-contained
    • might introduce a performance hit - needs to be checked
      • Not done
  • XML DTD check:
    • review the converted lexicons
    • anything missing, or badly organized?
      • Not done


We have seen important progress in a few, very limited areas, but there are still open issues.

Evaluated against our expectations, the week was too short.

One lesson is that we should have come better prepared: we should have gone through the issues beforehand, so that we could concentrate on the really hard problems.

It was nice to meet face to face, both for those meeting for the first time and for the long-time-no-see factor.