tromso-2006-08
August 2006 gathering
- Place: Tromsø
- Dates: 22-24?
Participants
Topics
- Presentation, status quo and plans for divvun and disamb
- Linguistic issues
- Normativity
- Lexical coverage
- 3-part compounds
- Lang-tech issues:
- finalise Xerox hyphenation (Trond + Sjur 2-3h)
- G3 issue for sme
- Technical issues
- The munching deadlock
- M4 macros for hyph/spell/dis versions of TWOL
- our own wiki? for (technical) CG documentation, (open source) fst technology and Saami language technology
Speller plans
- Polderland work
- Online meeting with them? Yes, memo available
Proper noun status and action plan
- action plan:
- technical evaluation: how do we make proper noun editing work in Emacs?
- corpus collection: how much more?
Commented topics
- linguistic issues
- General linguistic discussion (morphology (and syntax?)):
- What is on top of the priority list?
- What are our most serious problems?
- What can we learn from each other?
- How should we work together across the projects in the remaining months?
- Lexical coverage
- 3-part compounds
- lang-tech issues:
- finalise Xerox hyphenation (Trond + Sjur 2-3 hours)
- G3 issue for sme
- technical issues
- The munching deadlock
- M4 macros for hyph/spell/dis versions of TWOL
- our own wiki? For (technical) CG documentation, (open source) fst technology and Saami language technology (basically the parts of our current documentation that aren't project-specific). Goal: to invite and engage a larger community (CG and fst/transducer users, i.e., the grammatically-based bottom-up parsing community) to improve and create documentation for the above topics. Potential partners are Helsinki, Oslo, Odense, Stuttgart.
speller plans
- Polderland work
- online meeting with them? Yes, memo available
proper noun status and action plan
- action plan: we need to systematically go through the most typical/useful tasks (problem now: Sjur and Tomi do not edit lexicons enough to really know what is most important)
- technical evaluation: how do we make proper noun editing work in Emacs, and who does it? Are the coding solutions (as in XML tags/structure) good? (Speed is not an issue, at least not on my new Mac :-), and shouldn't be on the server either.)
Worst-case scenario: the name project fails, takes too long, etc., and we will have to build separate name lexica in traditional lexc format for each language. In the best case we will get it up and running in a reasonable amount of time.
- corpus collection:
- how much more (in terms of effort and invested time)? Børre needs to start looking at testing pretty soon: we should have tools for people to test this fall, and we need a working feedback infrastructure to make sure we get the response we need
- divvun plans ahead
Time table
Time | Tuesday | Wednesday | Thursday |
---|---|---|---|
8:30-10:00 | A Presentation (9:00->) | T lexc2Xspell / L Consequences of eval | Machine update + planning / Aligner (Trond, Saara) |
10:00-10:30 | A Reports, plans | Coffee | A Coffee |
10:30-12:00 | a Polderland | T lexc2Xspell / L Consequences of eval | - |
12:00-13:00 | A Lunch | A Lunch | A Lunch |
13:00-14:00 | A G3/Howto/m4/Wiki (13:45: Coffee) | A Name lexicon 1 (what shall we store) | (exit Saara, Maaren) |
14:00-14:30 | T Video with PL (1h) / L Evaluate | A Coffee | A Coffee |
14:30-16:00 | A *G3/Howto/m4/Wiki | T Name lexicon 2 (how) / L (3part) | - |
... | preprocess --abbr=bin/abbr.txt | corrtypos.pl | ..
... | preprocess --abbr=bin/abbr.txt --corr=src/typos.txt | lookup ...
Session groups:
- A = all
- a = all except Lene
- T = Saara, Sjur, Trond, Tomi, Børre
- L = Lene, Maaren, Thomas
Tuesday afternoon 1
- A: Presentation with Polderland
- T: Video with PL (Saara, Sjur, Trond, Tomi, Børre, Maaren, Thomas)
- L: Evaluate our linguistic analysers (Lene, Maaren, Thomas)
- How good are the tools? (Maaren and Thomas explain to Lene what input she can expect from them)
- What does it take to make them better?
- Do we need tools for measurement?
- Or: office as usual, working
Tuesday afternoon 2
- G3: Trond, Sjur, Thomas
- Howto: Maaren explains practical linguistic work to Lene <= M&L find out how to cooperate...
- m4: Saara, Tomi
- Wiki: Børre (todo: write a spec)
Thursday morning
- Plenary machine update (all(?)) <==
- Aligner (Trond, Saara)
Machine updates:
- forrest issues / installations / updates
- readline / bash
Thursday evening
- G3 (Sjur, Trond, Thomas) <==
- Hyphenation (Sjur, Trond) <==
- Bible format discussion
- Corpus health care
- Saami chars in pdf
- Solved! Almost :-) The path needs to be generalised, then checked in to CVS.
- i18n finalisation: language selection menu (using dispatcher)
Rule for the id field in common.xml:
- default: you are your own id
- overriding: your id is different from yourself, and points to another lemma

common.xml:
- Kautokeino .. nob, nno, eng ... id=Guovdageaidnu
- Guovdageaidnu ... sme, sma, smj ... id=Guovdageaidnu

sme.xml (id info is inherited, perhaps):
- Kautokeino
- Guovdageaidnu

sme.lexc (pure lexc file, generated):
- Kautokeino contlex-i ;
- Guovdageaidnu contlex-j ;
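The id rule for common.xml can be illustrated with a minimal Python sketch. It is not the project's implementation: the dictionary `COMMON` is a hypothetical in-memory stand-in for common.xml, and the function names `resolve_id` and `group_by_id` are made up for illustration. An entry without an explicit id falls back to its own lemma (the default rule); an entry with an explicit id points to another lemma (the overriding rule).

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for common.xml: lemma -> explicit id, or None.
# None means "no id given", so the default rule applies.
COMMON: Dict[str, Optional[str]] = {
    "Kautokeino": "Guovdageaidnu",  # overriding: id points to another lemma
    "Guovdageaidnu": None,          # default: the lemma is its own id
}


def resolve_id(lemma: str, entries: Dict[str, Optional[str]]) -> str:
    """Return the explicit id if one is given, otherwise the lemma itself."""
    explicit = entries[lemma]
    return explicit if explicit is not None else lemma


def group_by_id(entries: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Collect all lemmas that share the same resolved id."""
    groups: Dict[str, List[str]] = {}
    for lemma in entries:
        groups.setdefault(resolve_id(lemma, entries), []).append(lemma)
    return groups


if __name__ == "__main__":
    print(resolve_id("Kautokeino", COMMON))
    print(group_by_id(COMMON))
```

Under these assumptions both name forms end up in one group keyed by Guovdageaidnu, which is the effect the id field is meant to have across the language-specific files.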
M4 flags:
HYPH:
- exclude or include the hyphenation marks when making a hyphenator transducer ("#", "-" and "^")
SHORTCOMP:
- Permitting truncated two- and three-part compounds or not
DIPHSIMPL:
- Exclude/include twol ruleset to allow diphthong simplification.
norm: oahpaheddjiid also: oahpaheaddjiid and oahpaheddjiid ==> include/exclude the G3-sensitivity of the diphthong simplification rules
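As a rough illustration of what the HYPH switch decides, the following Python sketch either keeps or strips the three hyphenation marks in a string. This is only an analogy: in the actual setup the choice is made with M4 when the transducer source is built, not on output strings. The function name `render_form` and the placeholder string are assumptions made for this example, not real lexicon data.

```python
# The three hyphenation marks named in the notes.
HYPHENATION_MARKS = ("#", "-", "^")


def render_form(form: str, hyph: bool) -> str:
    """Keep the hyphenation marks when hyph is True, strip them otherwise.

    Note: this naive version also strips "-" characters that are genuine
    hyphens; a real build would have to tell the two apart.
    """
    if hyph:
        return form
    return "".join(ch for ch in form if ch not in HYPHENATION_MARKS)


if __name__ == "__main__":
    placeholder = "ab^cd#ef-gh"  # made-up string, not a real word form
    print(render_form(placeholder, hyph=True))
    print(render_form(placeholder, hyph=False))
```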
non-m4-variation (variation in the lexicon):
- Circular transducer or not? Today:
grep -v '^[CN]^'
- Including SUB entries or not?
- EAST, WEST, ALLDIALECTS: Differentiating between eastern and western dialectal forms, or tolerating both
- SOUTH: Tolerating clearly non-standard forms (like Locative Sg -n)
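The `grep -v '^[CN]^'` filter noted under the circular-transducer item has one detail worth spelling out: in grep's basic regular expressions only the first caret is an anchor, and the second one is a literal `^` character. The command therefore drops every line that begins with `C^` or `N^`. A Python equivalent, sketched below, has to escape the second caret because Python's `re` module treats an unescaped `^` as an anchor everywhere. The function name `drop_marked` and the sample lines are placeholders; the notes do not say what the `C^` and `N^` prefixes mark in the lexicon.

```python
import re
from typing import Iterable, List

# Line start, then C or N, then a literal caret (escaped for Python's re).
MARKED = re.compile(r"^[CN]\^")


def drop_marked(lines: Iterable[str]) -> List[str]:
    """Return the lines that do NOT start with 'C^' or 'N^' (like grep -v)."""
    return [line for line in lines if not MARKED.search(line)]


if __name__ == "__main__":
    sample = ["C^foo", "N^bar", "baz", "Cfoo"]  # placeholder lines
    print(drop_marked(sample))
```

Lines such as `Cfoo` survive the filter because the literal caret is required directly after the letter.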