Meeting_2010-10-18
Meeting setup
- Date: 18.10.2010
- Time: 09:30 Norwegian time
- Place: Internet
- Tools: SubEthaEdit, iChat
Agenda
- Sma issues
- Forthcoming finsme dictionary
- Info topics
Cf. one of the following, depending on context:
- the upper bar of the SEE window (provided you use the JSPWiki syntax mode)
- the TOC in Forrest-rendered output, like HTML and PDF
Opening, agenda review, participants
- Opened at 10:02.
- Present: Børre, Ciprian, Lene, Maja, Sjur, Tomi, Trond, Biret Ánne
- Absent: Thomas
Sma issues
Need to correct dictionary against speller
Wrong dict forms in speller? There are misspellings in the smanob dictionary,
The fst and the dictionary show different Sg3 forms (where the Sg3 forms (marked
The XXX marks can mean:
- there is a difference between analyser and p3p or wordclass info (comes often from smaswe)
- there are several forms given by the analyser, one can correspond with
Status quo for normativity
The letter on loan words was sent to SGM on Friday. We also sent a letter
How shall Divvun and Gt cooperate on the sma work during what is left of this year?
Lene has a list of dictionary verbs not in the analyser. During this work some
- missing verbs (and other words)
- misspellings and errors in both the transducer and smanob (cf above)
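Finding the dictionary verbs that are missing from the analyser amounts to a set difference. The following is a minimal sketch, assuming both resources have already been reduced to plain lemma lists; the function name and the sample words are hypothetical placeholders, not taken from smanob or the sma analyser.

```python
def missing_from_analyser(dict_lemmas, analyser_lemmas):
    """Return the dictionary lemmas the analyser does not know, sorted and unique."""
    known = set(analyser_lemmas)
    return sorted({lemma for lemma in dict_lemmas if lemma not in known})


# Hypothetical placeholder data, for illustration only.
dictionary = ["aaa", "bbb", "ccc", "bbb"]
analyser = ["aaa", "ccc"]
print(missing_from_analyser(dictionary, analyser))
```

The same function, with its arguments swapped, would list analyser lemmas that have no dictionary entry.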
Way of work:
- Read through smanob and look for errors (adj, noun)
- Check whether these errors are in the normative analyser
- mark problematic words as non-includes
- reverse sorting the fst lexicon, look at continuation lexicon
- unify lemma forms for all fst entries (i.e. multiple entries for the same word
- sort and unique fst entries - double forms in sma lexicon shall be unified
- learn typical error patterns from the dictionary correction work
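The reverse-sorting and sort-and-unique steps above could be sketched as follows. This assumes a heavily simplified lexicon line of the form `lemma CONTLEX ;` (real lexc entries carry more fields); the function names and the sample entries are hypothetical.

```python
from collections import defaultdict


def parse_entry(line):
    """Split a simplified line 'lemma CONTLEX ;' into (lemma, contlex)."""
    fields = line.replace(";", " ").split()
    return fields[0], fields[1]


def reverse_sorted(entries):
    """Sort entries by the reversed lemma, so that similar word endings line up."""
    return sorted(entries, key=lambda entry: entry[0][::-1])


def conflicting_lemmas(entries):
    """Map each lemma listed under several continuation lexicons to those lexicons."""
    seen = defaultdict(set)
    for lemma, contlex in entries:
        seen[lemma].add(contlex)
    return {lemma: sorted(c) for lemma, c in seen.items() if len(c) > 1}


# Hypothetical placeholder entries, for illustration only.
lines = ["abba N1 ;", "otta N2 ;", "abba N2 ;", "abba N1 ;", "ollo N1 ;"]
entries = sorted(set(parse_entry(line) for line in lines))  # sort and unique
print(reverse_sorted(entries))
print(conflicting_lemmas(entries))
```

Reverse sorting puts words with the same ending next to each other, which makes it easier to spot a word whose continuation lexicon differs from that of its neighbours.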
The three i-s of sma - is this true?
- i as i
- ï as i and ï (klinïgke / klinigke)
- ï shall not be i (but is made so via spellrelax)
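The effect of spellrelax mentioned in the last point can be illustrated with a small matcher. This is only a sketch in Python, not the actual spellrelax rule: the function name is hypothetical, and the only relaxation modelled is that a typed i is accepted where the lexical form has ï.

```python
import unicodedata


def matches_relaxed(typed, lexical):
    """True if `typed` equals `lexical`, allowing a typed 'i' for a lexical 'ï'."""
    # Normalise to NFC so that 'ï' is a single code point in both strings.
    typed = unicodedata.normalize("NFC", typed)
    lexical = unicodedata.normalize("NFC", lexical)
    if len(typed) != len(lexical):
        return False
    return all(t == l or (l == "ï" and t == "i") for t, l in zip(typed, lexical))


print(matches_relaxed("klinigke", "klinïgke"))  # typed i for lexical ï
print(matches_relaxed("klinïgke", "klinigke"))  # the reverse direction
```

A three-way distinction would need a per-word marker saying whether this relaxation is a legitimate variant or only a tolerated misspelling; the sketch does not attempt that.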
Before we implement it we shall find out whether the second category
TODO:
- sort and unique nav entries in sma (Trond)
- discuss the possible third i in sma (Maja, Thomas)
- go through the XXX in the dictionary verbs
- sma linguist meeting - date & time to be decided
Forthcoming finsme dictionary
We do not quite know when it comes, but what shall we do with it when it comes?
If there is money: could Johanna Ijäs do something with this?
Info topics
A new programmer in the sma-oahpa project
Ryan for 2 months.
Cip's journey to Iceland
Here are my very minimal notes on tools:
- corpus_search tool (try it!)
- DATR (vs. XFST, try it!)
- GLOSSA (Anders Nøklestad - GLOSSA is FLOSS, use it)
- CLAN (Carnegie Mellon University) (try it)
- MOLTO (open-source MT, Aarne Ranta, built on GF)
- CORD (historical English corpora)
- TVÄRSLÅ (the Scandinavian lexicon is accessible for single-word
The amount of work for installing GLOSSA here is noticeable, yes, but
The OBT uses the Oslo fullform list for morphological analysis, with
In Bergen they are going to implement a separate corpus interface,
Summing up the sma-oahpa week
- Aajege was in Tromsø with 3 people, 11-15.10
- they have collected 2500 words for the Oahpa lexicon from textbooks - now unified with
- resulting in 900 new lemmas in smanob-dict for nouns, verbs and
- Aajege works on the Oahpa lemmas directly in smanob: improving translations
- we must find out at which level we add translations to other languages in
- we are starting to make friends with XMLmind
- alpha MorfaC, beta MorfaS and beta Leksa shall be finished in the course
- some new functions are being added to Leksa -> they will also be included in smeOahpa:
- pronouns
- translation comments
- web dictionary included in the Oahpa interface
- pronouns
- the next working meeting is in Røros on 22.11 (Trond and Lene will go there); in addition
Whitespace and empty element diffs:
- <book name="s4" />
+ <book name="s4"/>
- <stem class="trisyllabic"></stem>
+ <stem class="trisyllabic"/>
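Diffs of this kind are purely lexical: both sides parse to the same XML. One way to check that a change is only such noise is to compare the parsed and re-serialised forms rather than the raw text. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper names are hypothetical, and it is not a fix for the editor itself.

```python
import xml.etree.ElementTree as ET


def normalise(fragment):
    """Parse an XML fragment and re-serialise it in one consistent form."""
    return ET.tostring(ET.fromstring(fragment), encoding="unicode")


def same_xml(a, b):
    """True if the two fragments differ only in lexical details like the ones above."""
    return normalise(a) == normalise(b)


print(same_xml('<book name="s4" />', '<book name="s4"/>'))
print(same_xml('<stem class="trisyllabic"></stem>', '<stem class="trisyllabic"/>'))
```

Whitespace inside text content is kept by the parser, so this only neutralises whitespace inside tags and the two spellings of an empty element.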
TODO:
- compile new version of ÅD (Ciprian)
- fix the XMLMind XML Editor whitespace issue (Sjur)