141007

SamEst meeting 07.10.2014

Present: Fran, Heiki, Heli, Sjur, Trond

FST

Jaak checked in a todo list in langs/est/doc/.

Tag conversion

  • Proper gt-style tagging of components (+Cmpnd-stuff)
  • Ask HJK for improved verb paradigm -- after a month...?
  • Decide how to (not to?) encode "defaults"

Improvements waiting to happen

  • proper gt-style punctuation in FST
  • stress information in FST: stress information in the lexicon will simplify many (two-level?) rules a lot and make the network of continuation lexica more logical, but we can do without it as well.

fin-est MT

http://wiki.apertium.org/wiki/Finnish_and_Estonian#Tagset_stuff

Procedure: Do this for one word from each continuation class for the open classes, and for all words in the closed classes (e.g. expand "minä", "sinä", etc.). This can be done using hfst-intersection and a regex like "t a l o <n> ?*".

1) Expand a single word form, e.g. "talo":

talo<n><sg><nom>
talo<n><sg><ine>
talo<n><sg><gen> 
...

2) Replace the lemma with the Estonian one:

maja<n><sg><nom>
maja<n><sg><ine>
maja<n><sg><gen> 
...

3) Try to generate, and see where there are generation errors.

Heiki + Jaak to start on it.
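Step 2 of the procedure can be sketched in a few lines of Python. This is only an illustration, assuming the expanded analyses are available as plain strings of the form lemma followed by tags, as in the lists above; the function name `replace_lemma` is hypothetical and not part of any existing tool.

```python
def replace_lemma(analysis: str, source_lemma: str, target_lemma: str) -> str:
    """Swap the lemma in front of the first tag, leaving the tags untouched."""
    # Everything before the first "<" is treated as the lemma (an assumption
    # that holds for strings like "talo<n><sg><nom>").
    lemma, sep, tags = analysis.partition("<")
    if lemma != source_lemma:
        raise ValueError(f"unexpected lemma {lemma!r} in {analysis!r}")
    return target_lemma + sep + tags


# Finnish analyses from step 1 (only the three forms shown in the notes).
finnish = ["talo<n><sg><nom>", "talo<n><sg><ine>", "talo<n><sg><gen>"]

# Step 2: the same tag strings with the Estonian lemma.
estonian = [replace_lemma(a, "talo", "maja") for a in finnish]

for line in estonian:
    print(line)
```

The resulting strings are what would be fed to the Estonian generator in step 3.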

Bidix

eng-fin wordnet, eng-est X composition:

Reliable:

  • (eng-)fin-est 1-1-1

Less reliable:

  • (eng-)fin-est 1-1-m
  • (eng-)fin-est 1-m-1

Word triplets ordered by the frequency of the English word. The Reliable List will be proofread at the end of this week. The worker writes YES and NO.

  1. YES from reliable
  2. YES from unreliable
  3. MAYBE from reliable
  4. MAYBE from unreliable

https://svn.code.sf.net/p/apertium/svn/incubator/apertium-fin-est
https://svn.code.sf.net/p/apertium/svn/incubator/apertium-fin-est/dev/bidix/reliable.fin-est.n

   6230 dev/bidix/reliable.fin-est.n
   1692 dev/bidix/reliable.fin-est.adj
    668 dev/bidix/reliable.fin-est.adv
    188 dev/bidix/reliable.fin-est.vblex

  16371 dev/bidix/unreliable.fin-est.n
   6354 dev/bidix/unreliable.fin-est.adj
   1762 dev/bidix/unreliable.fin-est.adv
   1590 dev/bidix/unreliable.fin-est.vblex

  34855 total
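The composition through English can be sketched roughly as follows. This is not the script that produced the lists above. It assumes that 1-1-m means one English word with one Finnish and several Estonian equivalents, and 1-m-1 one English word with several Finnish and one Estonian equivalent. The function name `compose` and the word pairs are made up for illustration, and the sketch orders triplets alphabetically rather than by English word frequency.

```python
from collections import defaultdict


def compose(eng_fin, eng_est):
    """Join eng-fin and eng-est pairs on the English word.

    Returns (reliable, unreliable) lists of (eng, fin, est) triplets:
    reliable = 1-1-1, unreliable = 1-1-m or 1-m-1. Many-to-many cases
    are dropped in this sketch.
    """
    fin_by_eng = defaultdict(set)
    est_by_eng = defaultdict(set)
    for eng, fin in eng_fin:
        fin_by_eng[eng].add(fin)
    for eng, est in eng_est:
        est_by_eng[eng].add(est)

    reliable, unreliable = [], []
    # Only English words present in both dictionaries can be composed.
    for eng in sorted(fin_by_eng.keys() & est_by_eng.keys()):
        fins = sorted(fin_by_eng[eng])
        ests = sorted(est_by_eng[eng])
        triplets = [(eng, f, e) for f in fins for e in ests]
        if len(fins) == 1 and len(ests) == 1:
            reliable.extend(triplets)        # 1-1-1
        elif len(fins) == 1 or len(ests) == 1:
            unreliable.extend(triplets)      # 1-1-m or 1-m-1
    return reliable, unreliable


# Toy data, illustrative only.
eng_fin = [("house", "talo"), ("dog", "koira"), ("boat", "vene"), ("boat", "paatti")]
eng_est = [("house", "maja"), ("dog", "koer"), ("dog", "peni"), ("boat", "paat")]

reliable, unreliable = compose(eng_fin, eng_est)
print(reliable)
print(unreliable)
```

Every candidate pair from an ambiguous English word ends up in the unreliable list, which is why that list needs more proofreading than the reliable one.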

Cooperation with teachers

Heli and Kadri met some teachers of Estonian the day after the SamEst meeting in Tartu. The teachers will cooperate and send material to put into Oahpa (raw material for Morfa-C templates, new exercise types in Morfa-C). The user interface for Leksa, Morfa-S and Morfa-C was also discussed.

Paper work

Remember to send in your tickets and receipts for travel reimbursements!

The next meeting

In two weeks: Tuesday, 21st Oct at 12.00 Norwegian time.