Samest Project Plan

Plan for the samest project, 2014 - 2016

FST

Estonian FST - 2013 - 2015

This comes before both Oahpa and MT implementation

  • Revise the plamk fst or integrate it into the gt infrastructure
    • Degree of adjustment of the fst (to discuss: Jaak, Heli, Trond + Sjur, soon after Nov 18th)
    • Revision -- Tag adjustment
  • Goals: Ability to generate Oahpa, MT, Dictionaries

Võru FST - 2013 - 2015

  • Oahpa quality: generating the pedagogical lexicon
    • 70 stem classes; 2 months to get to the point of generating them

Work procedure

An FST group is active now, at the start of the project, discussing both vro and est (Sulev, Jack, Jaak, Neeme, Heli, Trond, Heiki).

Goals:

  • Select, make and proofread yaml files for testing (est and vro)
  • Teach fst writing
  • Learning by doing
  • Later on leaving FST writing to the primary linguists

MT

Extracting bilingual dictionaries

  1. From corpora
    1. sme (gt) + fin (gt, hy) + est (filosoft) lemmatisation software in place
    2. corpora in place? Opus is in place (more sme-fin? Bible, other stuff from gt corpus?)
    3. TODO1: Set up Giza++ and Moses and do the alignment (Zürich)
    4. TODO2: Revise output with UPLUG + redo the tuning + find the best machine output (1+1 m)
    5. TODO3: Do the manual revision of the bilingual dictionaries (1+1 m)
  2. From existing dictionaries
    1. (fin-rus/eng + rus/eng-est etc) (tasks 2.-4. 1 m)
    2. From WordNet, and from Wiktionary
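
The "from existing dictionaries" route (fin-rus/eng + rus/eng-est) amounts to composing two dictionaries through a pivot language. A minimal sketch of that idea follows, assuming each dictionary is available as a mapping from headword to a set of translations. The function name `pivot_dictionary` and the toy entries are illustrative only and are not part of any existing project tool. Each candidate pair records the pivot words supporting it, which may help when prioritising the manual revision.

```python
from collections import defaultdict


def pivot_dictionary(src_to_pivot, pivot_to_tgt):
    """Compose two bilingual dictionaries through a shared pivot language.

    Both inputs map a headword to a set of translations. The result maps
    each source word to its candidate target words, and each candidate to
    the sorted list of pivot words that link the pair.
    """
    candidates = defaultdict(lambda: defaultdict(set))
    for src, pivots in src_to_pivot.items():
        for pivot in pivots:
            # Pivot words missing from the second dictionary yield nothing.
            for tgt in pivot_to_tgt.get(pivot, ()):
                candidates[src][tgt].add(pivot)
    return {
        src: {tgt: sorted(links) for tgt, links in tgts.items()}
        for src, tgts in candidates.items()
    }


# Toy data (hypothetical entries, for illustration only).
fin_eng = {
    "talo": {"house", "building"},
    "kieli": {"language", "tongue"},
}
eng_est = {
    "house": {"maja"},
    "building": {"hoone", "maja"},
    "language": {"keel"},
    "tongue": {"keel"},
}

fin_est = pivot_dictionary(fin_eng, eng_est)
for fin_word, est_candidates in sorted(fin_est.items()):
    print(fin_word, est_candidates)
```

Pairs supported by several pivot words (here talo-maja and kieli-keel) are more likely to be correct than pairs reached through a single pivot, so the number of links is one possible ranking signal before manual checking.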

Saami-Finnish MT

Alignment

Finnish-Estonian MT

The application said

  • Evaluation of Oahpa by teachers and students - 2015-2016
  • Publication of results at conferences - 2014 - 2016

Task order:

  1. start with alignment
  2. start with fst adjustment
  3. start with getting some dictionary, e.g. nob-est, est-nob (additional nicety)

Oahpa

  • Setting up Estonian and Võro Oahpa - 2014-1
  • Numra est, vro 2014-1
  • Leksa - 2014-1,2
  • Morfa-S - 2014-1,2

  • Morfa-C - 2015

We will also give Vasta and Sahka a go

vro oahpa

  1. Set up: 2014-1
    • Prototypes of Morfa-S, Leksa, Numra: 2014-2
    • Morfa-S: fst to generate forms of vocabulary
    • Leksa: vocabulary + semantic markup
  2. Use in courses, feedback, adjustment: 2014-3,4

Later: Morfa-C, further work

Teachers

  • est: Ilona and others
  • vro: Sulev is the teacher + colleagues at the Võro Institute + TÜ

Time schedule:

  • 2013-4 Estonian fst (discussion, adjustment); Võro fst: approaching Oahpa quality
  • 2014-1 Set up Oahpa for est, vro
  • 2014-2 Work with Oahpa for est, vro; work with fsts
  • 2014-3 Oahpa in courses: vro, est
  • 2014-4 Oahpa in courses: vro, est
  • 2015 To be planned later
  • 2016 To be planned later

Resources: People, tasks, time allocated

Time - month schedule

Task            2013  2014  2015  2016  Persons
Estonian FST     0.5     3     2     2  Jaak, Heli, Heiki
Võro FST         0.5     3     3     3  Sulev
CG                 0     0     2     2  Tiina, Kadri
Oahpa L, N, M      0     5     2     1  Heli, teachers
Oahpa V, S         0     0     2     3  Heli, teachers
W Extr. fi-et      0     3     0     0  Mark Fishel, Kaarel Veskis, Katrin Tsepelina
W Extr. fi-se      0     2     0     0  To be decided
MT fin-est         0     0     2     2  -
MT fin-sme         0     0     1     1  -
MT fi CG           0     1     0     0  -
Total              1    17    14    14  51+

  • Weighting, tasks: EstFST + VroFST + Oahpa + MT + Admin = as above
  • Weighting, years: 2014, 2015, 2016 = quite even

People

Persons                     2013  2014  2015  2016  Role
Heiki-Jaan Kaalep              0   1.5     1     1  leader, FST
Kaarel Veskis                  0   1.5     0     0  Evaluation
Katrin Tsepelina               0   1.5     0     0  UPLUG setup
Mark Fishel                    0     0     0     0  Corpus, giza++, moses setup
Sulev Iva                    0.5     3     3     3  fst
Jaak Pruulmann-Vengerfeldt   0.5   1.5     1     1  fst
Tiina Puolakainen              0     0     1     1  cg
Kadri Muischnek                0     0     2     2  cg, MT
Tarmo Vaino                    0     1     1     1  guru
Heli Uibo                      x     4     4     4  Oahpa, FST
Teachers                       0     1     1     1  Oahpa
Saami to be decided            0     2     1     1  W Extr, MT
Sum                            1    17    15    15  48 (cf. 51.2)
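
The Sum row of the People table can be cross-checked with a few lines of Python. This is only a sketch: the figures are copied from the table, and the "x" entered for Heli Uibo in 2013 is assumed to count as 0.

```python
# Person-months per person for 2013, 2014, 2015, 2016 (from the People table).
# Assumption: the "x" for Heli Uibo in 2013 is counted as 0.
people = {
    "Heiki-Jaan Kaalep": [0, 1.5, 1, 1],
    "Kaarel Veskis": [0, 1.5, 0, 0],
    "Katrin Tsepelina": [0, 1.5, 0, 0],
    "Mark Fishel": [0, 0, 0, 0],
    "Sulev Iva": [0.5, 3, 3, 3],
    "Jaak Pruulmann-Vengerfeldt": [0.5, 1.5, 1, 1],
    "Tiina Puolakainen": [0, 0, 1, 1],
    "Kadri Muischnek": [0, 0, 2, 2],
    "Tarmo Vaino": [0, 1, 1, 1],
    "Heli Uibo": [0, 4, 4, 4],
    "Teachers": [0, 1, 1, 1],
    "Saami to be decided": [0, 2, 1, 1],
}

# Sum each year's column, then add the yearly sums for the overall total.
yearly = [sum(column) for column in zip(*people.values())]
total = sum(yearly)
print(yearly, total)
```

Under that assumption the yearly sums come to 1, 17, 15 and 15 person-months, 48 in total, matching the Sum row.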

Travel

Meetings

  • Startup in Tromsø in January/February 2014
  • Meetings in Tartu
  • Mid-project in

Other expenses

Laptops      2000   3   6000
Travel       2000  13  31200
Conferences  5000   4  24000
Total                  61200

Workshops

ICALL workshop on Oahpa-related things in cooperation with Tromsø ICALL group

Papers

Brainstorming:

  • The "See what I have done" - paper
    • Võru fst
    • Estonian Oahpa
    • MT comparison: RBMT vs. SMT
  • Specific problems papers
    • cross-language FST comparisons
    • Papers based upon user logs
    • Papers based upon specific problems during our work