Samest Project Plan
Plan for the samest project, 2014 - 2016
FST
Estonian FST - 2013 - 2015
This work comes before both the Oahpa and the MT implementation
- Revise the plamk fst or integrate it in the gt infra
- Degree of adjustment of the FST (to be discussed by Jaak, Heli, Trond and Sjur soon after Nov 18th)
- Revision -- Tag adjustment
- Goals: Ability to generate Oahpa, MT, Dictionaries
Võru FST - 2013 - 2015
- Oahpa quality: generating the pedagogical lexicon
- 70 stem classes; 2 months (2m) to get to the point of generating them
Work procedure
An FST group at the start of the project, discussing both vro and est
Goal:
- Select, make and proofread yaml files for testing (est and vro)
- Teach fst writing
- Learning by doing
- Later on leaving FST writing to the primary linguists
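A minimal sketch of what such test files check, assuming each test case pairs an analysis string with its expected surface forms. The tag names, the `TOY_GENERATOR` lookup table and the helper names are hypothetical stand-ins for illustration only; a real run would query the compiled FST instead. One toy entry is deliberately wrong so that a failure shows up.

```python
# Hypothetical test cases: analysis string -> expected surface forms.
# The tags (+N, +Sg, +Ine ...) are illustrative, not the project's tag set.
TESTS = {
    "maja+N+Sg+Nom": ["maja"],
    "maja+N+Sg+Ine": ["majas"],
    "maja+N+Pl+Nom": ["majad"],
}

# Stand-in for a generator FST: a plain lookup table (assumption).
# The plural entry is deliberately wrong to show how a failure is reported.
TOY_GENERATOR = {
    "maja+N+Sg+Nom": {"maja"},
    "maja+N+Sg+Ine": {"majas"},
    "maja+N+Pl+Nom": {"majat"},
}


def generate(analysis):
    """Return the set of surface forms the toy generator gives for an analysis."""
    return TOY_GENERATOR.get(analysis, set())


def run_tests(tests, generator):
    """Return (analysis, missing forms, unexpected forms) for each failing case."""
    failures = []
    for analysis, expected in tests.items():
        got = generator(analysis)
        missing = set(expected) - got
        unexpected = got - set(expected)
        if missing or unexpected:
            failures.append((analysis, sorted(missing), sorted(unexpected)))
    return failures


if __name__ == "__main__":
    for analysis, missing, unexpected in run_tests(TESTS, generate):
        print(f"FAIL {analysis}: missing {missing}, unexpected {unexpected}")
```

Reporting both missing and unexpected forms makes it easier to proofread the test files themselves: a wrong expectation and a wrong FST output look different in the report.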
MT
Extracting bilingual dictionaries
- From corpora
- sme (gt) + fin (gt, hy) + est (filosoft) lemmatisation software in place
- corpora in place? Opus is in place (more sme-fin? Bible, other stuff from gt corpus?)
- TODO1: Set up Giza++ and Moses and do the alignment (Zürich)
- TODO2: Revise output with UPLUG + redo the tuning + find the best machine output (1+1 m)
- TODO3: Do the manual revision of the bilingual dictionaries (1+1 m)
- From existing dictionaries
- (fin-rus/eng + rus/eng-est etc) (tasks 2.-4. 1 m)
- From WordNet, and from Wiktionary
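The corpus route above can be sketched roughly as follows, assuming the corpora are already lemmatised and the alignment step yields one line of `i-j` index pairs (source position, target position) per sentence pair. The toy Finnish and Estonian sentences, the function names and the `min_count` threshold are illustrative assumptions, not project data or settings. The output is only a candidate list for the manual revision step.

```python
from collections import Counter


def parse_alignment(line):
    """Parse 'i-j' pairs (source index, target index), e.g. '0-0 1-1'."""
    return [tuple(int(n) for n in pair.split("-")) for pair in line.split()]


def count_lemma_pairs(src_sents, tgt_sents, alignments):
    """Count how often each (source lemma, target lemma) pair is aligned."""
    counts = Counter()
    for src, tgt, align in zip(src_sents, tgt_sents, alignments):
        src_lemmas, tgt_lemmas = src.split(), tgt.split()
        for i, j in parse_alignment(align):
            counts[(src_lemmas[i], tgt_lemmas[j])] += 1
    return counts


def dictionary_candidates(counts, min_count=2):
    """Keep pairs seen at least min_count times (threshold is an assumption)."""
    return sorted(pair for pair, n in counts.items() if n >= min_count)


# Toy lemmatised sentence pairs (illustrative, not from the project corpora).
FIN = ["talo olla suuri", "talo olla vanha"]
EST = ["maja olema suur", "maja olema vana"]
ALIGN = ["0-0 1-1 2-2", "0-0 1-1 2-2"]

if __name__ == "__main__":
    for fin, est in dictionary_candidates(count_lemma_pairs(FIN, EST, ALIGN)):
        print(fin, est)
```

Counting over lemmas rather than word forms is what makes the lemmatisation software a prerequisite: inflected forms of the same word then add up to one dictionary candidate.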
Saami-Finnish MT
Alignment
Finnish-Estonian MT
The application said
- Evaluation of Oahpa by teachers and students - 2015-2016
- Publication of results at conferences - 2014 - 2016
Task order:
- start with alignment
- start with fst adjustment
- start with getting some dictionary, e.g. nob-est, est-nob (additional nicety)
Oahpa
- Setting up Estonian and Võro Oahpa - 2014-1
- Numra est, vro 2014-1
- Leksa - 2014-1,2
- Morfa-S - 2014-1,2
- Morfa-C - 2015
We will also give Vasta and Sahka a go
vro oahpa
- Set up: 2014-1
- Prototypes of Morfa-S, Leksa, Numra: 2014-2
- Morfa-S: fst to generate forms of vocabulary
- Leksa: vocabulary + semantic markup
- Use in courses, feedback, adjustment: 2014-3,4
- Later: Morfa-C, further work
Teachers
- est: Ilona and others
- vro: Sulev is the teacher + colleagues in Võro Institute + TÜ
Time schedule:
- 2013-4 Estonian fst (discussion, adjustment), Võro fst: approaching Oahpa quality
- 2014-1 Set up Oahpa for est, vro
- 2014-2 Work with Oahpa for est, vro; work with fsts
- 2014-3 Oahpa in courses: vro, est
- 2014-4 Oahpa in courses: vro, est
- 2015 To be planned later
- 2016 To be planned later
Resources: People, tasks, time allocated
Time - month schedule
| Task | 2013 | 2014 | 2015 | 2016 | Persons |
|---|---|---|---|---|---|
| Estonian FST | 0.5 | 3 | 2 | 2 | Jaak, Heli, Heiki |
| Võro FST | 0.5 | 3 | 3 | 3 | Sulev |
| CG | 0 | 0 | 2 | 2 | Tiina, Kadri |
| Oahpa L, N, M | 0 | 5 | 2 | 1 | Heli, teachers |
| Oahpa V, S | 0 | 0 | 2 | 3 | Heli, teachers |
| W Extr. fi-et | 0 | 3 | 0 | 0 | Mark Fishel, Kaarel Veskis, Katrin Tsepelina |
| W Extr. fi-se | 0 | 2 | 0 | 0 | To be decided |
| MT fin-est | 0 | 0 | 2 | 2 | - |
| MT fin-sme | 0 | 0 | 1 | 1 | - |
| MT fi CG | 0 | 1 | 0 | 0 | - |
| Total | 1 | 17 | 14 | 14 | 51+ |
- Weighting, tasks: EstFST + VroFST + Oahpa + MT + Admin = as above
- Weighting, years: 2014, 2015, 2016 = quite even
People
| Persons | 2013 | 2014 | 2015 | 2016 | Role |
|---|---|---|---|---|---|
| Heiki-Jaan Kaalep | 0 | 1.5 | 1 | 1 | leader, FST |
| Kaarel Veskis | 0 | 1.5 | 0 | 0 | Evaluation |
| Katrin Tsepelina | 0 | 1.5 | 0 | 0 | UPLUG setup |
| Mark Fishel | 0 | 0 | 0 | 0 | Corpus, giza++, moses setup |
| Sulev Iva | 0.5 | 3 | 3 | 3 | fst |
| Jaak Pruulmann-Vengerfeldt | 0.5 | 1.5 | 1 | 1 | fst |
| Tiina Puolakainen | 0 | 0 | 1 | 1 | cg |
| Kadri Muischnek | 0 | 0 | 2 | 2 | cg, MT |
| Tarmo Vaino | 0 | 1 | 1 | 1 | guru |
| Heli Uibo | x | 4 | 4 | 4 | Oahpa, FST |
| Teachers | 0 | 1 | 1 | 1 | Oahpa |
| Saami to be decided | 0 | 2 | 1 | 1 | W Extr, MT |
| Sum | 1 | 17 | 15 | 15 | 48 (cf. 51.2) |
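As a cross-check, the yearly totals of the two allocation tables above can be recomputed with a short script. The figures are copied from the tables; the `x` for Heli Uibo in 2013 is stored as `None` and counted as 0, which is an assumption. The script only recomputes sums and does not try to explain the difference from the 51+ and 51.2 figures quoted in the tables.

```python
YEARS = ["2013", "2014", "2015", "2016"]

# Person-months per task, copied from the task table.
TASKS = {
    "Estonian FST": [0.5, 3, 2, 2],
    "Võro FST": [0.5, 3, 3, 3],
    "CG": [0, 0, 2, 2],
    "Oahpa L, N, M": [0, 5, 2, 1],
    "Oahpa V, S": [0, 0, 2, 3],
    "W Extr. fi-et": [0, 3, 0, 0],
    "W Extr. fi-se": [0, 2, 0, 0],
    "MT fin-est": [0, 0, 2, 2],
    "MT fin-sme": [0, 0, 1, 1],
    "MT fi CG": [0, 1, 0, 0],
}

# Person-months per person, copied from the people table.
# None stands for the unspecified "x" and is counted as 0 (assumption).
PEOPLE = {
    "Heiki-Jaan Kaalep": [0, 1.5, 1, 1],
    "Kaarel Veskis": [0, 1.5, 0, 0],
    "Katrin Tsepelina": [0, 1.5, 0, 0],
    "Mark Fishel": [0, 0, 0, 0],
    "Sulev Iva": [0.5, 3, 3, 3],
    "Jaak Pruulmann-Vengerfeldt": [0.5, 1.5, 1, 1],
    "Tiina Puolakainen": [0, 0, 1, 1],
    "Kadri Muischnek": [0, 0, 2, 2],
    "Tarmo Vaino": [0, 1, 1, 1],
    "Heli Uibo": [None, 4, 4, 4],
    "Teachers": [0, 1, 1, 1],
    "Saami to be decided": [0, 2, 1, 1],
}


def yearly_totals(rows):
    """Sum each year's column, treating None (unspecified) as 0."""
    return [sum(row[i] or 0 for row in rows.values()) for i in range(len(YEARS))]


if __name__ == "__main__":
    for name, rows in (("Tasks", TASKS), ("People", PEOPLE)):
        totals = yearly_totals(rows)
        print(name, dict(zip(YEARS, totals)), "sum:", sum(totals))
```

With the figures as entered, the task rows add up to 1, 17, 14 and 14 (46 in all) and the people rows to 1, 17, 15 and 15 (48 in all).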
Travel
Meetings
Startup in Tromsø in January/February 2014
Other expenses
Workshops
ICALL workshop on Oahpa-related things in cooperation with Tromsø ICALL group
Papers
Brainstorming:
- The "See what I have done" - paper
- Võru fst
- Estonian Oahpa
- MT comparison: RBMT vs. SMT
- Specific problems papers
- cross-language FST comparisons
- Papers based upon user logs
- Papers based upon specific problems during our work

