Samest Project Plan
Plan for the samest project, 2014 - 2016
FST
Estonian FST - 2013 - 2015
This comes before both Oahpa and MT implementation
- Revise the plamk FST or integrate it into the gt infrastructure
- Degree of adjustment of the FST (discuss: Jaak, Heli, Trond + Sjur, soon after Nov 18th)
- Revision -- Tag adjustment
- Goals: Ability to generate Oahpa, MT, Dictionaries
Võru FST - 2013 - 2015
- Oahpa quality: generating the pedagogical lexicon
- 70 stem classes; 2 months (2m) to reach the point of generating them
Work procedure
An FST group works together at the beginning, discussing both vro and est
Goal:
- Select, make and proofread yaml files for testing (est and vro)
- Teach fst writing
- Learning by doing
- Later on leaving FST writing to the primary linguists
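The test files mentioned above pair an analysis string with the word forms it should generate. A minimal Python sketch of that check follows. The tag names, the `TESTS` data and the `toy_generate` stand-in are illustrative assumptions, not the project's actual test files or FST.

```python
# Each test maps an analysis string (lemma + tags) to the expected word forms.
# The tag names (+N+Sg+Nom etc.) are assumed for illustration only.
TESTS = {
    "maja+N+Sg+Nom": ["maja"],
    "maja+N+Pl+Nom": ["majad"],
}

# Hypothetical stand-in for a generator FST: a plain lookup table.
TOY_LEXICON = {
    "maja+N+Sg+Nom": {"maja"},
    "maja+N+Pl+Nom": {"majad"},
}


def toy_generate(analysis):
    """Return the set of forms the stand-in 'generator' produces."""
    return TOY_LEXICON.get(analysis, set())


def run_tests(tests, generate):
    """Return (analysis, missing forms, unexpected forms) for each failing test."""
    failures = []
    for analysis, expected in tests.items():
        produced = set(generate(analysis))
        missing = set(expected) - produced
        unexpected = produced - set(expected)
        if missing or unexpected:
            failures.append((analysis, sorted(missing), sorted(unexpected)))
    return failures


failures = run_tests(TESTS, toy_generate)
print("failures:", failures)
```

Reporting missing and unexpected forms separately makes it easier to tell a gap in the lexicon from overgeneration when proofreading the test files.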
MT
Extracting bilingual dictionaries
- From corpora
- sme (gt) + fin (gt, hy) + est (filosoft) lemmatisation software in place
- corpora in place? Opus is in place (more sme-fin? Bible, other stuff from gt corpus?)
- TODO1: Set up Giza++ and Moses and do the alignment (Zürich)
- TODO2: Revise output with UPLUG + redo the tuning + find the best machine output (1+1 m)
- TODO3: Do the manual revision of the bilingual dictionaries (1+1 m)
- From existing dictionaries
- (fin-rus/eng + rus/eng-est etc) (tasks 2.-4. 1 m)
- From WordNet, and from Wiktionary
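Combining existing dictionaries through a pivot language (fin-eng + eng-est, or the same with rus) can be sketched as below. This is only an illustration of the idea, assuming toy dictionaries held in memory; `pivot_dictionary` and the sample entries are hypothetical and no real dictionary format is implied.

```python
from collections import defaultdict


def pivot_dictionary(src_to_pivot, pivot_to_tgt):
    """Link source and target words that share at least one pivot translation.

    Returns {source: {target: [pivot words supporting the pair]}}.
    """
    candidates = defaultdict(lambda: defaultdict(set))
    for src, pivots in src_to_pivot.items():
        for pivot in pivots:
            for tgt in pivot_to_tgt.get(pivot, ()):
                candidates[src][tgt].add(pivot)
    return {
        src: {tgt: sorted(pivots) for tgt, pivots in tgts.items()}
        for src, tgts in candidates.items()
    }


# Toy data (assumed for illustration): Finnish-English and English-Estonian.
fin_eng = {"talo": ["house", "building"], "kieli": ["language", "tongue"]}
eng_est = {
    "house": ["maja"],
    "building": ["hoone", "maja"],
    "language": ["keel"],
    "tongue": ["keel"],
}

fin_est = pivot_dictionary(fin_eng, eng_est)
for src, targets in fin_est.items():
    for tgt, pivots in targets.items():
        print(src, tgt, len(pivots), pivots)
```

Pairs supported by several pivot words (here talo-maja) are the more reliable candidates, while pairs supported by a single pivot are the ones most in need of manual revision.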
Saami-Finnish MT
Alignment
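Once the lemmatised corpora are word-aligned, candidate dictionary entries can be read off the alignment links. The sketch below assumes alignments in the `i-j` link format (source index, target index) that Moses-style symmetrised alignment files use. The sentences, the function name `extract_pairs` and the `min_count` threshold are illustrative assumptions.

```python
from collections import Counter


def extract_pairs(src_sents, tgt_sents, alignments, min_count=1):
    """Count aligned (source lemma, target lemma) pairs across a parallel corpus."""
    counts = Counter()
    for src, tgt, align in zip(src_sents, tgt_sents, alignments):
        src_tokens = src.split()
        tgt_tokens = tgt.split()
        for link in align.split():
            i, j = (int(x) for x in link.split("-"))
            counts[(src_tokens[i], tgt_tokens[j])] += 1
    # Most frequent pairs first; rare pairs are dropped as likely noise.
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]


# Toy lemmatised sme-fin sentence pairs (assumed for illustration).
sme = ["mun leat oahpaheaddji", "mun oaidnit viessu"]
fin = ["minä olla opettaja", "minä nähdä talo"]
links = ["0-0 1-1 2-2", "0-0 1-1 2-2"]

for (src_lemma, tgt_lemma), n in extract_pairs(sme, fin, links):
    print(src_lemma, tgt_lemma, n)
```

Raising `min_count` trades coverage for precision, which matters because the output still goes through manual revision.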
Finnish-Estonian MT
The application said
- Evaluation of Oahpa by teachers and students - 2015-2016
- Publication of results at conferences - 2014 - 2016
Task order:
- start with alignment
- start with fst adjustment
- start with getting some dictionary, e.g. nob-est, est-nob (additional nicety)
Oahpa
- Setting up Estonian and Võro Oahpa - 2014-1
- Numra est, vro 2014-1
- Leksa - 2014-1,2
- Morfa-S - 2014-1,2
- Morfa-C - 2015
We will also give Vasta and Sahka a go
vro oahpa
- Set up: 2014-1
- Use in courses, feedback, adjustment: 2014-3,4
Later: Morfa-C, further work
Teachers
- est: Ilona and others
- vro: Sulev is the teacher + colleagues in Võro Institute + TÜ
Time schedule:
- 2013-4 Estonian fst (discussion, adjustment), Võro fst: approaching Oahpa quality
- 2014-1 Set up Oahpa for est, vro
- 2014-2 Work with Oahpa for est, vro; work with fsts
- 2014-3 Oahpa in courses: vro, est
- 2014-4 Oahpa in courses: vro, est
- 2015 To be planned later
- 2016 To be planned later
Resources: People, tasks, time allocated
Time schedule (months)
Task | 2013 | 2014 | 2015 | 2016 | Persons |
---|---|---|---|---|---|
Estonian FST | 0.5 | 3 | 2 | 2 | Jaak, Heli, Heiki |
Võro FST | 0.5 | 3 | 3 | 3 | Sulev |
CG | 0 | 0 | 2 | 2 | Tiina, Kadri |
Oahpa L, N, M | 0 | 5 | 2 | 1 | Heli, teachers |
Oahpa V, S | 0 | 0 | 2 | 3 | Heli, teachers |
W Extr. fi-et | 0 | 3 | 0 | 0 | Mark Fishel, Kaarel Veskis, Katrin Tsepelina |
W Extr. fi-se | 0 | 2 | 0 | 0 | To be decided |
MT fin-est | 0 | 0 | 2 | 2 | - |
MT fin-sme | 0 | 0 | 1 | 1 | - |
MT fi CG | 0 | 1 | 0 | 0 | - |
Total | 1 | 17 | 14 | 14 | 51+ |
- Weighting, tasks: EstFST + VroFST + Oahpa + MT + Admin = as above
- Weighting, years: 2014, 2015, 2016 = quite even
People
Persons | 2013 | 2014 | 2015 | 2016 | Role |
---|---|---|---|---|---|
Heiki-Jaan Kaalep | 0 | 1.5 | 1 | 1 | leader, FST |
Kaarel Veskis | 0 | 1.5 | 0 | 0 | Evaluation |
Katrin Tsepelina | 0 | 1.5 | 0 | 0 | UPLUG setup |
Mark Fishel | 0 | 0 | 0 | 0 | Corpus, giza++, moses setup |
Sulev Iva | 0.5 | 3 | 3 | 3 | fst |
Jaak Pruulmann-Vengerfeldt | 0.5 | 1.5 | 1 | 1 | fst |
Tiina Puolakainen | 0 | 0 | 1 | 1 | cg |
Kadri Muischnek | 0 | 0 | 2 | 2 | cg, MT |
Tarmo Vaino | 0 | 1 | 1 | 1 | guru |
Heli Uibo | x | 4 | 4 | 4 | Oahpa, FST |
Teachers | 0 | 1 | 1 | 1 | Oahpa |
Saami to be decided | 0 | 2 | 1 | 1 | W Extr, MT |
Sum | 1 | 17 | 15 | 15 | 48 (cf. 51.2) |
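The yearly totals in the two tables above can be cross-checked with a few lines of Python. This is a sketch with the figures copied from the tables. Treating Heli Uibo's "x" for 2013 as 0 is an assumption made only so the column can be summed.

```python
YEARS = ["2013", "2014", "2015", "2016"]

# Months per task and year, copied from the task table.
TASKS = {
    "Estonian FST": [0.5, 3, 2, 2],
    "Võro FST": [0.5, 3, 3, 3],
    "CG": [0, 0, 2, 2],
    "Oahpa L, N, M": [0, 5, 2, 1],
    "Oahpa V, S": [0, 0, 2, 3],
    "W Extr. fi-et": [0, 3, 0, 0],
    "W Extr. fi-se": [0, 2, 0, 0],
    "MT fin-est": [0, 0, 2, 2],
    "MT fin-sme": [0, 0, 1, 1],
    "MT fi CG": [0, 1, 0, 0],
}

# Months per person and year, copied from the people table.
# Assumption: the "x" entered for Heli Uibo in 2013 is counted as 0.
PEOPLE = {
    "Heiki-Jaan Kaalep": [0, 1.5, 1, 1],
    "Kaarel Veskis": [0, 1.5, 0, 0],
    "Katrin Tsepelina": [0, 1.5, 0, 0],
    "Mark Fishel": [0, 0, 0, 0],
    "Sulev Iva": [0.5, 3, 3, 3],
    "Jaak Pruulmann-Vengerfeldt": [0.5, 1.5, 1, 1],
    "Tiina Puolakainen": [0, 0, 1, 1],
    "Kadri Muischnek": [0, 0, 2, 2],
    "Tarmo Vaino": [0, 1, 1, 1],
    "Heli Uibo": [0, 4, 4, 4],
    "Teachers": [0, 1, 1, 1],
    "Saami to be decided": [0, 2, 1, 1],
}


def column_sums(rows):
    """Sum each year column over all rows of a table."""
    return [sum(column) for column in zip(*rows.values())]


task_sums = column_sums(TASKS)
people_sums = column_sums(PEOPLE)

for year, by_task, by_person in zip(YEARS, task_sums, people_sums):
    print(year, "tasks:", by_task, "people:", by_person)
print("total tasks:", sum(task_sums), "total people:", sum(people_sums))
```

With these figures the per-year sums agree with the Total and Sum rows. The task rows add up to 46 months overall, against the "51+" given in the table, and the people rows add up to 48.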
Travel
Meetings
Startup in Tromsø in January/February 2014
Other expenses
Workshops
ICALL workshop on Oahpa-related things in cooperation with Tromsø ICALL group
Papers
Brainstorming:
- The "See what I have done" paper
- Võru fst
- Estonian Oahpa
- MT comparison: RBMT vs. SMT
- Specific problems papers
- cross-language FST comparisons
- Papers based upon user logs
- Papers based upon specific problems during our work