170313
Contents:
Samest meeting 13.03.2017
Participants: Heiki-Jaan, Heli, Jaak, Jack, Kadri, Trond, Sjur
Agenda
- ICALL
- status
- papers
- status
- FSTs
- status
- papers
- status
- MT
- status
- papers
- status
- Next meeting
ICALL
Status
The Võro Oahpa transducers have been rebuilt and the database updated. The hfst transducers are being used instead of the xfst ones because the xfst transducers fail to compile.
The xfst compiler reports a syntax error in the row:
Papers
A paper about Võro Oahpa for the Nodalida NLP4CALL workshop is in progress.
The Estonian Oahpa is not quite there yet.
FSTs
Status
vro -- sh test/yaml-check.sh
SUMMARY for the gt-norm fst(s): PASSES: 13226 / FAILS: 56 / TOTAL: 13282
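To make the summary line easier to compare between meetings, the pass rate can be computed from it. A minimal sketch, assuming the summary always has the `PASSES: n / FAILS: n / TOTAL: n` shape seen above; the function name `parse_summary` is made up for this note and is not part of the test scripts.

```python
import re

# Summary line copied from the yaml-check output above.
SUMMARY = "SUMMARY for the gt-norm fst(s): PASSES: 13226 / FAILS: 56 / TOTAL: 13282"


def parse_summary(line: str) -> dict:
    """Extract the pass/fail/total counts from a yaml-check summary line.

    Hypothetical helper: assumes the 'PASSES: n / FAILS: n / TOTAL: n' layout.
    """
    match = re.search(
        r"PASSES:\s*(\d+)\s*/\s*FAILS:\s*(\d+)\s*/\s*TOTAL:\s*(\d+)", line
    )
    if match is None:
        raise ValueError("not a recognised summary line")
    passes, fails, total = (int(group) for group in match.groups())
    return {"passes": passes, "fails": fails, "total": total}


counts = parse_summary(SUMMARY)
pass_rate = 100.0 * counts["passes"] / counts["total"]
print(f"pass rate: {pass_rate:.2f} %")
```

For the numbers above this gives a pass rate of about 99.58 %, i.e. 56 failing tests out of 13282.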
langs/est -- it appears that the tests and the fst have drifted apart: there are errors in the fst, and the tests cover things that do not yet exist as such in the fst. Some of the errors have been corrected since the last meeting, but there is more to do. Also, some apertium tests have not been adjusted to Estonian.
langs/est/test/tools/mt/apertium/apertiumtests_mt-gt-desc.ana.yaml
TODO for next meeting: bring the tests and the fst, which have drifted apart, closer together.
Papers
Open for now.
MT
Status
Starting point
Test on 23500 words of Juhani Aho's "Rautatie"
- fin-sme (produce/understand): coverage 86 %
- fin-est (produce/understand): coverage 82 %, but a 'better than G' impression
- sme-fin (produce/understand): not that bad (impressionistic)
- est-fin (produce/understand): ??
fin-est
fin:
46 prosentissa kirjeistä kirjoittaja myönsi suoraan ja avoimesti, että kaupungin suunnitelmat tökkivät, koska ne asettuvat vastakkain hänen oman etunsa kanssa.
est:
46 protsendis kirjadest kirjutaja nõustus otse ja varjamatult, et linna plaanid tökkivät, kuna need asetuvad vastakkain ta oma eelise koos.
The problem here is that a rule looking for an adjective + noun sequence matches "oman etunsa" before "etunsa kanssa" can be handled.
Move the "etunsa kanssa" rule earlier than the A N rule in the t1x file.
My current proposal for a solution: add more interchunk transfer stages, i.e. rule files, so that different rules look at different patterns one after another, and it does not matter if these overlap.
- chunker
- interchunk1
- interchunk2
- postchunk
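The idea of several stages can be illustrated with a toy model. This is only a sketch under simplifying assumptions: it is not the Apertium transfer engine or its rule format, the tags `A`, `N`, `Post` and `PP` are made up for the illustration, and each stage simply scans left to right and applies the first rule that matches at the current position.

```python
from typing import List, Tuple

Chunk = Tuple[str, str]              # (tag, surface text)
Rule = Tuple[Tuple[str, ...], str]   # (tag pattern, tag of the merged chunk)


def apply_stage(chunks: List[Chunk], rules: List[Rule]) -> List[Chunk]:
    """Run one toy transfer stage: scan left to right, merge the first match."""
    out: List[Chunk] = []
    i = 0
    while i < len(chunks):
        for pattern, new_tag in rules:
            window = chunks[i:i + len(pattern)]
            if tuple(tag for tag, _ in window) == pattern:
                # Merge the matched chunks into one chunk with the new tag.
                out.append((new_tag, " ".join(text for _, text in window)))
                i += len(pattern)
                break
        else:
            # No rule matched here: pass the chunk through unchanged.
            out.append(chunks[i])
            i += 1
    return out


# Hypothetical tagging of the phrase from the example sentence.
tokens: List[Chunk] = [("A", "oman"), ("N", "etunsa"), ("Post", "kanssa")]

ADJ_NOUN: Rule = (("A", "N"), "N")        # adjective + noun -> noun chunk
NOUN_POST: Rule = (("N", "Post"), "PP")   # noun + postposition -> PP chunk

# One stage with both rules: "etunsa" is consumed by the A N rule,
# so the postposition is left stranded.
single = apply_stage(tokens, [ADJ_NOUN, NOUN_POST])

# Two stages, one rule each: the second stage sees the output of the first,
# so the overlap between the two patterns no longer matters.
staged = apply_stage(apply_stage(tokens, [ADJ_NOUN]), [NOUN_POST])

print(single)
print(staged)
```

In this toy model the single stage leaves "kanssa" as a separate chunk, while the staged version ends with one chunk covering "oman etunsa kanssa". Whether the real rule files behave the same way has to be checked against the actual transfer setup.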
fin2X
Finnish text in Estonian:
The same Finnish text in North Saami
fin2est:
- Main problem grammatical (transfer)
fin2sme:
- Main problem technical (tags in the target)
- Then bad vocabulary coverage
- It is too early to tell how bad the transfer is
Papers
Papers should investigate the production/understanding difference.
We should be able to conclude that with a decent lexicon and with L1 syntax and morphology, we will get good understanding even with a bad transfer component (since there are free rides).
We should pursue the good fin2est results, and understand them.
Understanding would be easier to measure for est2fin vs. sme2fin (we would have two Finnish texts to look at)
fin2X
- Spring
- Fix technical things (Trond, Heiki-Jaan)
- Look at lexical coverage
- Improve coverage
- Fix technical things (Trond, Heiki-Jaan)
- Evaluation: Look at ways to evaluate understanding
- Analyse and write a paper before August
On evaluation:
- Cf. work on forthcoming sme-nob article
- SMT uses BLEU (how close the output is to a predefined reference translation)
- RBMT likes WER (how much you must change the output before you are satisfied)
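As a reminder of what WER measures, here is a minimal sketch: the word-level edit distance (substitutions, insertions, deletions) between the MT output and a corrected version, divided by the length of the corrected version. The function name and the English example sentences are made up for illustration; they are not taken from the test material, and real evaluation tools may tokenise and normalise differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between hypothesis and reference,
    divided by the number of words in the reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the output
                curr[j - 1] + 1,     # extra word in the output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substituted word and one missing word.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER: {wer:.3f}")
```

Here two edits are needed against a six-word reference, so the WER is 2/6. Note that WER against a post-edited version of the output answers the "how much must you change" question more directly than WER against an independent translation.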
Next meeting
Next meeting: 27.3.2017 at 11 am Norwegian time.