161207
7.12.16
Agenda
- MT
- Võru FST
- Papers/articles
- Next meeting
Estonian FST
Status
Heiki: Consider expest.fst also as documentation for grammar work.
Jaak: Continued getting rid of multiple generations.
Filter is in src/filters/
Then, to use the filter when building the Oahpa fst, add a target for analyser-oahpa-desc.%:analyser-oahpa-desc.tmp.%, and specify how to use the filter in that target.
Fst build process:
1: build the *-raw-* fst, containing everything with tags in a known order
2: build *.tmp.?fst from the *-raw-* fst (language-independent build step)
3: build the final fst from the *.tmp.?fst file (language-specific build step, i.e. the place to add filters only relevant to Estonian); if there are no build steps here, the tmp file is just copied to the final target file
Documentation for the naming convention:
/infra/infraremake/TransducerNamesInTheNewInfra.html
Testing
- Coverage of large corpus
- Result of generation
- Number of double generations
- common lemmatest: Find a lemmalist and test
- generate the lemma forms of the biggest (or most relevant) est bidix
- Eventually some other lemmalist
- different lemmatest: make check
- In this test we measure each fst against its own lemmalist
- 70
- Other?
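The first and third measures in the list above (corpus coverage and number of double generations) could be computed roughly as follows. This is only a minimal sketch: the lookup function, the dictionary layout and all the toy data are invented placeholders, not real output from the Estonian fst, and the real tests would read the output of the analyser and generator instead.

```python
def coverage(tokens, analyse):
    """Return the share of tokens that receive at least one analysis."""
    if not tokens:
        return 0.0
    covered = sum(1 for tok in tokens if analyse(tok))
    return covered / len(tokens)


def double_generations(generated):
    """Return the analysis strings that generate more than one distinct form."""
    return {
        analysis: forms
        for analysis, forms in generated.items()
        if len(set(forms)) > 1
    }


# Placeholder data standing in for real fst output (hypothetical).
toy_analyses = {
    "wordA": ["lemmaA+N+Sg+Nom"],
    "wordB": ["lemmaA+N+Pl+Nom"],
}
toy_generated = {
    "lemmaA+N+Sg+Nom": ["formA"],
    "lemmaA+N+Pl+Gen": ["formB", "formC"],  # two forms: a double generation
}

corpus = ["wordA", "wordB", "unknown", "wordA"]
cov = coverage(corpus, lambda tok: toy_analyses.get(tok, []))
doubles = double_generations(toy_generated)

print(f"coverage: {cov:.2f}")
print(f"double generations: {len(doubles)}")
```

Tracking both numbers over time would show whether filtering out multiple generations costs any coverage.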
ACTION POINTS
- Jaak sets up the tests; Jaak, Heiki (etc.) run them
Documentation
Heiki-Jaan and Sjur have discussed the jspwiki-generated documentation.
MT
est -> fin
Tänases Postimehes langevad viimased maskid: idarahaskandaalist alates on viimased kuus aastat jäetud avalikkusele mulje, et Keskerakonna ja Ühtse Venemaa suhted seisavad külmutatuna.
- Tänases *Postimehes lankeavat #uusin *maskid:
- idarahaskandaalist #alkaa ovat #uusin kuusi vuotta *jäetud julkisuudelle *mulje, että #Keskeinen puolueen ja *Ühtse Venäjän maan *suhted seisovat *külmutatuna.
Document generation issues Heiki had:
X [0] root-morphology.html BROKEN: Couldn't accept input link ["[ue]"]
This indicates that square brackets ( [ ] ) were used in the documentation to be generated (the text following !! ), and they should not have been.
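A check like the following could catch such brackets before the documentation is generated. It is a rough sketch only: it assumes that documentation text is whatever follows `!!` on a source line, and the function name and sample lines are made up for illustration.

```python
import re

DOC_MARKER = "!!"
# Any bracketed span such as [ue]; the wiki would try to read it as a link.
BRACKET = re.compile(r"\[[^\]]*\]")


def bracket_warnings(lines):
    """Return (line_number, matched_text) for brackets in documentation text."""
    hits = []
    for number, line in enumerate(lines, start=1):
        marker_at = line.find(DOC_MARKER)
        if marker_at == -1:
            continue  # no documentation comment on this line
        doc_text = line[marker_at + len(DOC_MARKER):]
        for match in BRACKET.finditer(doc_text):
            hits.append((number, match.group(0)))
    return hits


# Invented sample lines, not taken from the real source files.
sample = [
    "LEXICON Root",
    "!! The stem vowel alternates [ue] in this class",
    "!! No brackets here",
    "abc [x] ;  ! ordinary comment, not documentation",
]

for number, text in bracket_warnings(sample):
    print(f"line {number}: bracket in documentation: {text}")
```

Only the second sample line is reported, since the bracket on the last line sits outside a `!!` comment.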
Võru FST
Status
Not much has happened lately. Problem: identifying A vs N from a 6000
Papers/articles
MT
- typology papers (typology of MT systems, that is)
- ordinary MT system presentation papers
- sme-fin is a good candidate
Oahpa
- Systems in use papers
FST
- Heiki vs. Jaak paper (subtracting or not)
- presentation FST papers
- lexc vs twolc: where to put the complexity
Grammar
CG
ACTION POINT
- Push this to a more concrete level for the next meeting (all)
- Articles as an agenda item next time
Summary of the Helsinki trip
Heiki presented Hki slides, and still has had no lunch.
http://blogs.helsinki.fi/language-technology/news/bault-2016/
Next meeting
Monday Dec 19th 13.30 Norwegian time