140605
SamEst meeting, Jun 5, 2014
Present: Heli, Heiki, Jaak, Neeme, Sjur, Trond, Fran
Agenda
- Status
- Plan
- Plan a physical meeting
- Dictionary
- Next meeting
Status
fst
Less has happened than planned. No reports, but there has been activity.
Implemented plamk's compounding for hfst, but it has not yet been ported to xfst.
Abusing the build system of gt/d in this, thus the issue needs
24 words of Heli's csv file were not analysed:
Ida-Eesti Lõuna-Eesti Lääne-Eesti Põhja-Eesti Hispaania Inglismaa Itaalia Prantsusmaa Ungari maantee puiestee ärge kolmapäev teisipäev võib-olla missugune mõnikord pesema medõde kohupiim seekord mitmesugune triikraud muinasjutt
The xfst side has some problems with two-level rules (I guess) for double š and double f at word ends, for some reason.
Estonian Oahpa
Oahpa source files are in ped/est.
The lexicon contains words from the textbook "E nagu Eesti". Reverse lexicons (eng-est, fin-est, rus-est, deu-est) also exist.
Oahpa itself is not online yet, as the est_oahpa folder is
Plan
fst
Jaak to go on.
Workshop in late June, when everyone is in Tartu.
Oahpa
The fst should generate Oahpa now (for demoing at the end of June).
- Minimize list of failures
- More words into the fst
What exactly has to be done to use the fst in Oahpa?
Heli: It is good enough for using it for the first demo. For me it is important that the FST in langs/est builds and gives me the xfst file that analyses/generates most of the words in the lexicon. And it is so.
Status for oahpa words in fst: 339 of 1529 words do not get an analysis.
cd est
./autogen.sh
./configure --enable-oahpa
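As a quick check of the status figure above, the share of Oahpa words that the fst already covers can be worked out from the two counts reported (1529 words in total, 339 without an analysis). The function name below is only illustrative.

```python
def coverage(total, unanalysed):
    """Return (number analysed, share analysed) for a word list."""
    if total <= 0 or not 0 <= unanalysed <= total:
        raise ValueError("counts must satisfy 0 <= unanalysed <= total, total > 0")
    analysed = total - unanalysed
    return analysed, analysed / total


# Figures taken from the status line in these minutes.
analysed, share = coverage(1529, 339)
print(f"{analysed} of 1529 words analysed ({share:.1%})")
# prints: 1190 of 1529 words analysed (77.8%)
```

So roughly 78 % of the Oahpa lexicon gets an analysis, and about 22 % is still missing.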
Tag conversion
The infra is ready, but made for converting from gt/d to X, not
We would like to have the Estonian fst using the gt/d tags, and then
Path to file / documentation:
src/tagsets/*
There is no proper documentation for the tagset conversion yet (to be written).
Taglist: The taglist is in est/src/morphology/root.lexc
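A minimal sketch of how the tags could be pulled out of a lexc file into the "tag TAB tag TAB comment" layout mentioned in the TODO below. It assumes the tags are declared in the Multichar_Symbols section, that "!" starts a comment, and that the section ends at the first LEXICON (or Definitions) line; it ignores escaped "%!" for simplicity. The function names and the sample tags are hypothetical, not taken from the real root.lexc.

```python
def extract_multichar_symbols(lexc_text):
    """Collect (tag, comment) pairs from the Multichar_Symbols section."""
    rows = []
    in_section = False
    for raw in lexc_text.splitlines():
        # Everything after "!" is treated as a comment (simplification).
        code, _, comment = raw.partition("!")
        tokens = code.split()
        if not tokens:
            continue
        if not in_section:
            if tokens[0] != "Multichar_Symbols":
                continue
            in_section = True
            tokens = tokens[1:]
        elif tokens[0] in ("LEXICON", "Definitions"):
            break  # end of the symbol declarations
        for tok in tokens:
            rows.append((tok, comment.strip()))
    return rows


def to_table(rows):
    """Format rows as 'tag TAB (empty second column) TAB comment' lines."""
    return "".join(f"{tag}\t\t{comment}\n" for tag, comment in rows)


# Hypothetical miniature lexc file, for illustration only.
SAMPLE_LEXC = """! header comment
Multichar_Symbols
+TagA   ! first hypothetical tag
+TagB +TagC
LEXICON Root
+TagD # ;
"""

print(to_table(extract_multichar_symbols(SAMPLE_LEXC)), end="")
```

The second column is left empty on purpose, so that tag candidates can be filled in by hand afterwards.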
By reversing the taglist table, it should be possible to generate plamk-tagged analysers.
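The reversal idea can be sketched as follows, assuming the table is a tab-separated file with the columns "tag TAB tag TAB comment" and that rows with an empty second column have no counterpart yet. Which column holds which tagset is not settled in these notes, so the code only speaks of a first and a second column; the tags in the sample are hypothetical.

```python
def read_tag_table(text):
    """Parse 'tag TAB tag TAB comment' lines into (first, second, comment)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        parts = line.split("\t")
        parts += [""] * (3 - len(parts))  # pad short rows
        rows.append(tuple(p.strip() for p in parts[:3]))
    return rows


def forward_map(rows):
    """Map first-column tags to second-column tags."""
    return {a: b for a, b, _ in rows if a and b}


def reverse_map(rows):
    """Map second-column tags back to first-column tags.

    If several first-column tags share one second-column tag,
    the first one listed wins (an arbitrary choice for this sketch).
    """
    rev = {}
    for a, b, _ in rows:
        if a and b:
            rev.setdefault(b, a)
    return rev


# Hypothetical table content, for illustration only.
SAMPLE_TABLE = (
    "+TagA\t+TagX\tfirst hypothetical pair\n"
    "+TagB\t\tno candidate yet\n"
    "+TagC\t+TagZ\t\n"
)

rows = read_tag_table(SAMPLE_TABLE)
print(forward_map(rows))
print(reverse_map(rows))
```

A many-to-one mapping cannot be reversed without loss, which is why the sketch keeps only the first match; a real conversion would need a rule for such cases.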
TODO:
- Take the tags out of root, make a list in langs/est/doc/ (Trond)
- tag TAB tag TAB comment
- https://gtsvn.uit.no/langtech/trunk/langs/est/doc/taglist.txt
- tag TAB tag TAB comment
- Start adding tag candidates to the second column from fin (all)
look at:
CG
As before:
- CG license issue: Trond, Fran in June
- Tag filters: input taglist: root.lexc (Trond, Fran)
Plan a physical meeting
About what? Where? When? Who?
- Topics: The usual.
- Where: Helsinki or Tartu
- When: Early, actually. October? September if Trond can make it :)
- Who: Us + possibly HFST people.
TODO: Everyone to check their calendars; we return to the issue at the next meeting.
Dictionary
Heiki to send mail.
Can it be used for other purposes? Is it open source? What license?
Heiki: The agreement was reached via e-mail, so it is rather informal, but it is still in written form. I asked for permission to use it in this project.
Sjur: Could you ask about the license and the possibility of other uses?
Next meeting
June 26th, 13:00 Norwegian time (possibly 13:15 because of Trond).