170113

Samest meeting 13.1.2017

Participants: Heiki, Heli, Jaak, Jack, Trond

Agenda

  • Status reports
  • Jorgal
  • Final conference of the project

Status reports

Heiki will make a presentation at IWCLUL in St. Petersburg. Meanwhile, he has also worked on the two-level rules.

Jaak has started working on parallel forms. Several hundred words have parallel forms.

E.g.

väär+A+Sg+Par 
väära, väärat

According to estmorf, väärat is not a correct form.

vaba+A+Pl+Par
vabasid, vabu 

Both are correct. Jaak's FST does not contain usage tags such as +Use/Rare.

exp-langs:

echo 'vabasid' | hfst-lookup analyser-gt-desc.hfstol
vabasid        vaba+A+Pl+Par+Use/Rare

Generating vaba+A+Pl+Par gives:

vaba+A+Pl+Par        vabu

For Saami we have:

  • generator-gt-desc.xfst: generates all forms
  • generator-gt-norm.xfst: generates only the normative forms

Tag-wise, sme has:

  • no usage tag: always ok (-norm, -desc)
    • Both no tag and ...
  • Use/NG: in speller, not in MT output (-norm, -desc)
    • (= Use/Rare)
  • Err/Orth: understood, but neither accepted by the speller nor used in MT output (-desc)

est:

+Use/Rare !!= * {{@CODE@}}:       ! norm, but rare: puusid:puu+Pl+Par+Use/Rare
+Use/Hyp !!= * {{@CODE@}}:        ! norm, but so rare that norm is probably wrong: tiivasse:tiib+Sg+Ill+Use/Hyp
+Use/NotNorm !!= * {{@CODE@}}:    ! not norm, but sometimes used: pöidlates:pöial+Pl+Ine+Use/NotNorm
+Use/CommonNotNorm !!= * {{@CODE@}}: ! not norm, and used more than norm: peeneid:peen+Pl+Par+Use/CommonNotNorm


+Use/Rare = +Use/NG (accept, but do not tell anyone)
+Use/Hyp = +Use/NG (accept, but do not tell anyone)
+Use/NotNorm = Err/Orth
+Use/CommonNotNorm = Err/Orth
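A minimal Python sketch of the proposed mapping, assuming analyses are plain `lemma+Tag+...` strings as in the examples above. The names `EST_TO_SME`, `SME_POLICY` and `permissions` are hypothetical; the table of what each sme category allows is read off the sme list above, and `Err/Orth` is written with a leading `+` only for consistency with the analysis strings.

```python
# Hypothetical sketch: proposed est -> sme usage-tag mapping from these notes,
# plus what each sme category permits. Names are illustrative, not GiellaLT code.
from typing import Dict, Optional, Tuple

# est usage tag -> sme-style usage tag
EST_TO_SME: Dict[str, str] = {
    "+Use/Rare": "+Use/NG",
    "+Use/Hyp": "+Use/NG",
    "+Use/NotNorm": "+Err/Orth",
    "+Use/CommonNotNorm": "+Err/Orth",
}

# sme category -> (understood by analyser, accepted by speller, used in MT output)
SME_POLICY: Dict[Optional[str], Tuple[bool, bool, bool]] = {
    None: (True, True, True),           # no usage tag: always ok
    "+Use/NG": (True, True, False),     # in speller, not in MT output
    "+Err/Orth": (True, False, False),  # understood only
}


def sme_usage_tag(analysis: str) -> Optional[str]:
    """Return the sme-style usage tag for an est analysis, or None if untagged."""
    for tag in analysis.split("+")[1:]:
        sme_tag = EST_TO_SME.get("+" + tag)
        if sme_tag is not None:
            return sme_tag
    return None


def permissions(analysis: str) -> Dict[str, bool]:
    """Which applications should accept or produce this analysis."""
    analyser, speller, mt_output = SME_POLICY[sme_usage_tag(analysis)]
    return {"analyser": analyser, "speller": speller, "mt_output": mt_output}
```

For example, `permissions("peen+Pl+Par+Use/CommonNotNorm")` is understood by the analyser but rejected by both the speller and MT output, while `permissions("puu+Pl+Par+Use/Rare")` is accepted by the speller but kept out of MT output.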

Cf. the corresponding page for North Saami:

/lang/sme/KompilereFST.html

Trond created a link to the documentation of exp-langs/est.

What is needed for MT:

  • MT-generation (fin2est) needs to output one form
  • MT-analysis (est2fin) needs to accept both forms
  • the speller should accept the normative forms
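The three requirements can be sketched as three small filters over the parallel forms of one analysis. This is a hypothetical illustration: each candidate is assumed to be a (form, usage tag) pair, untagged, +Use/Rare and +Use/Hyp forms are treated as normative (following the est tag comments above), and falling back to a non-normative form when no normative one exists is an assumption, not something agreed on.

```python
# Hypothetical sketch: how MT generation, MT analysis and the speller could
# treat parallel forms. A candidate is (surface form, usage tag or None).
from typing import List, Optional, Set, Tuple

Candidate = Tuple[str, Optional[str]]

# Per the est tag comments, untagged, +Use/Rare and +Use/Hyp forms are normative.
NORMATIVE_TAGS = {None, "+Use/Rare", "+Use/Hyp"}


def mt_generate(cands: List[Candidate]) -> Optional[str]:
    """fin2est generation: return exactly one form, preferring untagged ones."""
    for form, tag in cands:
        if tag is None:
            return form
    for form, tag in cands:
        if tag in NORMATIVE_TAGS:
            return form
    # Assumption: fall back to the first candidate rather than output nothing.
    return cands[0][0] if cands else None


def mt_accepts(cands: List[Candidate]) -> Set[str]:
    """est2fin analysis: accept every attested form."""
    return {form for form, _ in cands}


def speller_accepts(cands: List[Candidate]) -> Set[str]:
    """Speller: accept only the normative forms."""
    return {form for form, tag in cands if tag in NORMATIVE_TAGS}


vaba_pl_par = [("vabasid", "+Use/Rare"), ("vabu", None)]
print(mt_generate(vaba_pl_par))              # vabu
print(sorted(speller_accepts(vaba_pl_par)))  # ['vabasid', 'vabu']
```

With the vaba+A+Pl+Par forms from above, MT generation picks vabu, as the generator output does, while both forms remain acceptable to the speller and to analysis.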

First task for Heiki:

Write a script that removes usage tags from some words (list sent by Heli).
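One possible shape for that script, as a sketch only. It assumes Heli's list has one lemma per line and that each input line starts with an analysis of the form lemma+Tag+...; the file names and command line are hypothetical, and the real est source files may need different parsing.

```python
# Hypothetical sketch of the tag-removal script: for lemmas on a word list,
# strip +Use/... tags from analyses read on stdin. File formats are assumed.
import re
import sys
from typing import Set, TextIO

USAGE_TAG = re.compile(r"\+Use/[A-Za-z]+")


def strip_usage_tags(analysis: str, lemmas: Set[str]) -> str:
    """Remove all +Use/... tags if the analysis belongs to a listed lemma."""
    lemma = analysis.split("+", 1)[0]
    if lemma in lemmas:
        return USAGE_TAG.sub("", analysis)
    return analysis


def main(list_path: str, src: TextIO, dst: TextIO) -> None:
    # Assumed word-list format: one lemma per line, blank lines ignored.
    with open(list_path, encoding="utf-8") as f:
        lemmas = {line.strip() for line in f if line.strip()}
    for line in src:
        dst.write(strip_usage_tags(line.rstrip("\n"), lemmas) + "\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python strip_usage.py wordlist.txt < in.txt > out.txt
    main(sys.argv[1], sys.stdin, sys.stdout)
```

Lines whose lemma is not on the list pass through unchanged, so the script can be run over a whole file.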

Testing of the FSTs

We have hand-disambiguated corpora which we can use for testing.
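A sketch of how such a test could be scored, assuming the corpus can be read as (token, gold analysis) pairs and that `analyse` is a hypothetical wrapper around the FST lookup returning a set of analyses. Coverage counts tokens with at least one analysis; recall counts tokens whose gold analysis is among them.

```python
# Hypothetical sketch: score an analyser against a hand-disambiguated corpus.
# `analyse` stands in for an FST lookup; the corpus format is an assumption.
from typing import Callable, Dict, Iterable, Set, Tuple


def evaluate(corpus: Iterable[Tuple[str, str]],
             analyse: Callable[[str], Set[str]]) -> Dict[str, float]:
    total = covered = correct = 0
    for token, gold in corpus:
        total += 1
        analyses = analyse(token)
        if analyses:
            covered += 1  # the FST gives at least one analysis
        if gold in analyses:
            correct += 1  # the gold analysis is among them
    if total == 0:
        return {"tokens": 0, "coverage": 0.0, "recall": 0.0}
    return {"tokens": total, "coverage": covered / total, "recall": correct / total}
```

Note that a gold annotation without usage tags will not match an FST analysis that carries one (e.g. +Use/Rare), so tags may need to be normalised on one side before comparing.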

Jorgal

sme-nob would like to keep the Jorgal page for working language pairs only (i.e. fin-sme, smn-sme, smj-sme, sma-sme, fin-est and est-fin would be removed from it).

Suggestion:

  • gtweb.uit.no/jorgal for sme2X only
  • gtweb.uit.no/mt/testing for the full list (but no web translation)
  • A page for Finnic MT, with a name such as
    • gtweb.uit.no/mt/
    • gtweb.uit.no/kaantaminen
    • gtweb.uit.no/tolkimine
    • something even more (less?) nifty
  • Possibly, gtweb.uit.no/mt/ could be a Jorgal-type all-in-one page (?)

Final conference

Project end: the final conference of the Norwegian-Estonian scientific cooperation program will take place on 21.–22.09.2017 in Laulasmaa (http://www.laulasmaa.ee/en/).

We will be there and give an interesting presentation.

Next meeting

Tuesday 14 February 10:00 Norwegian time