160105

Meeting 5.1.16 Hangouts: Sandra, Trond, Lene

Áššit

  • Bidix improvement
  • Progress
  • Next meeting

Bidix improvement

We will improve bidix by:

  • frequency sorting: Ciprian
  • checking sme lemmas: Lene
  • checking smj lemmas:
    • in dev: sh bidix-sanity.sh > sanityoutput (Sandra)
    • if same structure in sme and smj, one can remove the word pair instead of lexicalizing it
  • adding word pairs from Lenes giza-experiments in 2008: Lene makes a csv-file, Sandra proofread
  • checking sme verbs IV vs TV: done
  • change back to Po, Pr and Adv in smj-lexc
  • add transfer word pairs from termlists (sme-nob + nob-smj) and compare them to bidix:
    • Give priority to sme-nob/nob-smj (where nob-smj = Kintel) pairs that are already in bidix, and lift they over in a priority bidix list
    • evaluate sme-smj-bidix not in sme-nob/nob-smj and sme-nob/nob-smj not in sme-smj-bidix separately, after the seeded gang
    • For the sme-smj-bidix residue: Give priority to sme-smj pairs where smj are already in FST
  • make missing list from sme-corpus: Trond

Progress

Pronouns

There are two files with sentences with pronouns in texts/ for comparing and improving word pairs in bidix and harmonising tags in FST

Transfer rules

We will utilize on work done with other sme-smX pairs by

  • using the same names on variabels and sets
  • document relevant rules in a document on smX MT web-page
  • suggest for Francis to hold regular meetings for discussing transfer rules, and improvements

Next meeting

Friday 8.1 kl 13 (Sandra and Lene)

We'll have meeting once a week (but next meeting will be the last week of January)