140116

Meeting Jan 16

Participants: Antti, Conor, Trond - later: Sjur

Issues

Presentation

  • web page
    • Online analyser
    • Form generator
    • Paradigm generator
  • pikiskwewina
    • as dictionary and click-in-text
  • Spellchecker
    • for the same text
  • learning tools based on Saami slides

syllabics conversion

Goal:

  • to have two spellcheckers (or one spellchecker understanding both)
  • Romanian as Roman
  • Bulgarian as Syllabics
  • Syllabics conversion online
  • demo syllabics

TODO

Trond to:

  • Send a list of relevant Oahpa presentations (Lene)
  • Look at cgi-bin: conversion to/from syllabics
  • look at the wrapped syllabic fst

Sjur to:

  • look at the wrapped syllabic fst
  • Take a trip to Bulgaria

Conor to:

  • Add words, look at the linguistics

Speller

How to have a speller

go to langs/crk, and issue these commands:

./configure --with-hfst --enable-spellers
make
sudo make install
  • Download LibreOffice if you do not have it already
    (must be version 4.1 or newer).
  • download & install oxt (Mac version):
    http://divvun.no/static_files/voikko-2013-06-26.oxt
    • restart LibreOffice if it was already open
  • Open a new text document, write some text, select all
    • e.g. in the lower margin of your document it says: Default Style US English
  • Click on the language name, and choose More...
  • set the language to Romanian (or soon: Bulgarian). Romanian (and Bulgarian) should have a blue "ABC" in front of it in the language drop down menu).

To update the speller:

  • add a new noun
  • make
  • sudo make install
  • restart LibreOffice

To repeat this on Windows:

  • Download and install LibreOffice (at least 4.1)
  • download and install the Windows beta oxt: http://www.puimula.org/htp/testing/voikko-sma-fi.oxt
  • build the speller on a mac or linux machine, final file can be found in: crk/tools/spellcheckers/fstbased/hfst/crk.zhfst
  • copy this zhfst file to the Windows machine, place it in: to-be-added

Text for the presentation

http://www.ualberta.ca/~arppe/PlainsCree.html

The text:

nitêminân nipâw sisone iskwatemihk.
dog+N+Pl1Ex sleeps+V beside+PART door+N+LOC

waniskâw kîkisepâ.
wakes.up+V morning+N 

waniskâw ekwa nohtekatew.
wakes.up+V and+PART is.hungry+V

wâpahtam ôskanisis wiyâkanihk ekwa mîciw.
sees+V bone+N+DIM bowl+N+LOC and+PART eats+V

ekota-ohci nôhkwâtam wiyâkan.
here.from+PART licks+V bowl+N

keyâpic nohtekatew ekwa kâwe nipâw.
still+PART is.hungry+V and+PART again+PART sleeps+V

Issues to fix:

  • waniskâw
    • Trond to fix the waniskâw x 14 (due to Trond's flag diacritic experiments).
  • nohtekatew nohtekatew +?
    • VAI, to be added
  • nôhkwâtam nôhkwâtam +? Conor: Check with Dorothy
    • VTA ??
  • mîciw mîciw +?
    • VTA to be added
  • ekota-ohci ekota-ohci +?

Diminutives

Working with a productive analyser for now.

yaml files

Conor has checked files and added verb types. Different verb types are to be checked in.

Verbs

Subjunctive prefix ê-

In writing:

  • Either: always ê-
  • Or: êh in front of vowels, no mark (thus, just an ê) in front of consonants

Solution:

Two prefixes in lexc:

  • ê-
  • êh^%eh

and then to h deletion before %^eh in twolc

The tag is now +Sbj, an alternative is +Conjunct mode, +Cnj, so we could do that.

Decided:Use +Cnj

Dictionary

More words in the dictionary, especially the words of the text. Input here: comma separated stuff to crkeng/inc/:

main/words/langs/crkeng/inc/nouns.csv

cat inc/nouns.csv 
atim	n	dog
inini	n	man	
nâpês	n	boy	
apiscacihkos	n	antelope
mâyatihk	n	bighorn sheep
atihk	n	caribou
apisimôsnos	n	deer
wâwaskêsniw	n	elk
mistatim	n	horse
môswa	n	moose
maskwa	n	bear
okistatowân	n	grizzly bear
wâpask	n	polar bear
sîsîp	n	duck
môhkomân	n	knife
sakâw	n	soup
mîcimâpo	n	soup
sîsîpâwi	n	duck egg
wâwi	n	egg
iskwêw	n	woman
nipâw	v	sleep
wâpam	v	see

  <e>
    <lg>
      <l pos="N">sakâw</l>
    </lg>
    <mg>
      <tg>
        <t pos="N">soup</t>
      </tg>
    </mg>
  </e>

Results to be added to

main/words/langs/crkeng/src/N_crkeng.xml

Procedure for updating the dictionary (needed: account on the gtweb machine):

/dicts/nds/NDSUpdatingDictionaries.html

(with pikiskwewina or guusaaw as the variable for DICT)

  1. Log in to the server via SSH as the NDS user
    in the following way:
    ssh neahtta@gtweb.uit.no
    with a password
  2. Then follow the instructions on the page

Length on ê

Many writers do not write ê. The analyser handles both, but we should consistently always write e.g. with macron in the code.