Meeting_2015-03-10

T&S meeting, 10 March

Agenda:

Abbr
Last week's discussion with Ciprian and Lene
Upcoming projects: status and overview

Abbr

All languages are now supported, but the support does not work (it only works for sme).

For Xerox-based abbr generation, the following should be enough:

./configure
make

It works only for sme; hfst works for sma; nothing works for smj. One possible source of error is that abbr.txt is checked into svn for sma and smj.

abbr.txt is placed in tools/preprocess/abbr.txt.

Trond has the same problem as Sjur (tested with Hfst, not Xerox).

Alternative to abbr and preprocess

We want to move to fst-based tokenisation and analysis. This is now possible, but it needs to be tested and corrected. Command for the new preprocessing + analysis:

echo "text" | hfst-proc2 --xerox \
tools/preprocess/tokeniser-disamb-gt-desc.pmhfst | cg-conv | l

Results with different options:

Directly to CG format (the output contains + and is therefore wrong):

 echo "don" | hfst-proc2 --cg tools/preprocess/tokeniser-disamb-gt-desc.pmhfst 
"<don>"
	"dohte" Pron Dem Sg Ill Attr
	"dohte" Pron Dem Sg Gen

Xerox analysis format:

echo "datne leah dr. Bergsland." \
| hfst-proc2 --xerox tools/preprocess/tokeniser-disamb-gt-desc.pmhfst \
| cg-conv \
| vislcg3 -g src/syntax/disambiguation.cg3 

"<datne>"
	"datne" Pron Pers Sg2 Nom
"<leah>"
	"lea" V Ind Prs Sg2 @+FMAINV
"<dr.>"
	"dr" N ABBR Attr
"<Bergsland>"
	"Bergsland" N Prop Sem/Plc Sg Nom
"<.>"
	"." CLB

Discussion points (next week?):

handling of compound words (lemma form)
handling of derivation (asterisk vs. sub-readings)

Work on fst-based tokenisation

The work must be organised step by step, and tested at each step:

Step 1)

cat tekst | preprocess --abbr=tools/preprocess/abbr.txt 
cat tekst | hfst-proc2 tools/preprocess/tokeniser-disamb-gt-desc.pmhfst | grep -v '^$'
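
A possible way to run the Step 1 comparison is to diff the two token streams. The sketch below is an illustration only: compare_tokens is a hypothetical helper, not part of the infrastructure, and it assumes the output of each of the two commands above has been saved to a file with one token per line.

```shell
#!/bin/sh
# compare_tokens OLD NEW
# Hypothetical helper: OLD and NEW are files with one token per line,
# e.g. saved output from the two Step 1 commands.
# Blank lines are dropped from both sides before comparing.
compare_tokens() {
    old_clean=$(mktemp) || return 2
    new_clean=$(mktemp) || return 2
    grep -v '^$' "$1" > "$old_clean"
    grep -v '^$' "$2" > "$new_clean"
    # diff exits with 0 when the streams are identical, 1 when they differ.
    diff "$old_clean" "$new_clean"
    status=$?
    rm -f "$old_clean" "$new_clean"
    return $status
}
```

The function returns 0 when the two streams agree once blank lines are removed; otherwise it prints the diff and returns 1, so it can be used directly in a test script.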

Step 2)

 hfst-proc2 tools/preprocess/tokeniser-disamb-gt-desc.pmhfst 
 analyser-disamb-gt-desc.*fst

Step 3)

This will be a work item: if we are to avoid lookup2cg, the language-specific content of that file must be moved into the fst.

 cg-conv
 lookup2cg

Step 4)

Some changes in other components may require changes to the disambiguation file. If so, those must be tested as well.

The gold corpus for sme is in test/ (ask Lene).
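
One rough way to follow the disambiguation output from step to step is to count how many cohorts still have more than one reading. The sketch below is an assumption-laden illustration: cg_ambiguity is a hypothetical helper, and it assumes a CG stream where wordform lines start with "< and each reading is indented with exactly one tab, as in the output shown earlier. Lines indented more deeply (sub-readings) are not counted.

```shell
#!/bin/sh
# cg_ambiguity: read a CG stream on stdin and print
# "<cohorts> <ambiguous>", where an ambiguous cohort is one
# with more than one top-level reading.
cg_ambiguity() {
    awk '
        /^"</ {                      # a wordform line starts a new cohort
            if (seen && readings > 1) ambiguous++
            cohorts++; readings = 0; seen = 1
            next
        }
        /^\t"/ { readings++ }        # a top-level reading of the current cohort
        END {
            if (seen && readings > 1) ambiguous++
            print cohorts + 0, ambiguous + 0
        }
    '
}
```

Fed the two-reading analysis of "don" shown earlier, the function prints 1 1 (one cohort, one of them ambiguous). Running it on the pipeline output before and after a change gives a quick figure to compare alongside the gold corpus.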

Meeting next week on the work items: Tuesday at 9:30 (proposal, check with the others). Participants: Lene, Linda, Sjur, Trond.

Last week's discussion with Ciprian and Lene

Ciprian does not have (enough) time for the new infra
Trond did things without checking all the consequences -> the job was left half done

We analysed the current scripts; Trond takes part in the discussion by e-mail.

Upcoming projects: status and overview

We will return to this item.