tromso-2006-08-lexc2xspell
Plan for common conversion from LexC to speller engines of Aspell type
Three different speller engines
- Polderland 
- Aspell 
- OOo speller => HunSpell
Common features and properties:
- not lexc-compatible => they require converting from lexc to native/whatever 
- basically list-based, with compounding and morphology (Aspell has no compounding)
- => similar expressive power 
- takes surface forms as input
Because of the similarities:
- one conversion "engine"/script
- several output formats
Output format
varies according to engine, but is basically a full-form word list that can be 
Information to be added:
- "inflection" tags: munching of fullform lists 
- wordform frequency: extracted and added during compilation, can use full-form lists 
- compounding: tags as comments in the LexC format 
- style: tags as comments in the LexC format
Pseudocode:
- closed POS: create a transducer containing all and only the rest, and xfst: print; - NAVAdv: For each word: - read one line from the lexicon files, including Comp and Style comments 
- generate full paradigm, and all compounding forms 
- filter the resulting word form list against any Comp and Style restrictions 
- add the Comp and Style restrictions to the relative wordforms (all for Style, 5 for Comp)
- output in the desired format
 
- read one line from the lexicon files, including Comp and Style comments 
Implementation points to consider: 
- It should be easy to add new output formats 
- the transducer(s) used for the conversion should be wrapped into a server
- the same server setup could be used for the CGI-BIN scripts
WHO??? Candidates: Saara, Tomi
The following output was generated to try out different strategies for 
hum-tf4-ans157:~ trond$ lookup -flags mbTT -utf8 gt/sme/bin/isme.fst 0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100% eadni+N+SgNomCmp#giella+N+Sg+Nom eadni+N+SgNomCmp#giella+N+Sg+Nom eadnegiella eadni+N+SgNomCmp#giella+N+Sg+Nom eadnegiella eadni+N+SgNomCmp#giella+N+Sg+Nom eadnegiella eadni+N+SgNomCmp#giella+N+Sg+Nom eadnegiella eadni+N+SgNomCmp#giella+N+Sg+Nom eadnegiella eadni+N+SgNomCmp#giella+N+Sg+Nom eadnegiella eadni+N+SgNomCmp#giella+N+Sg+Nom eadne-giella eadni+N+SgNomCmp#giella+N+Sg+Nom eadne-giella eadni+N+SgNomCmp eadni+N+SgNomCmp eadni+N+SgNomCmp +? Trond's version: sealgi+N+SgCmp#eadni+N+Sg+Nom sealgi+N+SgCmp#eadni+N+Sg+Nom sealeadni sealgi+N+SgCmp#eadni+N+Sg+Nom seal-eadni sealgi+N+SgCmp#eadni+N+Sg+Nom sealgeadni sealgi+N+SgCmp#eadni+N+Sg+Nom sealg-eadni sealgi+N+SgCmp#eadni+N+Sg+Nom sealggeadni sealgi+N+SgCmp#eadni+N+Sg+Nom sealgg-eadni sealgi+N+SgNomCmp#eadni+N+Sg+Nom sealgi+N+SgNomCmp#eadni+N+Sg+Nom sealgeeadni sealgi+N+SgNomCmp#eadni+N+Sg+Nom sealge-eadni sealgi+N+SgNomCmp#eadni+N+Sg+Nom sealgieadni <==== ? sealgi+N+SgNomCmp#eadni+N+Sg+Nom sealgi-eadni <==== ? sealgi+N+SgGenCmp#eadni+N+Sg+Nom sealgi+N+SgGenCmp#eadni+N+Sg+Nom sealggeeadni sealgi+N+SgGenCmp#eadni+N+Sg+Nom sealgge-eadni sealgi+N+SgGenCmp#eadni+N+Sg+Nom sealggieadni sealgi+N+SgGenCmp#eadni+N+Sg+Nom sealggi-eadni sealgi+N+PlGenCmp#eadni+N+Sg+Nom sealgi+N+PlGenCmp#eadni+N+Sg+Nom selggiideadni sealgi+N+PlGenCmp#eadni+N+Sg+Nom selggiid-eadni dušši+N+SgNomCmp#eadni+N+Sg+Nom dušši+N+SgNomCmp#eadni+N+Sg+Nom duššeadni dušši+N+SgNomCmp#eadni+N+Sg+Nom dušši-eadni dušši+N+SgNomCmp#eadni+N+Sg+Nom duššieadni

