Open-Source Speller Technical Documentation

There is a family of open-source speller engines, including at least Aspell, Hunspell, Ispell and MySpell. They are inspired by each other and partly borrow code from each other.

Because of these relationships, the expressive power of their linguistic formalisms seems to be more or less identical, and other projects, most notably Debian, generate speller dictionaries for several of these engines from the same source. We therefore document these spellers and our implementation for them under a common umbrella, loosely tagged X-spell (replace X- with your favourite speller).

Present work

We started out working with Aspell. Documentation and support will be expanded to cover the other engines as well.

Links

Aspell

Hunspell

Hunspell seems to be our best option among the open-source spellers: it offers real compounding, advanced continuation/inflection lexicons (they can be combined, as opposed to Aspell/MySpell/Ispell), forbidden words, and several other nice features. It is still a far cry from Xerox and similar technologies, but it is a big step forward from the Ispell tradition. Apart from the lack of two-level rules to handle complex morphophonology, the biggest problem is integration with host applications. Hunspell does, however, support the most important one: OpenOffice.org. Another nice feature is that it can also be used as a morphological analyzer.

The code is partially based on MySpell, and the dictionary and affix file formats are almost identical to the MySpell formats, apart from extensions to support compounding, circumfixes, a larger set of continuation classes, etc.
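
To make the shared dictionary/affix format a little more concrete, here is a minimal sketch in Python of how a MySpell-style suffix rule expands a dictionary entry. The two-word dictionary, the flag S, and the helper names parse_suffix_rules and expand are made up for illustration; the sketch handles only single-character flags and SFX rules, and it is not how any of the engines is actually implemented.

```python
import re

# Illustrative affix data (hypothetical, not taken from a real dictionary).
# Header line: SFX <flag> <cross-product Y/N> <number of rules>
# Rule line:   SFX <flag> <strip> <add> <condition>
AFF_TEXT = """\
SFX S Y 2
SFX S y ies [^aeiou]y
SFX S 0 s [^y]
"""

# Illustrative dictionary: first line is the entry count,
# then one "word/FLAGS" entry per line.
DIC_TEXT = """\
2
pony/S
cat/S
"""


def parse_suffix_rules(aff_text):
    """Collect suffix rules per flag as (strip, add, condition) tuples."""
    rules = {}
    for line in aff_text.splitlines():
        parts = line.split()
        # Rule lines have five fields; the four-field header is skipped.
        if len(parts) == 5 and parts[0] == "SFX":
            _, flag, strip, add, cond = parts
            strip = "" if strip == "0" else strip  # "0" means "nothing"
            add = "" if add == "0" else add
            # The condition is matched against the end of the word.
            rules.setdefault(flag, []).append(
                (strip, add, re.compile(cond + "$"))
            )
    return rules


def expand(dic_text, rules):
    """Return every base form plus the forms licensed by its flags."""
    forms = []
    for entry in dic_text.splitlines()[1:]:  # skip the count line
        if not entry.strip():
            continue
        word, _, flags = entry.partition("/")
        forms.append(word)
        for flag in flags:  # assumes single-character flags
            for strip, add, cond in rules.get(flag, []):
                if cond.search(word) and word.endswith(strip):
                    stem = word[: len(word) - len(strip)]
                    forms.append(stem + add)
    return forms


if __name__ == "__main__":
    print(expand(DIC_TEXT, parse_suffix_rules(AFF_TEXT)))
```

Each flag on a dictionary entry acts as a pointer to one continuation class. In the plain MySpell format a generated form cannot itself carry further flags, which is the restriction that Hunspell's combinable continuation lexicons lift.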

Ispell

Ispell is the father of all interactive spellers in the Unix world. It may be too old for our purposes: its Unicode (UTF-8) support is restricted or lacking, and it places tighter limits on the number of continuation lexicons than Aspell/MySpell. It also targets a different set of applications than MySpell and Aspell, mainly command-line ones. We might thus decide to skip Ispell support and instead look to newer initiatives like Hunspell.

MySpell

by Sjur N. Moshagen, Tomi Pieski, Trond Trosterud