How To Control Compounding In Spellers
Introduction
Speller development requires a lot of fine-tuning to become good. This is
In theory, most compounding languages allow so-called free compounding. In
The basic idea is this: use tags in the lexicon to describe what kind of
There are two types of restrictions: position and form. For some languages
Position restrictions
The currently supported tags and their definitions (i.e. the positions they allow) are:
- +CmpN/First
- can be first part only, or used standalone
- +CmpN/Pref
- can be prefix only, never alone
- +CmpN/Last
- can be last part only, or used standalone
- +CmpN/Suff
- can be suffix only, never alone
- +CmpN/None
- cannot take part in compounds
- +CmpN/Only
- can be part of a compound in all positions, but not used alone
There is another logical possibility, namely being allowed in the middle and
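The position restrictions above can also be stated as a lookup table, which may help when working out test cases by hand. The following is a minimal Python sketch and is not part of any GiellaLT tooling. The names `Pos`, `ALLOWED`, `position_of` and `compound_ok` are hypothetical. The sketch also assumes that a word without a `+CmpN/...` tag is unrestricted (free compounding).

```python
from enum import Enum
from typing import Dict, FrozenSet, List, Optional


class Pos(Enum):
    ALONE = "alone"    # used on its own, not in a compound
    FIRST = "first"    # first part of a compound
    MIDDLE = "middle"  # neither first nor last part
    LAST = "last"      # last part of a compound


# Positions each tag allows, transcribed from the list above.
# Assumption: untagged words (key None) are unrestricted.
ALLOWED: Dict[Optional[str], FrozenSet[Pos]] = {
    None: frozenset(Pos),
    "+CmpN/First": frozenset({Pos.FIRST, Pos.ALONE}),
    "+CmpN/Pref": frozenset({Pos.FIRST}),
    "+CmpN/Last": frozenset({Pos.LAST, Pos.ALONE}),
    "+CmpN/Suff": frozenset({Pos.LAST}),
    "+CmpN/None": frozenset({Pos.ALONE}),
    "+CmpN/Only": frozenset({Pos.FIRST, Pos.MIDDLE, Pos.LAST}),
}


def position_of(index: int, n_parts: int) -> Pos:
    """Classify the part at `index` in a word made of `n_parts` parts."""
    if n_parts == 1:
        return Pos.ALONE
    if index == 0:
        return Pos.FIRST
    if index == n_parts - 1:
        return Pos.LAST
    return Pos.MIDDLE


def compound_ok(tags: List[Optional[str]]) -> bool:
    """True if each part's tag allows the position that part occupies."""
    n = len(tags)
    return all(position_of(i, n) in ALLOWED[tag] for i, tag in enumerate(tags))
```

For example, `compound_ok(["+CmpN/Pref", None])` is true, while `compound_ok(["+CmpN/Pref"])` is false, because a prefix can never stand alone.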
How to encode
Three steps are needed:
- add multichars to root.lexc
- add some flag diacritics to certain lexicons
- add tags to lexical entries needing restrictions
Multichar symbols required
There are two types:
- the +CmpN/XXX tags listed above
- the flag diacritic multichars
The flag diacritics are already added to most languages and to the und
Multichar tags:
+CmpN/First !!≈ * @CODE@ - ... can only be first part in a compound or alone
+CmpN/Pref  !!≈ * @CODE@ - ... only __first__ part in a compound, NEVER alone
+CmpN/Last  !!≈ * @CODE@ - ... can only be last part in a compound or alone
+CmpN/Suff  !!≈ * @CODE@ - ... only __last__ part in a compound, NEVER alone
+CmpN/None  !!≈ * @CODE@ - ... can not take part in compounds
+CmpN/Only  !!≈ * @CODE@ - ... can only be part of a compound, i.e. can never
            !! be used alone, but can appear in any position
The flag diacritic symbols that go along with the tags above:
@P.CmpFrst.FALSE@ !!≈ | @CODE@ | Require that words tagged as such only appear first
@D.CmpPref.TRUE@  !!≈ | @CODE@ | Block such words from entering ENDLEX
@P.CmpPref.FALSE@ !!≈ | @CODE@ | Block these words from making further compounds
@D.CmpLast.TRUE@  !!≈ | @CODE@ | Block such words from entering R
@D.CmpSuff.TRUE@  !!≈ | @CODE@ | Block such words from entering R
@P.CmpSuff.TRUE@  !!≈ | @CODE@ | Mark that we have passed R
@D.CmpNone.TRUE@  !!≈ | @CODE@ | Combines with the next tag to prohibit compounding
@U.CmpNone.FALSE@ !!≈ | @CODE@ | Combines with the prev tag to prohibit compounding
@P.CmpOnly.TRUE@  !!≈ | @CODE@ | Sets a flag to indicate that the word has passed R
@D.CmpOnly.FALSE@ !!≈ | @CODE@ | Disallow words coming directly from root.
In both cases the code can just be copied and pasted directly in the
Multichar symbols required in lexicons
There are two types of lexicons requiring flag diacritics that go along with the
The details of the R lexicon(s) vary from language to language, but to make
@P.CmpFrst.FALSE@@D.CmpLast.TRUE@@D.CmpNone.TRUE@@U.CmpNone.FALSE@@D.CmpHyph.TRUE@@U.CmpHyph.FALSE@@P.CmpOnly.TRUE@@P.CmpPref.FALSE@@D.CmpSuff.TRUE@@P.CmpSuff.TRUE@
NB! It is important that this is all on one line, with no spaces between
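The likely reason for the one-line rule is that lexc uses whitespace to separate an entry's form from its continuation lexicon, so a space between flags would break the entry. A small check like the one below could catch such mistakes. It is a hypothetical Python helper (`split_flags` is an illustrative name, not an existing tool). It splits a run of concatenated flags into (operator, feature, value) triples and rejects anything else, such as a stray space. It only recognises the operators P, N, R, D, C and U, and the string must be passed without surrounding whitespace.

```python
import re
from typing import List, Optional, Tuple

# One flag diacritic: @OP.FEATURE.VALUE@ or @OP.FEATURE@ (value optional).
FLAG_RE = re.compile(r"@([PNRDCU])\.([^.@\s]+)(?:\.([^.@\s]+))?@")


def split_flags(line: str) -> List[Tuple[str, str, Optional[str]]]:
    """Split concatenated flag diacritics into (operator, feature, value).

    Raises ValueError at the first character that does not start a flag,
    e.g. a space between two flags.
    """
    flags = []
    pos = 0
    while pos < len(line):
        m = FLAG_RE.match(line, pos)
        if m is None:
            raise ValueError(f"unexpected text at column {pos}: {line[pos:]!r}")
        flags.append(m.groups())
        pos = m.end()
    return flags
```

Applied to the R line above, this yields ten triples, starting with `("P", "CmpFrst", "FALSE")`.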
The ENDLEX lexicon has a much shorter list of required flag diacritics:
LEXICON EndLex
 @D.CmpOnly.FALSE@@D.CmpPref.TRUE@ # ;
(There might be other needs and requirements on this lexicon, what is listed
Example code of lexical entries
Here is some example code from North Sámi (sme):
agibeailáibi+CmpN/First:agi#beai#láj'bi GOAHTI-I "life-long nourishment N" ;
agibeaimuitu+CmpN/First:agi#beai#muj'tu AIGI "eternal memory N" ;
ađa+CmpN/Last:ađđam8 SEAMU "marrow N" ;
With this LexC code, the two first words above will only be allowed to be
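When auditing a lexicon, it can be handy to list which entries carry which restriction tag. The sketch below is a hypothetical Python helper, not a GiellaLT tool. It parses simple entries of the form shown above (`upper:lower CONTLEX "gloss" ;`) and assumes that any `+CmpN/...` tag ends the upper side. It does not handle lexc escapes such as `%:`, and it rejects entries without an explicit lower side.

```python
import re
from typing import NamedTuple, Optional

# Simple LexC entry:  upper:lower CONTLEX "gloss" ;  (gloss optional)
ENTRY_RE = re.compile(
    r'^(?P<upper>[^:\s]+):(?P<lower>\S+)\s+(?P<cont>\S+)\s*(?:"(?P<gloss>[^"]*)")?\s*;$'
)
CMP_TAG = "+CmpN/"


class Entry(NamedTuple):
    lemma: str
    cmp_tag: Optional[str]  # e.g. "+CmpN/First", or None if untagged
    lower: str
    cont: str
    gloss: Optional[str]


def parse_entry(line: str) -> Entry:
    """Parse one simple entry; assumes any +CmpN/... tag ends the upper side."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a simple entry: {line!r}")
    lemma, sep, tag = m["upper"].partition(CMP_TAG)
    return Entry(lemma, CMP_TAG + tag if sep else None,
                 m["lower"], m["cont"], m["gloss"])
```

Run over the three entries above, this reports `+CmpN/First` for the first two and `+CmpN/Last` for ađa.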
How it works
The tags on each lexical entry are converted to flag diacritics.
Now that the lexical entries have flag diacritics, they will only be allowed to
Those tags are somewhat shorter, and much easier to read and maintain. They can
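The mechanism can be illustrated by simulating flag diacritics along a compound path Root → word → R → word → … → EndLex. This Python sketch assumes the usual flag diacritic semantics: P sets a feature, R requires it, D disallows it, C clears it, and U unifies with it. It uses only the CmpFrst, CmpLast, CmpNone and CmpPref flags from the R and EndLex lines above, kept in their original order. The tag-to-flag mapping in `TAG2FLAGS` is hypothetical. It was chosen to be consistent with the flag descriptions above, but it is not the conversion actually used in the build. The N operator (negative set) is not modelled.

```python
import re
from typing import Dict, List, Optional

FLAG_RE = re.compile(r"@([PRDCU])\.([^.@]+)(?:\.([^.@]+))?@")

# The CmpFrst, CmpLast, CmpNone and CmpPref flags from R and EndLex above,
# kept in their original order; the other features are left out here.
R_FLAGS = "@P.CmpFrst.FALSE@@D.CmpLast.TRUE@@D.CmpNone.TRUE@@U.CmpNone.FALSE@@P.CmpPref.FALSE@"
ENDLEX_FLAGS = "@D.CmpPref.TRUE@"

# HYPOTHETICAL tag-to-flag conversion, for illustration only.
TAG2FLAGS: Dict[Optional[str], str] = {
    None: "",                                           # untagged word
    "+CmpN/First": "@D.CmpFrst.FALSE@",                 # blocked once R has been passed
    "+CmpN/Pref": "@D.CmpFrst.FALSE@@P.CmpPref.TRUE@",  # first only; EndLex blocks it
    "+CmpN/Last": "@P.CmpLast.TRUE@",                   # R blocks it from continuing
    "+CmpN/None": "@U.CmpNone.TRUE@",                   # R's D/U flags block it before or after R
}


def apply_flag(state: Dict[str, str], op: str, feat: str, val: Optional[str]) -> bool:
    """Apply one flag to `state` (feature -> value); False means the path is blocked."""
    if op in ("P", "U") and val is None:
        raise ValueError(f"@{op}.{feat}@ needs a value")
    cur = state.get(feat)  # None: feature not set
    if op == "P":  # positive set: overwrite
        state[feat] = val
        return True
    if op == "C":  # clear
        state.pop(feat, None)
        return True
    if op == "R":  # require: feature set (to val, if given)
        return cur is not None and (val is None or cur == val)
    if op == "D":  # disallow: fail if feature set (to val, if given)
        return not (cur is not None and (val is None or cur == val))
    # op == "U": unify: set if unset, otherwise must already equal val
    if cur is None:
        state[feat] = val
        return True
    return cur == val


def run(flags: str, state: Dict[str, str]) -> bool:
    """Apply concatenated flags left to right, stopping at the first failure."""
    return all(apply_flag(state, *m.groups()) for m in FLAG_RE.finditer(flags))


def accepts(tags: List[Optional[str]]) -> bool:
    """Simulate Root -> word (-> R -> word)* -> EndLex for the parts' tags."""
    state: Dict[str, str] = {}
    for i, tag in enumerate(tags):
        if i > 0 and not run(R_FLAGS, state):  # passing through R
            return False
        if not run(TAG2FLAGS[tag], state):
            return False
    return run(ENDLEX_FLAGS, state)
```

With this mapping, `accepts(["+CmpN/Last", None])` is false because R's `@D.CmpLast.TRUE@` stops a Last word from continuing. Likewise, `accepts([None, "+CmpN/None"])` is false because R has already unified CmpNone with FALSE.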
Form restrictions
To be written.