Using the IPA Generating Pipeline
The pipeline requires the following tools:
- HFST (at least svn r2160)
- VISLCG3 (a recent svn version)
- Apertium (one tool only: apertium-destxt)
The pipeline is not yet fully functional. This document is both a guide to help us get where we want to be and documentation of the present status and planned functionality.
Here is a test command illustrating the whole processing pipeline, from plain text in to IPA out. Not all components are in place yet; the missing ones are replaced with alternatives to get something running:
$ echo "Iđđes dii. 9 mun doapmalan čoaggit alitnásttiid álbmotmeahcis." | \
    apertium-destxt | \
    hfst-proc -C -w -e -q -r sme/bin/sme.hfstol | \
    vislcg3 -g sme/src/sme-dis.rle | \
    grep -v '^"' | cut -d '"' -f3 | cut -d ' ' -f2 | \
    hfst-optimized-lookup -q sme/bin/isme.hfstol | \
    cut -f2 | grep -v '^$'
The output produced with the above pipeline is:
Iđđes+Adv+? dii. 9 +? doapmalan čoaggit alitnásttiid álbmotmeahcin álbmotmeahcis .. +?
The target is to produce IPA, one output token for each input token.
The text output option illustrated above can be used to ensure 1:1 round-trip correctness for the disambiguation and generation: we should be able to get the same text out of the pipeline as we put into it.
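The round-trip check described above could be automated along the following lines. This is only a sketch: the function name roundtrip_check is hypothetical, and it assumes that the input tokens are separated by spaces and that the pipeline output has been saved with one token per line. Punctuation attached to a word (as in "álbmotmeahcis.") is not split off here, so a real check would need the same tokenisation as the analyser.

```shell
#!/bin/sh
# Hypothetical helper, not part of the pipeline.
# Succeeds only if the saved pipeline output contains the same tokens,
# in the same order, as the original input text.
# ASSUMPTION: input tokens are space-separated, output is one token per line.
roundtrip_check() {
    input_text=$1     # the original sentence
    output_file=$2    # pipeline output, one token per line
    # Put each input token on its own line and compare with the output;
    # diff prints the mismatching tokens and returns non-zero on any difference.
    printf '%s\n' "$input_text" | tr -s ' ' '\n' | diff - "$output_file"
}
```

It would be called as roundtrip_check "Iđđes dii. 9" out.txt, where out.txt is a file holding the output of the pipeline for that input.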
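Each command is commented below:
echo "Iđđes dii. 9 mun doapmalan čoaggit alitnásttiid álbmotmeahcis."
    Supplies the test sentence as plain text.
hfst-proc -C -w -e -q -r sme/bin/sme.hfstol
    Analyses the text with the analysing transducer sme.hfstol; -C selects Constraint Grammar output.
vislcg3 -g sme/src/sme-dis.rle
    Applies the disambiguation grammar sme-dis.rle to the analyses.
grep + cut + cut
    Drops the lines that start with a double quote, then keeps the second space-separated field of what follows the quoted string on each remaining line, as input for generation.
hfst-optimized-lookup -q sme/bin/isme.hfstol
    Looks up each line in isme.hfstol, the transducer currently used for generation.
cut + grep
    Keeps the second tab-separated field of the lookup output and removes empty lines.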
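The two filter steps can be tried in isolation, without HFST or VISLCG3 installed. The sketch below wraps them in two functions with hypothetical names (extract_analysis and clean_lookup) and feeds them hand-written sample lines. The samples are not real tool output: the first follows the general Constraint Grammar stream shape (a cohort line starting with "< and a tab-indented reading line), and placing the raw analysis string right after the lemma is an assumption. The second assumes tab-separated lookup output followed by an empty line.

```shell
#!/bin/sh
# Filter between vislcg3 and hfst-optimized-lookup (same commands as in the pipeline).
extract_analysis() {
    # 1. drop cohort lines, which start with a double quote
    # 2. keep the text after the closing quote of the lemma
    # 3. keep the second space-separated field of that text
    grep -v '^"' | cut -d '"' -f3 | cut -d ' ' -f2
}

# Filter after hfst-optimized-lookup (same commands as in the pipeline).
clean_lookup() {
    # keep the second tab-separated field, then drop empty lines
    cut -f2 | grep -v '^$'
}

# ASSUMED sample of the disambiguated stream, written by hand.
printf '"<Iđđes>"\n\t"iđđes" Iđđes+Adv\n' | extract_analysis
# prints: Iđđes+Adv

# ASSUMED sample of lookup output: input, result and weight, then an empty line.
printf 'Iđđes+Adv\tIđđes+Adv+?\tinf\n\n' | clean_lookup
# prints: Iđđes+Adv+?
```

Keeping the filters as small functions like this makes it easier to adjust them when the analysing or generating transducer is replaced and the stream format changes.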
Remaining work:
- replace the analysing transducer with a tailored speech synthesis transducer; the most important difference from the regular transducer is that most (all?) tags are included, to ensure round-trip stability
- tune the disambiguation
- replace the generating transducer with a real IPA transducer