levenshtein

Goal:Use the Levenshtein algorithm to correct answers

Rationale

We often know the likely answer. This means we might initiate a spelling process with a very restricted lexicon (the set of candidates).

So, in asking what the user wants to drink, we might compare the answer (or only the unrecognised words) up against our relevant sets (here: our set of drinks), and suggest a correction.

Example

Example of Levenshtein algorithm, taken from http://us.php.net/levenshtein:

The algorithm will give the following:

  • Input word: carrrot
  • Did you mean: carrot?
int levenshtein  ( string $str1  , string $str2  [, int $cost_ins  ], int $cost_rep  , int $cost_del  )


<?php
// input misspelled word
$input = 'carrrot';

// array of words to check against
$words  = array('apple','pineapple','banana','orange',
                'radish','carrot','pea','bean','potato');

// no shortest distance found, yet
$shortest = -1;

// loop through words to find the closest
foreach ($words as $word) {

    // calculate the distance between the input word,
    // and the current word
    $lev = levenshtein($input, $word);

    // check for an exact match
    if ($lev == 0) {

        // closest word is this one (exact match)
        $closest = $word;
        $shortest = 0;

        // break out of the loop; we've found an exact match
        break;
    }

    // if this distance is less than the next found shortest
    // distance, OR if a next shortest word has not yet been found
    if ($lev <= $shortest || $shortest < 0) {
        // set the closest match, and shortest distance
        $closest  = $word;
        $shortest = $lev;
    }
}

echo "Input word: $input\n";
if ($shortest == 0) {
    echo "Exact match found: $closest\n";
} else {
    echo "Did you mean: $closest?\n";
}

?>