Sámi dialogue based language learning system: Design document


Architecture Overview

The software consists of a Dialogue manager, a Sentence parser and a Speller together with a suggestion mechanism. In addition, user Feedback and a Tutorial section are integrated to the dialogue. The software is used via a web interface.

Component Design

The heart of the system is the dialogue manager. It conducts the dialogue by selecting the next utterance made by the machine. It reacts to the user's replies and starts feedback and tutorial dialogues when needed. The Dialogue manager calls the sentence parser and speller to parse the user replies.

Sentence parser

The system is able to parse sentences, also incorrect ones. At the moment, there is an analyzator and disambiguator and we are able to get a full syntactic analysis of the sentences that belong to the language.

The parser process involves three steps: check against a predefined set of correct answers, speller and grammar check. First of all, each question has a predefined set of answers that can be generated beforehand. If the answer does not belong to the set of predefined answers, the sentence is sent to the speller. If the words in the utterance are not recognized, the problematic words receive suggestions. When all the words are recognized, the predefined set of answers is checked again. If that check fails, the utterance is sent to the analyzer. In this final step, the sentence undergoes either overall grammar check or there is a more fine-grained check where the type of question is taken into account. We have to decide which one. The two classes of errors, spelling errors and grammatical errors are described in more detail in separate sections.

Spelling errors

Spelling errors are recognized by the speller. The Speller suggests an alternative, or a list of alternatives, from where the user selects the correct one. The Suggestion mechanism is restricted.

Grammatical errors

Grammatical errors are words in wrong forms. The grammatical analysis should be such that the words receive an analysis although they belong to an ungrammatical sentence.

There are two classes of grammatical errors: question-specific errors and common errors. Question-specific errors are the type where the user is expected to use some pronoun (TT: ???). For example, if the question is "Jugatgo gáfe?", then the user should use first person singular in the answer, both in the pronoun and the verb form. Does the question always determine the grammatical form of the answer, or do we need a general class of grammatical errors?

Each question is paired with a list of correct answers that involve the same verb as in the question (see the dialogue section below). However, if we want to allow the user to use also other verbs than what is provided in the question, the correct grammatical information that is expected has to be specified somehow. I don't know how this should be done. Usage of syntactic information etc.

Common errors could be detected in sentences that do not belong to the predefined set of answers for the question. I.e. basic grammar checking. Each grammatical error has a type. For example, if a sg1 subject is paired with a verb in the wrong form, we may ask the user to change the verb form. The errors are classified and the actions they cause are stored in an xml-structure which is described below, in the tutorial section.

Feedback and tutorial

The system gives feedback about spelling errors and grammatical errors. See the section speller for spelling errors. Whenever a grammatical error is detected, a feedback window and the tutorial section are activated. The feedback may be more fine-grained than the tutorial. One suggestion for the format is described below. You may decide the classes of feedback as you like, this example just illustrates the xml-structure. Note that the individual feedback may be more fine-grained than the tutorial.


<error type="MAINV_SG1">
  <fb>Select a verb in first person singular.</fb>
  <tut type="verb_inflection"/>
<error type="MAINV_PL1">
  <fb>The pronoun "we" requires first person plural for the verb form.</fb>
  <tut link="verb_inflection"/>

<error type="incomplete_answer">
  <fb>Answer with complete clauses.</fb>
  <tut link="complete_clauses"/>

The tutorial section consists of longer sections of text that describe certain grammatical phenomena. They are attached only to grammatical errors (since we don't have any mechanism to detect the type of spelling error). Each type of grammatical error has its own tutorial section, which can be opened when an error is encountered.


<tut name="verb_inflection">
..text and explanations with examples and formatting..

Dialogue context

The dialogue system in this program is machine-governed, which means that the machine asks questions, the user answers and the machine commentes upon the answers before it carries on. The conversation may also involve questions posed by the user, but the type of questions is defined by the machine. In the first version, we will have only questions asked by the machine. The dialogue system consists of two parts: dialogue context and dialogue manager.

When the program starts, the user first selects the conversation s/he wants to try. In addition, s/he may select some grammatical properties s/he wants to practice. Since the language learning system concentrates on learning the verb inflection, the grammatical properties would be Tense and Person-Number. Perhaps others as well. When the grammatical properties are selected (e.g. Present and Sg1), the questions are formatted so that at least some of them expect answer in the form selected by the user (Present and Sg1).

The context can be selected from a list of dialogues. In the later versions, there may be dialogues based on a picture, etc. The dialogues may be planned for different common situations that involve interaction: shopping, cafe etc., or by taking into account the user's interests and age for example. All the utterances made by the program are prepared beforehand.

The dialogue context consists of a list of topics that may be included in the conversation. When the user selects some conversation, the topics are selected and the conversation is generated. The list of questions may split to branches if the conversation contains yes/no -questions that are used as selection criteria for the next questions. When all the questions in the branch are used, a new topic is selected.

Dialogue manager

The dialogue manager selects the next utterance based on the dialogue context and user input. In the simplest dialogues, there is only one route through the dialogue. The program asks a question from the user and the user replies. To create a feeling of a conversation, the next question is based on the user input whenever possible.

Selecting the next question

There are two types of questions: yes/no questions and wh-questions. In addition, there are questions that have a fixed answer, such as "How are you?". The yes/no questions create branches to the conversation, where the next utterance by the machine depends on the user answer. The selection must be done reliably, therefore each question is stored together with a list of probable affirmative answers and a list of probable negative answers.

For example, the question "Jugatgo gáfe?" is paired with two lists of possible answers:

  affirmative: Jugan, Mun jugan, De jugan, Juo, mun jugan
  negative: In, in, In juga, In juga gáfe, In juga maidege

If the user input is one of these, then the next question can be selected reliably on the basis of the answer. Otherwise, the next question is selected following the discourse context. We could think of some heuristics here, for example if the answer starts with "Juo" or the main verb is in sg1, then we conclude that the sentence is affirmative even though it was not listed in the set of answers, and similarly with the negation verb. However, then there is a chance to do false predictions, for example if the user enters "Yes, but.." In the first prototype, the selection will be based on a preformatted list of answers.

We also want to be able to change the format of the question so that instead of asking in Sg2, the question could be in Pl2 or Du2 for example. This means that the example above is too simplistic. The question should be paired with the morphological information. The conversation may be selected so that the same form is used in most of the questions.

  Juotteko kahvia?
  aff: Juomme, Kyllä juomme, kyllä, juomme kahvia, kyllä, me juomme kahvia.
  neg: Emme juo, Emme juo kahvia

Therefore, the question could be formed using the base form or, alternatively, the morphological information alone. The form may be given either straight in the xml-structure or it can be generated. There may also be alternative questions for one question type to get variability to the conversation. Since there may be variability at least in the Person-Number and perhaps also otherwise, we have to find some general format for the questions. Let's start with a simple yes/no question.

Specifying a general form for a yes/no -question and its answers

The idea is that the question may be represented using very general terms. in this way, we may vary the grammatical properties of the sentences, for example Tense and Person-Number. When the user selects the Tense and Person-Number s/he likes, the questions are generated accordingly. Some questions may be more restricted, but that's fine. I would like to include the possibility to use both a very general format for the question and more specific formats where the utterances and questions made by the machine are explicitely stated. We will use the same names for tags and tagsets that are defined in korpustags.txt.

<question name="jugatgo_gáfe">
  <q Tense="Prs" PN="Sg2">Jugatgo gáfe?</q>
  <q Tense="Prs" PN="Sg2">Jugatgo gáfe?</q>

<question name="jugatgo_gáfe">
  <q><grammar V="juhkat" N="gáffe">V+MV_GRAM+Qst N+Sg+Acc</grammar></q>

<grammar type="MV_GRAM">

Or, it is possible to use even more general types of questions. For example (these are only examples, you may define the tag classes as you like):

<question name="jugatgo_gáfe">
  <q type="go_objsg"><grammar V="juhkat" N="gáffe"/></q>

<proto_question type="go_objsg">
  <q>V+MV_GRAM+Qst N+SG+ACC</q>

This formulation allows the properties of the verb to be parametrized and selected by the program (or the user when s/he selects the grammatical properties s/he wants to practice).

When the question is in a general format, we would like the answers have the similar format. In this case, the verb and object are recieved from the program so it doesn't have to be specified again. Nevertheless, if the question can be answered using different verbs, then a list of verbs can be provided.

<answer name="jugatgo_gáfe">
    <ans>Pron V+MV_GRAM</ans>
    <ans>De V+MV_GRAM</ans>
    <ans>Juo, Pron V+MV_GRAM</ans>
    <ans>Neg+NEG_GRAM V+CONNEG</ans>
    <ans>Neg+NEG_GRAM V+CONNEG N+SG+ACC</ans>
    <ans>Neg+NEG_GRAM V+CONNEG maidege</ans>

Or, we may abstract words like "De" and "Juo" away and provide them by the program, so that an answer to a yes/no question may start with "De" or "Juo" and only what follows is parsed. It is also possible to mark optionality straight in the input:

<proto_answer type="go_objsg">
    <ans>(Pron) V+MV_GRAM (N+SG+ACC)</ans>
    <ans>Neg+NEG_GRAM V+CONNEG (N+SG+ACC)</ans>

The idea is that these protoquestions and protoanswers always may be overridden by more specific questions and answers. For example it is possible to write down the question and answers in the exact forms without any variation. This more specific format can be used by specifying the name of it in the dialogue.

The dialogue structure

Each conversation is stored in an xml-document that has e.g. the format below. The conversation consists of topics that are listed in the order they are assumed to appear if the order is not stated in the topic definition, see below.

<dialogue name="cafe">
   <topic name="juhkat"/>
   <topic name="syöminen"/>
   <topic name="maksaminen"/>

The structure for the topic is given below. Each topic contains an unordered list of questions. Each yes/no questions may contain three elements that contain the instructions for different actions: affirmative, negative and default action. The action may be an utterance or a link to the next question. The default action is used if none of the other actions are not activated. If there is no action at all, the next topic is selected. Regardless of how the conversation proceeds, there may be opening or ending statements for both the conversation and the topic.

<topic name="juhkat">
   <opening>Mitä joisimme?</opening>
   <question name="jugatgo_gáfe">
       <q link="sokeri"/>
       <q link="maito"/>
       <q link="jugatgo_deaja"/>
       <q link="jugatgo_maidege"/>
       <q link=".."/>

   <question name="sokeri">
     <q>Otatko sokeria?</q>
       <utt>Ole hyvä!</utt>
      <q link="maito"/>

<topic name="syöminen">

<topic name="maksaminen..">

This xml-structure does not contain the selection criteria for affirmative and negative answers but they are in a separate document. In this way, the questions can also be reused in different conversations.

In the case of wh-questions, the next question is selected following the discourse context. The topic of the conversation can be changed partly randomly. However, in some cases the user input could conduct the conversation e.g. if the user mentions a topic that is listed in available topics. For example "Maid don jugat?" and we have prepared a conversation regarding that topic (tea, coffee). In the first versions, we will not have this functionality.

In addition, there is a list of answers that may cause different kinds of reactions. For example, incomplete answers cause feedback in all the conversations:

<error link="incomplete_answer">

Some answers may cause tutorial window to open etc. The structure of feedback and tutorial is described in the section below.


The Speller for North and Lule Sámi will be used in correcting spelling errors by the user. For an addition mechanism, see the section on Levenshtein below.

Suggestion mechanism

The suggestion mechanism can be restricted by having conversation-based lexicons, which contain thematically restricted vocabularies. In the other hand, the whole project needs a lexicon of common words that may occur in each dialogue (containing pronouns, auxiliaries, common verbs,..). It may even turn out that one restricted lexicon is suitable for all the discussions.

There are several problems with the correction mechanisms: If the user tries to write the correct grammatical form of a verb, but has a spelling error, the speller may suggest a grammatically wrong form. another problem is that there will be very many correct suggestions for each spelling error


As a supplement to the ordinary speller, we may consider adding a separate speller reacting to specific answers. So, when asked for a drink, the user will answer with some word, and the question-specific speller can check the sentence up against a set of drinks, instead of against the whole lexicon. See a separate document on this.

Web interface

The system is accessed via a web interface. When the user selects the conversation, the interface is generated, and the topics are selected and put in some order. Thus the conversation is ready when the user starts it. The yes/no questions are taken into account when the conversation is generated. The wh-questions may cause topic changes that cannot be taken into account in beforehand and require the conversation to be regenerated, and in addition, there has to be a logging mechanism of already used topics. The topic changes by the user are not included in the first version of the system.

Spelling errors cause a small window to open which briefly states the error and lists suggestions.

Grammatical errors cause a similar window to open, with brief description of the error and some assistance. In addition, the tutorial section in the side bar is activated and the correct tutorial page is opened.

A risk for the program and especially for the usage of the interface is the speed. To avoid waiting time in cases of correct sentences, the dialogue is generated in beforehand as much as possible. When the user's answer is not found from the predefined set of answers, a speller call is required, thereafter there may be an analyzer call, and all of these take time. Therefore, attention has to be paid to the speech already at the start of the development.

Morphology drill

See separate document on the morphology drill

Quality assurance