Konteaksta Technical Documentation

The core of Konteaksta is implemented as a Java servlet running on the Apache Tomcat web server.

Updating the code on gtoahpa-01

  • sudo su teaksta
  • cd
  • svn up
  • ./make_teaksta.sh

Sometimes the Tomcat web server needs to be restarted as well:

  • cd $CATALINA_HOME/bin
  • ./shutdown.sh
  • ./startup.sh

The logs of the running Konteaksta instance are in the directory $CATALINA_HOME/logs/.

Pipeline

The pipeline is:

cat text | \
/opt/smi/sme/bin/preprocess --abbr=/opt/smi/sme/bin/abbr.txt | \
/usr/bin/lookup -flags mbTT /opt/smi/sme/bin/analyser-disamb-gt-desc.xfst | \
/opt/smi/sme/bin/lookup2cg | \
/usr/local/bin/vislcg3 -g /opt/smi/sme/bin/disambiguator.cg3 | \
/usr/local/bin/vislcg3 -g /opt/smi/sme/bin/konteaksta.cg3
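
The same chain of commands could also be started from Java. The following is only a minimal sketch, assuming Java 9 or later (for ProcessBuilder.startPipeline) and the tool paths listed above; the class name PipelineSketch and its methods are hypothetical and do not describe how Vislcg3Annotator.java actually calls the tools.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Konteaksta source tree.
final class PipelineSketch {

    /** The five stages of the pipeline, in order, with the paths from the listing above. */
    static List<List<String>> stages() {
        return List.of(
            List.of("/opt/smi/sme/bin/preprocess", "--abbr=/opt/smi/sme/bin/abbr.txt"),
            List.of("/usr/bin/lookup", "-flags", "mbTT",
                    "/opt/smi/sme/bin/analyser-disamb-gt-desc.xfst"),
            List.of("/opt/smi/sme/bin/lookup2cg"),
            List.of("/usr/local/bin/vislcg3", "-g", "/opt/smi/sme/bin/disambiguator.cg3"),
            List.of("/usr/local/bin/vislcg3", "-g", "/opt/smi/sme/bin/konteaksta.cg3"));
    }

    /** Feeds the input file through all stages and writes the last stage's output to a file. */
    static int run(File input, File output) throws IOException, InterruptedException {
        List<ProcessBuilder> builders = new ArrayList<>();
        for (List<String> stage : stages()) {
            // Let every stage print its error messages to the caller's stderr.
            builders.add(new ProcessBuilder(stage)
                .redirectError(ProcessBuilder.Redirect.INHERIT));
        }
        builders.get(0).redirectInput(input);                      // replaces "cat text |"
        builders.get(builders.size() - 1).redirectOutput(output);  // analysed text ends up here

        // startPipeline connects stdout of each stage to stdin of the next one.
        List<Process> processes = ProcessBuilder.startPipeline(builders);
        return processes.get(processes.size() - 1).waitFor();      // exit code of the last stage
    }
}
```

Keeping the stage list separate from the code that starts the processes makes it easy to swap a single stage, for example when comparing analysers.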

As of March 2018, tests have been run comparing xfst and hfst in the pipeline (see FSTs tests).

File structure

The source code of North Sámi Konteaksta is in $GTHOME/apps/teaksta/sme/src/main.

Exercise topics are defined in src/main/webapp/activities/. Each topic has its own folder:

  • Adverbial/
  • InfiniteVerbs/
  • Object/
  • SubstantivePlural/
  • VerbConjugation/
  • Conjunctions/
  • NegVerbs/
  • Subject/
  • SubstantiveSingular/

Each folder contains three files:

  • activity.xml
  • help.jsp
  • recommended_pages.html (optional)

activity.xml defines the preprocessing and postprocessing pipelines (these are defined in separate XML files under the desc/ folder) and the tag sequences that identify the words relevant for the exercise.

help.jsp gives the help text provided to the user (in the old WERTi interface).

recommended_pages.html lists pages that teachers recommend as a basis for exercises on this topic. When adding new URLs to these files, please keep to the same structure of the <a> elements.

Instructions for adding a new exercise topic are given in the section "How to create a new topic Xxx".

How the program works

The Konteaksta front page is src/main/webapp/index.html. It is written in HTML and JavaScript and uses the Bootstrap framework (v4.0.0) and Font Awesome icons (v5.0.8). Pressing the "Go!" button runs the Java servlet, which does the following:

  1. preprocessing:
    1. extracting the textual content from the webpage
    2. tokenisation
    3. sentence boundary detection
    4. linguistic annotation (src/main/java/werti/uima/ae/Vislcg3Annotator.java)
      1. morphological analysis (FST)
      2. morphological disambiguation (CG)
      3. shallow syntactic parsing (CG)
  2. postprocessing (one of the topic-specific enhancer files, Vislcg3XxxEnhancer.java, in src/main/java/werti/uima/enhancer/):
    1. selection of the tokens that are relevant for the topic
    2. enhancement: enriching the HTML code with additional attributes. The relevant tokens are marked and given attributes such as lemma, distractors and possibleforms.
  3. loading the enhanced page into the browser. The four exercise types (color, click, multiple choice, cloze) are implemented in JavaScript.
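
The enhancement step can be pictured roughly as follows. This is a simplified sketch and not the code of the actual enhancer classes: the class EnhancementSketch, its method names and the simplified id format are assumptions made for illustration. Only the class names and the attributes lemma, distractors and possibleforms follow the enhanced HTML that Konteaksta produces.

```java
import java.util.List;

// Hypothetical illustration of step 2.2 (enhancement); not taken from the source tree.
final class EnhancementSketch {

    /** Escapes characters that must not appear raw in an attribute value or in text. */
    static String escape(String value) {
        return value.replace("&", "&amp;").replace("\"", "&quot;").replace("<", "&lt;");
    }

    /** Wraps one relevant token in a span that carries the exercise attributes. */
    static String enhanceToken(String topic, int index, String surface, String lemma,
                               List<String> distractors, List<String> possibleForms) {
        return "<span id=\"WERTi-span-" + index + "\""       // simplified id (assumption)
            + " class=\"wertiviewtoken wertiview" + topic + "\""
            + " lemma=\"" + escape(lemma) + "\""
            + " distractors=\"" + escape(String.join(" ", distractors)) + "\""
            + " possibleforms=\"" + escape(String.join(" ", possibleForms)) + "\">"
            + escape(surface) + "</span>";
    }
}
```

The JavaScript exercise code can then read everything it needs (the lemma, the distractors for multiple choice, the accepted forms for cloze) from the span itself, without another request to the server.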

Since the process can be quite slow, the preprocessing step has been run once for each text in the recommended_pages list and the output has been saved in an .xmi file, so that only the postprocessing step is executed on each request (the process is now twice as fast). To re-analyse the texts (after an FST update or for some other reason) on gtoahpa-01:

Example of enhanced HTML code

<span id="WERTi-span- Ind Prt-2"
      class="wertiviewtoken  wertiviewVerbConjugation"
      lemma="leat"
      distractors="leat lean leat leahkit leame leamen leahkime lea "
      possibleforms="lei leai ">  lei
</span>

If the enhancement does not work as expected, the cause can usually be found by looking at the page's source.
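
When many tokens have to be checked, the attributes can also be pulled out of a span programmatically. The following is a minimal sketch, assuming the opening tag has already been copied from the page source; the class SpanInspector is hypothetical, and the regular expression only handles double-quoted attributes, which is what the enhanced HTML example uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical debugging helper; not part of the Konteaksta source tree.
final class SpanInspector {

    // Matches name="value" pairs; single-quoted or unquoted attributes are ignored.
    private static final Pattern ATTRIBUTE =
        Pattern.compile("([A-Za-z][\\w-]*)\\s*=\\s*\"([^\"]*)\"");

    /** Returns the attributes of an opening tag in document order, values trimmed. */
    static Map<String, String> attributes(String openingTag) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher matcher = ATTRIBUTE.matcher(openingTag);
        while (matcher.find()) {
            result.put(matcher.group(1), matcher.group(2).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String tag = "<span id=\"WERTi-span- Ind Prt-2\""
            + " class=\"wertiviewtoken  wertiviewVerbConjugation\""
            + " lemma=\"leat\""
            + " distractors=\"leat lean leat leahkit leame leamen leahkime lea \""
            + " possibleforms=\"lei leai \">";
        attributes(tag).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

An empty or missing lemma, distractors or possibleforms value is a typical sign that the token was selected but the enhancer could not generate forms for it.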

How to create a new topic Xxx

  1. in src/main/webapp/index.html, add the new topic as a new element in fáttát_list
  2. in src/main/resources/firefox-extension/werti/chrome/content/wertiview.css, add a style for the new class named .colorizeStyleXxx
  3. in src/main/webapp/activities
    1. create folder named Xxx
    2. copy files from a similar activity folder
    3. edit files as necessary
  4. in src/main/java/werti/util/HTMLEnhancer.java, add the new topic Xxx to dict
  5. in src/main/java/werti/uima/enhancer/, create file named Vislcg3XxxEnhancer.java (the best start is to copy the content from a similar topic and edit it as necessary)
  6. in src/main/resources/firefox-extension/werti/chrome/content/, create file named xxx.js (best start as above)
  7. in desc/enhancers, create file named vislcg3XxxEnhancer.xml (best start as above)
  8. in desc/operators, create file named vislcg3PostProc_Xxx.xml (best start as above)
  9. in pom.xml, add xxx.js in <filesToInclude>
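
Because the new files are spread over several directories, it is easy to forget one. The sketch below lists the folder and files created in steps 3, 5, 6 and 7 for a given topic name and reports which of them are still missing. The class NewTopicChecklist is hypothetical, and lower-casing the whole topic name for the .js file is an assumption based on the xxx.js pattern in step 6. Steps that edit existing files (1, 2, 4 and 9) are not checked.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; not part of the Konteaksta source tree.
final class NewTopicChecklist {

    /** Paths, relative to the project root, that steps 3, 5, 6 and 7 create for a topic. */
    static List<String> newPaths(String topic) {
        String lower = topic.toLowerCase(Locale.ROOT);   // assumption: Xxx -> xxx.js
        return List.of(
            "src/main/webapp/activities/" + topic,
            "src/main/java/werti/uima/enhancer/Vislcg3" + topic + "Enhancer.java",
            "src/main/resources/firefox-extension/werti/chrome/content/" + lower + ".js",
            "desc/enhancers/vislcg3" + topic + "Enhancer.xml");
    }

    /** Returns the expected paths that do not exist yet below the given project root. */
    static List<String> missing(Path projectRoot, String topic) {
        List<String> result = new ArrayList<>();
        for (String relative : newPaths(topic)) {
            if (!Files.exists(projectRoot.resolve(relative))) {
                result.add(relative);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Usage: java NewTopicChecklist <project root> <TopicName>
        missing(Path.of(args[0]), args[1]).forEach(path -> System.out.println("missing: " + path));
    }
}
```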