Konteaksta Technical Documentation

Contents:

Technical documentation
Pipeline
File structure
How the program works
Example of enhanced HTML code
How to create a new topic Xxx

Technical documentation

The main Konteaksta background document
The test url: http://oahpa.no/konteaksta

The core of Konteaksta is implemented as a Java servlet on Apache Tomcat web server.

Updating the code on gtoahpa-01

sudo su teaksta
cd
svn up
./make_teaksta.sh

Sometimes the restart of the Tomcat web server is needed as well:

cd $CATALINA_HOME/bin
./shutdown.sh
./startup.sh

The logs of running Konteaksta can be found in the directory $CATALINA_HOME/logs/.

Pipeline

The pipeline is:

cat text | \
/opt/smi/sme/bin/preprocess --abbr=/opt/smi/sme/bin/abbr.txt | \
/usr/bin/lookup -flags mbTT /opt/smi/sme/bin/analyser-disamb-gt-desc.xfst | \
/opt/smi/sme/bin/lookup2cg | \
/usr/local/bin/vislcg3 -g /opt/smi/sme/bin/disambiguator.cg3 | \
/usr/local/bin/vislcg3 -g /opt/smi/sme/bin/konteaksta.cg3

As of Mar-2018 tests have been performed using xfst vs hfst in the pipeline (see FSTs tests).

File structure

The source code of North Sámi Konteaksta is in $GTHOME/apps/teaksta/sme/src/main.

Exercise topics are defined in src/main/webapp/activities/. Each topic has its folder.

Adverbial/
InfiniteVerbs/
Object/
SubstantivePlural/
VerbConjugation/
Conjunctions/
NegVerbs/
Subject/
SubstantiveSingular/

For each folder, there are three files:

activity.xml
help.jsp
recommended_pages.html (optional)

activity.xml defines preprossessing and post-processing pipelines (those are defined in separate xml-files that can be found under the desc/ -folder) and tag sequences that identify the words that are relevant for the exercise.

help.jsp gives the help text provided to the user (in the old WERTi interface).

recommended_pages.html lists pages that teachers recommend to use as a basis of exercises on this topic. When adding new URL-s to these files please keep to the same structure of the <a> elements.

Instructions of how to add a new exercise topic are here.

How the program works

Konteaksta front page is src/main/webapp/index.html. It is written in HTML and Javascript, and uses Bootstrap framework (v4.0.0) and Font Awesome icons (v5.0.8). When pressing the button "Go!" the Java servlet will be run that does the following:

preprocessing:
1. extracting the textual content from the webpage
2. tokenisation
3. sentence boundary detection
4. linguistic annotation (src/main/java/werti/uima/ae/Vislcg3Annotator.java)
  1. morphological analysis (FST)
  2. morphological disambiguation (CG)
  3. shallow syntactic parsing (CG)
postprocessing (one of the Topic Enhancer.java files in src/main/java/werti/uima/enhancer/):
1. Selection of the tokens that are relevant for the topic
2. Enhancement - enriching the HTML code with additional attributes. The relevant tokens will be marked and provided with attributes as lemma, distractors and possibleforms.
Loads the enhanced page to the browser. The four different exercise types (color, click, multiple choice, cloze) are implemented in Javascript.

Since the process can be quite slow, the preprocessing step has been run once for each text in the recommended_pages list and the output has been saved in .xmi file, so that only the postprocessing step is executed each time (the process is now twice as fast). To analyze again the texts (new FST, or other reason) on gtoahpa-01:

sudo su teaksta
cd ~/analyzedTexts/
rm cas_*
go to http://gtoahpa-01.uit.no/konteaksta and produce one exercise (it doesn't matter which) with each text

Example of enhanced HTML code

<span id="WERTi-span- Ind Prt-2"
      class="wertiviewtoken  wertiviewVerbConjugation"
      lemma="leat"
      distractors="leat lean leat leahkit leame leamen leahkime lea "
      possibleforms="lei leai ">  lei
</span>

If the enhancement does not work as expected we usually can find out why by looking at the page's source.

How to create a new topic Xxx

in src/main/webapp/index.html, add new topic as new element in fáttát_list
in src/main/resources/firefox-extension/werti/chrome/content/wertiview.css, add style for new class named .colorizeStyleXxx
in src/main/webapp/activities
1. create folder named Xxx
2. copy files from a similar activity folder
3. edit files as necessary
in src/main/java/werti/util/HTMLEnhancer.java, add new topic Xxx in dict
in src/main/java/werti/uima/enhancer/, create file named Vislcg3XxxEnhancer.java (best start is copying content from similar topic and edit as necessary)
in src/main/resources/firefox-extension/werti/chrome/content/, create file named xxx.js (best start as above)
in desc/enhancers, create file named vislcg3XxxEnhancer.xml (best start as above)
in desc/operators, create file named vislcg3PostProc_Xxx.xml.xml (best start as above)
in pom.xml, add xxx.js in <filesToInclude>

ICALL

Konteaksta

SMARTool

Gïelese