Konteaksta Technical Documentation
Technical documentation
The core of Konteaksta is implemented as a Java servlet on Apache Tomcat web server.
Updating the code on gtoahpa-01
- sudo su teaksta
- cd
- svn up
- ./make_teaksta.sh
Sometimes the restart of the Tomcat web server is needed as well:
- cd $CATALINA_HOME/bin
- ./shutdown.sh
- ./startup.sh
The logs of running Konteaksta can be found in the directory $CATALINA_HOME/logs/.
Pipeline
The pipeline is:
cat text | \ /opt/smi/sme/bin/preprocess --abbr=/opt/smi/sme/bin/abbr.txt | \ /usr/bin/lookup -flags mbTT /opt/smi/sme/bin/analyser-disamb-gt-desc.xfst | \ /opt/smi/sme/bin/lookup2cg | \ /usr/local/bin/vislcg3 -g /opt/smi/sme/bin/disambiguator.cg3 | \ /usr/local/bin/vislcg3 -g /opt/smi/sme/bin/konteaksta.cg3
As of Mar-2018 tests have been performed using xfst vs hfst in the pipeline (see FSTs tests).
File structure
The source code of North Sámi Konteaksta is in $GTHOME/apps/teaksta/sme/src/main.
Exercise topics are defined in src/main/webapp/activities/.
- Adverbial/
- InfiniteVerbs/
- Object/
- SubstantivePlural/
- VerbConjugation/
- Conjunctions/
- NegVerbs/
- Subject/
- SubstantiveSingular/
For each folder, there are three files:
- activity.xml
- help.jsp
- recommended_pages.html (optional)
activity.xml defines preprossessing and post-processing pipelines (those are defined in separate xml-files that can be found under the desc/ -folder) and tag sequences that identify the words that are relevant for the exercise.
help.jsp gives the help text provided to the user (in the old WERTi interface).
recommended_pages.html lists pages that teachers recommend to use as a basis of exercises on this topic. When adding new URL-s to these files please keep to the same structure of the <a> elements.
Instructions of how to add a new exercise topic are here.
How the program works
Konteaksta front page is src/main/webapp/index.html. It is written in HTML and Javascript, and uses Bootstrap framework (v4.0.0) and Font Awesome icons (v5.0.8). When pressing the button "Go!" the Java servlet will be run that does the following:
- preprocessing:
- extracting the textual content from the webpage
- tokenisation
- sentence boundary detection
- linguistic annotation (src/main/java/werti/uima/ae/Vislcg3Annotator.java)
- morphological analysis (FST)
- morphological disambiguation (CG)
- shallow syntactic parsing (CG)
- morphological analysis (FST)
- extracting the textual content from the webpage
- postprocessing (one of the Topic Enhancer.java files in src/main/java/werti/uima/enhancer/):
- Selection of the tokens that are relevant for the topic
- Enhancement - enriching the HTML code with additional attributes. The relevant tokens will be marked and provided with attributes as lemma, distractors and possibleforms.
- Selection of the tokens that are relevant for the topic
- Loads the enhanced page to the browser. The four different exercise types (color, click, multiple choice, cloze) are implemented in Javascript.
Since the process can be quite slow, the preprocessing step has been run once for each text in the recommended_pages list and the output has been saved in .xmi file, so that only the postprocessing step is executed each time (the process is now twice as fast).
- sudo su teaksta
- cd ~/analyzedTexts/
- rm cas_*
- go to http://gtoahpa-01.uit.no/konteaksta and produce one exercise (it doesn't matter which) with each text
Example of enhanced HTML code
<span id="WERTi-span- Ind Prt-2" class="wertiviewtoken wertiviewVerbConjugation" lemma="leat" distractors="leat lean leat leahkit leame leamen leahkime lea " possibleforms="lei leai "> lei </span>
If the enhancement does not work as expected we usually can find out why by looking at the page's source.
How to create a new topic Xxx
- in src/main/webapp/index.html, add new topic as new element in fáttát_list
- in src/main/resources/firefox-extension/werti/chrome/content/wertiview.css, add style for new class named .colorizeStyleXxx
- in src/main/webapp/activities
- create folder named Xxx
- copy files from a similar activity folder
- edit files as necessary
- create folder named Xxx
- in src/main/java/werti/util/HTMLEnhancer.java, add new topic Xxx in dict
- in src/main/java/werti/uima/enhancer/, create file named Vislcg3XxxEnhancer.java (best start is copying content from similar topic and edit as necessary)
- in src/main/resources/firefox-extension/werti/chrome/content/, create file named xxx.js (best start as above)
- in desc/enhancers, create file named vislcg3XxxEnhancer.xml (best start as above)
- in desc/operators, create file named vislcg3PostProc_Xxx.xml.xml (best start as above)
- in pom.xml, add xxx.js in <filesToInclude>