corpus_policy
MEETING TO DECIDE A USER ACCESS POLICY FOR LINGUISTIC ANALYSIS OF OUR CORPORA
Agenda
- target user groups
- access models
- policy for each user group, if different
target user groups
- Users that don't need a unix shell
- linguists doing research on singleton examples
- historians and other people interested in content, not in form
- Users that do need a unix shell
- linguists doing research on texts as a whole
- linguists with separate analysis tools
- language technology developers
We have both free texts and texts which are restricted. The restricted texts must be protected by means of usernames and passwords, and require a contract.
Users that don't need a unix shell
The Oslo interface is good enough for this user category, or will require only small modifications, e.g. links to the documents containing the hits, preferably with the hits highlighted.
Users that need a unix shell
Typically, these users are linguists or language technology developers who bring their own tools (e.g. another disambiguator or a separate morphological analyser), or who in general need command line access to the whole corpus to achieve what they want. Other scholars may also belong to this group. These users will need access to our corpus machine(s), and will always be required to accept our user contract.
Shell access
There will be two groups with access to /usr/local/share/corp, with the following access rights:
Group | Description | Intended users |
---|---|---|
bound | Access to read the bound corpus | External linguists |
corpus | Access to alter our orig/ directories | Project workers (group as today) |
External users will get their own user account, belonging to the groups myself and bound, and will be able to install their own tools and programs for corpus processing, analysis, etc. External users will not get access to the orig/ directory.
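The two-group model above can be sketched as a small lookup table. This is only an illustration under stated assumptions: the subdirectory name `bound/` is a guess (only `orig/` is named in the text), the function name `is_allowed` is hypothetical, and real enforcement would be done by unix file permissions, not by code like this.

```python
from pathlib import PurePosixPath

CORP_ROOT = PurePosixPath("/usr/local/share/corp")

# Rights per top-level subdirectory, following the group table above.
# "bound" as a directory name is an assumption; "orig" is from the text.
RIGHTS = {
    "bound": {"bound": {"read"}},
    "orig": {"corpus": {"read", "alter"}},
}


def is_allowed(user_groups, path, right):
    """Return True if any of the user's groups grants `right` on `path`."""
    try:
        relative = PurePosixPath(path).relative_to(CORP_ROOT)
    except ValueError:
        return False  # outside the corpus tree: not covered by this sketch
    # Reject the corpus root itself and any path that climbs with "..".
    if not relative.parts or ".." in relative.parts:
        return False
    granted = RIGHTS.get(relative.parts[0], {})
    return any(right in granted.get(group, set()) for group in user_groups)
```

With this table, a member of bound may read files under the assumed `bound/` subtree but gets no access to `orig/`, which matches the rule that external users do not get access to the orig/ directory.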
To let the bound group members run analyses, we need to make some minor adjustments. Like other users, they automatically have full access to the Xerox tools, and the compiled fst's are available in /opt/smi/sme/bin/sme-num.fst etc. The Xerox tools and vislcg are available in /opt/Xerox/bin. A couple of tools are missing right now, and need to be exported to /opt/ by a cron job.
TODO:
- make a group bound for our external corpus users, which:
- gives access to read our bound texts
- gives access to execute/run the tools in /opt
- export to /opt (with cron) tools that the project team members do find in
- ccat (and some perl scripts?)
- other tools?
- make shell script wrappers for the most common commands
- write documentation for our bound users, with pointers to the ordinary
- write user contract
- write documentation for how to apply for a user account (where's the form, to
- make our own guidelines for the user application processing
Web browser access
Users of the bound corpus will need a username and password to the Oslo computer
TODO:
- discuss with Oslo
- delay other tasks until we are ready to go public?
- user management for access to bound texts
Policy for each user group
Future policy for non-shell users
- The free texts will be available without a password, and will require no
- The bound texts will be available, graphically, with a password, and bound by
- All interested parties may download our cvs tree, and our open texts (the
- No one may download our bound texts
Future policy for shell users
- All texts will be available only with a username/password, and bound by our contract.
- Shell access is provided on gtlab and other Linux boxes, and possibly our XServe.
- They will have read-only access to the corpus files, and access to our tools
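One way to check that a corpus directory really is read-only for its group is to inspect its permission bits. The following is a minimal sketch, not an agreed procedure: the function name `group_can_read_only` is hypothetical, and it assumes that "read-only" means the group may read (and enter directories) but not write, while all other users have no access at all.

```python
import os
import stat


def group_can_read_only(path):
    """Check the mode bits of `path` against the assumed read-only policy."""
    mode = os.stat(path).st_mode
    group_reads = bool(mode & stat.S_IRGRP)
    group_writes = bool(mode & stat.S_IWGRP)
    # Directories also need the execute bit so group members can enter them.
    group_enters = bool(mode & stat.S_IXGRP) if stat.S_ISDIR(mode) else True
    others_have_access = bool(mode & stat.S_IRWXO)
    return (group_reads and group_enters
            and not group_writes and not others_have_access)
```

A directory with mode 750 passes this check, while 755 (readable by everyone) and 770 (writable by the group) do not. The sketch does not check which group owns the path; that would have to be verified separately.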
Future splitting of the cvs group
Today, the cvs group has access to alter and read our linguistic source code. In the future, we may split this access into alter OR read, and make it more fine-grained, according to subtree (gt, kt, st, xtdoc), or even according to language.
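To make the idea of a finer-grained split concrete, here is a sketch of how rights could be looked up per subtree and, optionally, per language. Everything in it is hypothetical: the user names, the `GRANTS` table, and the function `has_right` are illustrations only, and the language codes `sme` and `smj` are used purely as examples.

```python
# Hypothetical grants: user -> {(subtree, language): rights}.
# A language of None means "all languages in that subtree".
GRANTS = {
    "alice": {("gt", None): {"read", "alter"}},
    "bob": {("gt", "sme"): {"read"}, ("xtdoc", None): {"read"}},
}


def has_right(user, subtree, language, right):
    """True if `user` holds `right` for this subtree and language."""
    grants = GRANTS.get(user, {})
    # A language-specific grant and a subtree-wide grant both count.
    for key in ((subtree, language), (subtree, None)):
        if right in grants.get(key, set()):
            return True
    return False
```

In this example alice may alter everything under gt, while bob may only read the sme part of gt and all of xtdoc. Keeping read and alter as separate rights is what allows the "alter OR read" split described above.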