Meeting_2011-05-03
Contents:
- Corpus meeting 11.4.2011
- Agenda
-
Bugzilla
- 983 maj P2 xsl conv boerre@skolelinux.no Multiple copies of sma files in corpus
- 301 enh P5 Text cor borre.gaup@samediggi.no Proofed/unproofed parallel files need better treatment
- 841 maj P2 Text cor ciprian.gerstenberger@uit.no Nested error markup does not get correctly converted to xml
- 838 nor P3 xsl conv ciprian.gerstenberger@uit.no Log file creation not permitted by some users
- 836 cri P2 xsl conv ciprian.gerstenberger@uit.no Replace all xsl:include with xsl:import in corpus stylesh...
- 835 cri P3 xsl conv ciprian.gerstenberger@uit.no Corpus conversion xsl files missing from $CORPUSHOME/bin/
- 941 nor P3 xsl conv sjur.moshagen@samediggi.no Combination of $GTFREE/orig/sme/facta/659_1.pdf.xsl and $...
- 884 enh P5 Correct! sjur.moshagen@samediggi.no corpus.dtd seems flawed
- Status quo
- Forward
Corpus meeting 11.4.2011
Present: Berit Merete, Børre, Ciprian, Tomi, Trond
Agenda
- Bugzilla
- Status quo
- Forward
Bugzilla
983 maj P2 xsl conv boerre@skolelinux.no Multiple copies of sma files in corpus
The bug itself
20-or-so left (cf. list in bug). Børre to close it.
Extention
- A follow-up would be (is) to follow up for the other languages: in the whole corpus (free+bound).
- We also risk having double content (same file, two names)
- Detect: sort by line and look for double lines
- Detect: sort by line and look for double lines
- General problem: test of NOT adding already existing files
- Script for doing that?
The last point should await a somewhat working corpus.
301 enh P5 Text cor borre.gaup@samediggi.no Proofed/unproofed parallel files need better treatment
This is not a bug, it is a feature.
"We are receiving both proofread and unproofed documents from Min Áigi, and some
Suggestion: Make a web page / project for goldstandard improvement, and close this bug.
841 maj P2 Text cor ciprian.gerstenberger@uit.no Nested error markup does not get correctly converted to xml
This is the only real bug on this list. So, this is for Cip (and Sjur). Check with Lene for a reasonable deadline (mid august?).
838 nor P3 xsl conv ciprian.gerstenberger@uit.no Log file creation not permitted by some users
Cip closes.
836 cri P2 xsl conv ciprian.gerstenberger@uit.no Replace all xsl:include with xsl:import in corpus stylesh...
Old metafiles. These attributes (xsl: include inside the xsl file for each and every corpus file) does not apply anymore.
TODO: Børre to write an explanation in the bug, and Ciprian to close.
Have a look at line 151 in gt/script/langTools/Converter.pm
There is common.xsl and whatnot combined into a new .xsl file
835 cri P3 xsl conv ciprian.gerstenberger@uit.no Corpus conversion xsl files missing from $CORPUSHOME/bin/
This is Cip's cup of tea.
941 nor P3 xsl conv sjur.moshagen@samediggi.no Combination of $GTFREE/orig/sme/facta/659_1.pdf.xsl and $...
#14 var whole runtime error: file /home/boerre/gtsvn/gt/script/corpus/common.xsl line 777 element param xsltApplyXSLTTemplate: A potential infinite template recursion was detected. You can adjust xsltMaxDepth (--maxdepth) in order to raise the maximum number of nested template calls and variables/params (currently set to 3000). Templates:
This is an error with the $GTHOME/gt/script/corpus/common.xsl file, and B has no idea what error.
The following lines are not a result of this:
para_ref after type03 Reinbeitedistrikt som presses av selskap som žnsker å bygge ut vindmžlleparker står i en vanskelig situasjon. Et frivillig forlik med utbygger om en erstatning, kan vŽre å foretrekke fremfor en usikker kamp i rettssystemet. I distrikt 6 i Varanger valgte man den fžrste
Rather, they are a result of decode.pm.
For Børre, the conversion stops when applying common.xsl on the file.
TODO: Cip to have a look.
884 enh P5 Correct! sjur.moshagen@samediggi.no corpus.dtd seems flawed
To repeat:
xmllint whatever-dtd-file-of-your-choice.dtd
… will give you the same error report.
Status quo
Tomi has work which is not checked in. Check in.
File- and problemspecific problems go to Bugzilla.
Forward
New meeting on monday.