This file contains the current work with filtering and freezing of smanob files stemming from the dict directory. Its aim is to ease the communication between Cip, Lene and Trond, the reason being that the 00_readme.txt file was meant for the steady filtering and reverting files.

Reverting smaswe to swesma

VIC - Very Important Check before any reverting to swesma:
 - check the language flag in all files systematically (cip) ==> done

1. Cip supposes that Ryan, Lene & Co don't need all attributes from the sma-lemma in the sma-translation (ignore stat="pref" for now).
   E.g. suejies vs. suejies

2. Cip's reverting script copies the -elements into the tg-element of the reverted file. Trond thinks that re-info is not needed at all in smaoahpa. It is true that there is apparently no re-element in the nobsma-files, but there is some re-information in the l-element.

   nobsma>grep '(om ' * | wc -l
   42

   v_nobsma.xml: reise seg (om hår)
   v_nobsma.xml: reke uten lov (om barn)
   v_nobsma.xml: slippe lett (om bark)

   Question: What to do with that? Shall I add the re automatically to the l-element in brackets?

   Trond: Proposal: make two lists and (let the linguists) take a look.

3. weird IDs:
   n_swesma.xml:

Hi! I don't remember whether we have talked about this before, but: multiword should still be multiword after the reverting. That is, the pos information should have no function in the reverting process. Even if the nob translation consists of a single word, it should belong to the multiword file. Multiword refers to the fact that the sma is multiword. - Lene

1. this information is in the data anyhow, namely with the sma entry (which now becomes a translation of the nob entry)

   Trond: If the mwe info can be controlled from the same swesma file, that is fine. Lene's point, as I see it, was not that we must have a multiword __file__, but that the word pairs should be mwe after the reverting as well.

2. after reverting you also have duplicates in the nob data stemming from different smanob files; these have to be merged in order not to get messy data in the database

   Trond: OK, this is a technical question.

3. technically, it is possible to have the former multiword entries in the same file DESPITE the fact that the nob entries don't carry the pos "multiword" with them: the entries can be traced based on the pos of the sma "translations" (see item 1 above)

   Trond: Exactly. So: separate file or not is a technical question; the important thing is to keep the semantic group Multiword.
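
The merging of duplicates and the tracing of former multiword entries discussed above could be sketched roughly as follows. This is only an illustration under assumptions, not Cip's actual reverting script: the entries are plain Python dicts instead of the real XML, the keys "nob", "sma" and "sma_pos" are hypothetical, the lemmas are placeholders, and the pos value "multiword" is assumed to be stored exactly like that.

```python
def merge_reverted(entries):
    """Merge reverted entries that share the same nob lemma.

    Each entry is a dict with hypothetical keys:
      "nob"     - the nob lemma (the headword after reverting)
      "sma"     - the sma lemma (now a translation)
      "sma_pos" - the pos the sma lemma had in its smanob source file
    Returns a dict mapping each nob lemma to a list of
    (sma, sma_pos) pairs, in first-seen order and without repeats.
    """
    merged = {}
    for entry in entries:
        translations = merged.setdefault(entry["nob"], [])
        pair = (entry["sma"], entry["sma_pos"])
        if pair not in translations:  # drop exact duplicates
            translations.append(pair)
    return merged


def is_multiword(translations):
    """True if any sma translation came from a multiword entry.

    The nob headword itself carries no pos "multiword"; the group is
    traced through the pos of the sma translations instead.
    """
    return any(pos == "multiword" for _, pos in translations)


# Placeholder data: the same nob lemma arrives from two source files.
entries = [
    {"nob": "nob_a", "sma": "sma_x sma_y", "sma_pos": "multiword"},
    {"nob": "nob_a", "sma": "sma_z", "sma_pos": "n"},
    {"nob": "nob_a", "sma": "sma_z", "sma_pos": "n"},
    {"nob": "nob_b", "sma": "sma_w", "sma_pos": "v"},
]

merged = merge_reverted(entries)
for nob, translations in merged.items():
    print(nob, translations, "mwe" if is_multiword(translations) else "")
```

With this layout, whether the multiword pairs end up in a separate file or not stays a purely technical choice: the group can be recovered from the merged data either way.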