[xsl] converting Word dictionary to FLEx

Subject: [xsl] converting Word dictionary to FLEx
From: "Jim Albright jim_albright@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 18 Sep 2018 21:47:37 -0000
Using Saxon 9.8.0.12 in Oxygen
Style sheet version="2.0"
The problem: converting a dictionary that was created in Word. Its HTML export
contains only <p>, <span>, <b>, and <i> elements, with color added to some spans.
In plain text it looks like:

#-a (dem. adj. of proximity)
variant of -ad

#a-1(+a.f./i.a. verb)
1. so, in order that perhaps <D2>
2. (particle introducing a.f., indicating `near future' or `future
possibility') <Asp1.19> <D2>
Variant Forms:
ad-(+a.f./i.a. verb) (in 1st person singular and third person plural)
1. so, in order that perhaps
2. (particle introducing a.f., indicating `near future' or `future
possibility')
riI# ad-ftuI# I want to go.
ira a-t-ia:r He wants to see it.
a-ka-(+a.f.) if only
a(d)-ur-(+a.f./i.a.)
so, lest, in order that perhaps not
(also introduces neg. imp.: "Do not...")
a-ur-imil-(+a.f./i.a.)
perhaps, in order that, in the hope that; lest, maybe it would happen that
ad-ukJ7an- (+a.f./i.a.)
1. when, as soon as <Asp1.24> <Na3.10.6>
2. just, repeatedly <Na3.16.2>
ad-ur- (+a.f./i.a.)
so, lest, in order that perhaps not
(also introduces neg. imp.: "Do not...")
Variant Forms:
ad-


Turn this into a flat file suitable to import into a dictionary processing
program called FLEx.
Something like:
\lx -a
\gi (dem. adj. of proximity)
\vao -ad

\lx a-
\hm 1
\co (+a.f./i.a. verb)
\sn 1
\de so, in order that perhaps \so <D2>
\sn 2
\gi (particle introducing a.f., indicating `near future' or `future
possibility')
\so <Asp1.19>
\so <D2>
\sh Variant Forms:
\va ad-
\co (+a.f./i.a. verb)
\gi (in 1st person singular and third person plural)
\sn 1
\de so, in order that perhaps
\sn 2
\gi (particle introducing a.f., indicating `near future' or `future
possibility')
\xv riI# ad-ftuI#
\xe I want to go.
\xv ira a-t-ia:r
\xe He wants to see it.
\va a-ka-
\co (+a.f.) if only
\va a(d)-ur-
\co (+a.f./i.a.)
\de so, lest, in order that perhaps not
\gid (also introduces neg. imp.: "Do not...")
\va a-ur-imil-
\co (+a.f./i.a.)
\de perhaps, in order that, in the hope that; lest, maybe it would happen
that
\va ad-ukJ7an-
\co (+a.f./i.a.)
\sn 1
\de when, as soon as
\so <Asp1.24>
\so <Na3.10.6>
\sn 2
\de just, repeatedly
\so <Na3.16.2>
I have processed the HTML output from Word into the following snippet:

\entry_number 00001
\lx -a
\vernacular FALSE
\grammatical_info dem. adj. of proximity)
\variant_of -ad

\entry_number 00002
\lx a-
\hm 1
\vernacular FALSE
\co (+|ga a.f.|r |ga i.a.|r  verb)
\senseStart 1
\definition  so, in order that perhaps
\source D2
\senseStart 2
\grammatical_info particle introducing |ga a.f.|r , indicating `near future'
or `future possibility')
\source Asp1.19
\source D2
\sectionHead Variant Forms:
\variant ad-
\co (+|ga a.f.|r |ga i.a.|r verb
\grammatical_info in 1|sup st|r person singular and third person plural)
\senseStart 1
\definition  so, in order that perhaps
\senseStart 2
\grammatical_info particle introducing |ga a.f.|r , indicating `near future'
or `future possibility')
<<<<<< everything above is correct

What I get:
\example riI#I want to go.
\example iraHe wants to see it.

What I am looking for (the first example needs two more words: ad-ftuI#):
\example riI# ad-ftuI#
\translation I want to go.
\example ira a-t-ia:r
\translation He wants to see it.

The exact slash codes are not important. Getting ALL the data across is.
The only cleanup I have done so far is the Arial class: instead of <span
style="font-family:&#34;Arial&#34;,sans-serif" lang="EN-GB"> the file now has
<span class="Arial">.
I am starting with this snippet of HTML:
<p> ...

            <span class="Arial">verb) (in 1<sup>st</sup>person singular and
third person plural)
	<br />1. so, in order that perhaps
	<br />2. (<i>particle introducing a.f., indicating `near future' or `future
possibility'</i>)
               <br />
            </span>
            <span class="MsoHyperlink">
               <b>
                  <span lang="EN-GB">riI#</span>
               </b>
            </span>
            <b>
               <span lang="EN-GB">ad-</span>
               <span class="MsoHyperlink">
                  <span lang="EN-GB">ftuI#</span>
               </span>
            </b>

            <span class="Arial">I want to go.<br />
   .....
</p>

My guess so far is to match the <br/> and then collect the <b> words that
follow it, but without including any <b> that comes after the next
<span class="Arial">, which turns into \translation.

    <xsl:template match="html:br">
        <xsl:element name="span">
            <xsl:attribute name="class">example</xsl:attribute>
            <!-- this selects too many <b> elements -->
            <xsl:value-of select="following::html:b"/>
        </xsl:element>
    </xsl:template>

I hold the slash code in the class attribute until the last step. That way I
can continue working on the file in XML.
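
As an illustration of that last step, here is a minimal sketch of how the class
attributes might be turned into slash codes. It assumes the intermediate file
holds each field in a <span> in no namespace whose class attribute is the marker
name; that layout is an assumption for the example, not a description of the
actual file.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Each marked span becomes one "\marker value" line. -->
  <xsl:template match="span[@class]">
    <xsl:value-of
        select="concat('\', @class, ' ', normalize-space(.), '&#10;')"/>
  </xsl:template>

  <!-- Suppress any text that is not inside a marked span. -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```

With this, <span class="example">riI# ad-ftuI#</span> would come out as the line
"\example riI# ad-ftuI#".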

How do I restrict <xsl:value-of select="following::html:b"/> to just the <b>
elements that come before the next Arial span, which here is
<span class="Arial">I want to go.<br /> ?
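
One possible direction, sketched here rather than tested against the full file,
is the XPath 2.0 node-order operator <<, which is true when its left operand
comes before its right operand in document order. The sketch assumes the html
prefix is bound to the XHTML namespace, that an example only ever follows a
<br/> that is the last meaningful node in an Arial span, and that each <b>
holds exactly one word (so the indentation inside a <b> can be dropped).

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:html="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="html">

  <!-- Only a <br/> that closes an Arial span is treated as the start of an
       example; a <br/> followed by more text in the same span is skipped. -->
  <xsl:template match="html:span[@class='Arial']
                       /html:br[not(following-sibling::node()[normalize-space()])]">

    <!-- The first Arial span after this <br/>; it holds the translation. -->
    <xsl:variable name="next"
        select="following::html:span[@class='Arial'][1]"/>

    <!-- The <b> elements that come before that span in document order.
         If there is no later Arial span, nothing is selected. -->
    <xsl:variable name="bold"
        select="following::html:b[. &lt;&lt; $next]"/>

    <xsl:if test="exists($bold)">
      <span class="example">
        <!-- One word per <b>: join its text nodes without spaces, then
             separate the words with a single space. -->
        <xsl:value-of separator=" "
            select="for $b in $bold
                    return string-join($b//text()/normalize-space(.), '')"/>
      </span>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Under those assumptions, the HTML snippet above should give
<span class="example">riI# ad-ftuI#</span>. Because the pattern carries a
predicate, its default priority is higher than that of a plain
match="html:br" template, so the two can sit in the same stylesheet.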

Thank you

Jim Albright
704-562-1529 unlimited cell
Wycliffe Bible Translators
