Subject: [xsl] converting Word dictionary to FLEx
From: "Jim Albright jim_albright@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 18 Sep 2018 21:47:37 -0000
Using Saxon 9.8.0.12 in Oxygen
Style sheet version="2.0"
The problem is converting a dictionary that was created in Word and is marked
up with only <p>, <span>, <b>, and <i> elements, plus color on some spans.
In plain text it looks like:
#-a (dem. adj. of proximity)
variant of -ad
#a-1(+a.f./i.a. verb)
1. so, in order that perhaps <D2>
2. (particle introducing a.f., indicating `near future' or `future
possibility') <Asp1.19> <D2>
Variant Forms:
ad-(+a.f./i.a. verb) (in 1st person singular and third person plural)
1. so, in order that perhaps
2. (particle introducing a.f., indicating `near future' or `future
possibility')
riI# ad-ftuI# I want to go.
ira a-t-ia:r He wants to see it.
a-ka-(+a.f.) if only
a(d)-ur-(+a.f./i.a.)
so, lest, in order that perhaps not
(also introduces neg. imp.: "Do not...")
a-ur-imil-(+a.f./i.a.)
perhaps, in order that, in the hope that; lest, maybe it would happen that
ad-ukJ7an- (+a.f./i.a.)
1. when, as soon as <Asp1.24> <Na3.10.6>
2. just, repeatedly <Na3.16.2>
ad-ur- (+a.f./i.a.)
so, lest, in order that perhaps not
(also introduces neg. imp.: "Do not...")
Variant Forms:
ad-
I need to turn this into a flat file suitable for import into a
dictionary-processing program called FLEx.
Something like:
\lx -a
\gi (dem. adj. of proximity)
\vao -ad
\lx a-
\hm 1
\co (+a.f./i.a. verb)
\sn 1
\de so, in order that perhaps \so <D2>
\sn 2
\gi (particle introducing a.f., indicating `near future' or `future
possibility')
\so <Asp1.19>
\so <D2>
\sh Variant Forms:
\va ad-
\co (+a.f./i.a. verb)
\gi (in 1st person singular and third person plural)
\sn 1
\de so, in order that perhaps
\sn 2
\gi (particle introducing a.f., indicating `near future' or `future
possibility')
\xv riI# ad-ftuI#
\xe I want to go.
\xv ira a-t-ia:r
\xe He wants to see it.
\va a-ka-
\co (+a.f.) if only
\va a(d)-ur-
\co (+a.f./i.a.)
\de so, lest, in order that perhaps not
\gid (also introduces neg. imp.: "Do not...")
\va a-ur-imil-
\co (+a.f./i.a.)
\de perhaps, in order that, in the hope that; lest, maybe it would happen
that
\va ad-ukJ7an-
\co (+a.f./i.a.)
\sn 1
\de when, as soon as
\so <Asp1.24>
\so <Na3.10.6>
\sn 2
\de just, repeatedly
\so <Na3.16.2>
I have processed the HTML output from Word into the following snippet:
\entry_number 00001
\lx -a
\vernacular FALSE
\grammatical_info dem. adj. of proximity)
\variant_of -ad
\entry_number 00002
\lx a-
\hm 1
\vernacular FALSE
\co (+|ga a.f.|r |ga i.a.|r verb)
\senseStart 1
\definition so, in order that perhaps
\source D2
\senseStart 2
\grammatical_info particle introducing |ga a.f.|r , indicating `near future'
or `future possibility')
\source Asp1.19
\source D2
\sectionHead Variant Forms:
\variant ad-
\co (+|ga a.f.|r |ga i.a.|r verb
\grammatical_info in 1|sup st|r person singular and third person plural)
\senseStart 1
\definition so, in order that perhaps
\senseStart 2
\grammatical_info particle introducing |ga a.f.|r , indicating `near future'
or `future possibility')
<<<<<< above is correct
\example riI#I want to go. <<<<<< what I get
\example iraHe wants to see it.
\example riI# ad-ftuI# <<<<< what I am looking for. I need two more
words here. ad-ftuI#
\translation I want to go.
\example ira a-t-ia:r
\translation He wants to see it.
The exact slash codes are not important. Getting ALL the data across is.
So far I have only added the Arial class: instead of <span
style="font-family:"Arial",sans-serif" lang="EN-GB"> it is now <span
class="Arial">.
I am starting with this snippet of code in HTML.
<p> ...
<span class="Arial">verb) (in 1<sup>st</sup>person singular and
third person plural)
<br />1. so, in order that perhaps
<br />2. (<i>particle introducing a.f., indicating `near future' or `future
possibility'</i>)
<br />
</span>
<span class="MsoHyperlink">
<b>
<span lang="EN-GB">riI#</span>
</b>
</span>
<b>
<span lang="EN-GB">ad-</span>
<span class="MsoHyperlink">
<span lang="EN-GB">ftuI#</span>
</span>
</b>
<span class="Arial">I want to go.<br />
.....
</p>
My guess so far is to match the <br/> and then look for the <b> words that
follow, but not include any <b> that comes after the <span class="Arial"> that
turns into \translation.
<xsl:template match="html:br">
  <xsl:element name="span">
    <xsl:attribute name="class">example</xsl:attribute>
    <!-- this gives too many -->
    <xsl:value-of select="following::html:b"/>
  </xsl:element>
</xsl:template>
I hold the slash code in the class attribute until the last step. That way I
can continue working on the file in XML.
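As a rough sketch of that last step (an assumption about how it might look, not
the stylesheet actually in use): once every field is a <span> whose class holds
the slash code, a final text-mode pass could write one line per span. The sketch
assumes the spans are in no namespace and that any text outside them can be
dropped.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Each marked span becomes one line: backslash, code, space, content. -->
  <!-- Assumption: the intermediate spans are in no namespace. -->
  <xsl:template match="span[@class]">
    <xsl:value-of
        select="concat('\', @class, ' ', normalize-space(.), '&#10;')"/>
  </xsl:template>

  <!-- Assumption: text outside the marked spans is not wanted in the output. -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```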
How do I restrict the <xsl:value-of select="following::html:b"/> to just the
<b> elements that come before the next
<span class="Arial">I want to go.<br /> ?
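One direction that might work, as an untested sketch: keep only the <b>
elements whose nearest preceding <br/> is the current one and that come before
the next <span class="Arial">, using the XPath 2.0 "is" and "<<" operators.
The binding of the html prefix to the XHTML namespace and the variable names
$stop and $bold are assumptions.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:html="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="html">

  <xsl:template match="html:br">
    <!-- First Arial span that starts after this br (the translation text). -->
    <xsl:variable name="stop"
        select="following::html:span[@class = 'Arial'][1]"/>

    <!-- Bold runs that follow this br, belong to it (no later br in
         between), and come before $stop in document order. -->
    <xsl:variable name="bold"
        select="following::html:b
                  [preceding::html:br[1] is current()]
                  [empty($stop) or . &lt;&lt; $stop]"/>

    <!-- Emit an example span only when this br really starts an example. -->
    <xsl:if test="exists($bold)">
      <span class="example">
        <xsl:value-of
            select="for $b in $bold return normalize-space($b)"
            separator=" "/>
      </span>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

With the HTML snippet above, this should select the two <b> elements between
the last <br /> of the first Arial span and the translation span, and nothing
for the earlier <br /> elements. normalize-space trims the indentation, but any
white space between the inner spans would still leave a gap inside ad-ftuI#.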
Thank you
Jim Albright
704-562-1529 unlimited cell
Wycliffe Bible Translators