Re: [xsl] Matching only text nodes with certain (complicated) properties

Subject: Re: [xsl] Matching only text nodes with certain (complicated) properties
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Fri, 09 Jan 2009 13:11:02 -0500
David,

You are getting the false matches because you are also matching the whitespace-only text nodes. So adjust further:

body//text()[normalize-space()][preceding::text()[normalize-space()][1] &lt;&lt; preceding::pb[1]]

This ought to preclude getting that page info more than once, too. If that's a problem let us know.
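
Dropped into the template you posted, the adjusted pattern might look something like the sketch below. One change beyond the pattern is my own assumption rather than anything in your stylesheet: a text node has no children, so xsl:apply-templates inside this template selects nothing, and I have used xsl:value-of to copy the text through instead.

```xml
<!-- Sketch only: intended to match the first non-whitespace text node after a <pb/>. -->
<xsl:template match="body//text()[normalize-space()][preceding::text()[normalize-space()][1] &lt;&lt; preceding::pb[1]]">
  <span class="pagenumber">page <xsl:value-of select="preceding::pb[1]/@n"/></span>
  <!-- Assumption: copy the matched text itself; apply-templates would select nothing here. -->
  <xsl:value-of select="."/>
</xsl:template>
```

The second [normalize-space()] predicate is unchanged from your version; it keeps whitespace-only nodes from being counted as the "previous" text.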

Another approach to this -- and an attractive one, since your schema is known and your data well-controlled -- is to use xsl:strip-space to strip whitespace from elements you know will never have any significant whitespace. Those would be elements that can contain only elements according to the schema. Then you wouldn't need to do the extra filtering using normalize-space().
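
As a rough sketch of that second approach, the declaration goes at the top level of the stylesheet. The element names listed are only my guess at element-only containers in your schema, so check them against the schema before relying on them.

```xml
<!-- Top-level declaration (a child of xsl:stylesheet). -->
<!-- Assumption: body and list can contain only elements in your schema. -->
<xsl:strip-space elements="body list"/>
```

Whitespace-only text nodes inside mixed-content elements (for instance, blank lines around a pb that sits inside an item) are not touched by this, so the normalize-space() filter may still be wanted for those.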

Cheers,
Wendell

At 12:53 PM 1/9/2009, you wrote:
Only now that I'm reading your replies am I understanding what "preceding::" actually matches. Thanks!

Good clue with the "normalize-space()", Wendell, but still, somehow whitespace seems to be a problem:

An original XML document (TEI):

...
<item n="c">The <mentioned>i</mentioned> of the nom. before a vowel in the RV.

<pb n="26"/>

<list>
  <item n="a">The <mentioned>i</mentioned> of the ...
...

after applying the following XSLT 2.0 template (among others; the "body//" part of the match ensures that only text nodes from the <body> of the document are considered):

<xsl:template match="body//text()[preceding::text()[normalize-space()][1] &lt;&lt; preceding::pb[1]]">
<span class="pagenumber">page <xsl:value-of select="preceding::pb[1]/@n"/></span>
<xsl:apply-templates/>
</xsl:template>


becomes:

...
<li>The <span class="ved">i</span> of the nom. before a vowel in the RV.

<span class="pagenumber">page 26</span>
  <ol style="list-style-type:lower-greek">
    <span class="pagenumber">page 26</span>
    <li>
      <span class="pagenumber">page 26</span>
      <span class="ved">i</span> of the ...
...

As you can see, <span>s are added in several places, not just before the very first text node after the page break. They appear only around the places where I have a <pb/> in the original, so I suppose it has to do with whitespace (there is always one empty line before and after <pb/> in the source XML).

I'm using Saxon-B 9.1.0.3 for my XSLT 2.0 transformation (in oXygen).

I'm looking into the thing some more today but thank you for your replies so far.


======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
