RE: [xsl] Extracting text between nodes

Subject: RE: [xsl] Extracting text between nodes
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Thu, 14 Feb 2008 11:48:20 -0500
Hi,

<td><font><b>Title</b><br/>Some Text<i> and some more italic text</i><b> maybe even some more</b><a href="http://whatever.com">And an anchor</a></font></td>

A variant of several solutions suggested so far:


<xsl:template match="td">
  <xsl:apply-templates select="font/br[1]/following-sibling::node()"/>
</xsl:template>

<xsl:template match="td/font/a[not(following-sibling::node())]" priority="1002"/>

<xsl:template match="td/font/*" priority="1001">
  <xsl:apply-templates/>
</xsl:template>

I agree with Mike that a clearer statement of the problem is in order. "Extracting the text" might mean what it says (as Mike has construed it), or it might mean what David and Sam have inferred, which, given what we know about common XSLT use cases and the roughness of the specification, is perhaps equally likely or more so. As everyone has suggested, there are also questions about how general the solution needs to be.

For example, it is common to have to process the intervening nodes (the italics and such), not just get their text values. (In which case, the last template given here can simply be dropped.)
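
As a rough sketch of that variant: the stylesheet below drops the td/font/* template so that the inline elements can be handled by templates of their own. The "i | b" template and the wrapping <p> are assumptions made for illustration, not something the original poster asked for.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Process everything after the first br inside font -->
  <xsl:template match="td">
    <!-- the p wrapper is hypothetical; use whatever container you need -->
    <p>
      <xsl:apply-templates select="font/br[1]/following-sibling::node()"/>
    </p>
  </xsl:template>

  <!-- Suppress an anchor only when nothing follows it inside font -->
  <xsl:template match="td/font/a[not(following-sibling::node())]"
                priority="1002"/>

  <!-- Hypothetical handling of the intervening markup: copy i and b
       through (without attributes) and keep processing their content -->
  <xsl:template match="i | b">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Run against the sample td above, this should give <p>Some Text<i> and some more italic text</i><b> maybe even some more</b></p>. Text nodes fall through to the built-in template, so they are copied as they are.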

Cheers,
Wendell

At 04:45 AM 2/14/2008, Mike wrote:

> My original reply was compatible with 1.0 as well --
>
> <xsl:template match="a | b[position() = 1]"/>
>
> Which, as David noted in his post, will "lose the stuff you
> don't want".
> There are many ways to do this though, and it will depend on
> how the HTML will vary from the example you provided.

Both this solution and David Carlisle's handle the example posted, but
neither of them reflects the specification:

I want to extract only the text between the first <br/> tag and the last <a>
anchor tag

which is what I tried to do in my XSLT 2.0 solution.

It's quite possible to achieve that in XSLT 1.0 as well, but it's not much
fun, so I certainly won't attempt it unless there is a clearer statement of
the problem.
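
For reference, a minimal sketch of one way the literal reading could be approached in XSLT 1.0. It assumes the br and a elements are children of font, as in the sample, and it uses the count()-based node-set intersection idiom. The variable names are made up for the example, and this is not the 2.0 solution referred to above.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="td">
    <!-- Boundaries, read literally: first br and last a inside font -->
    <xsl:variable name="start" select="font/br[1]"/>
    <xsl:variable name="end" select="font/a[last()]"/>
    <!-- All text nodes that come before the last anchor -->
    <xsl:variable name="before-end" select="$end/preceding::text()"/>
    <!-- Keep text nodes after the first br that are also in $before-end;
         a node is in the set if adding it does not change the count -->
    <xsl:for-each select="$start/following::text()
                          [count(. | $before-end) = count($before-end)]">
      <xsl:value-of select="."/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Unlike the template-based variants, this keeps the text of any earlier anchors and does not care whether the last anchor is the final node inside font. It returns text only, so any inline markup is lost.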


======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
