Re: [xsl] match string

Subject: Re: [xsl] match string
From: Anton Triest <anton@xxxxxxxx>
Date: Wed, 20 Oct 2004 22:17:32 +0200

Hi Wendell,

You want either:


(collects all the text nodes, returns only the first)



(returns the first descendant text node).

OK... but now the problem is that neither of them seems to be valid in a match pattern.

<xsl:template match="para(//text())[1]">
    Saxon says: "The only functions allowed in a pattern are id() and key()"
<xsl:template match="para/descendant::text()[1]">
    Saxon says: "Axis in pattern must be child or attribute"

(The first one is strange: is text() really a function? And even then, why is "para//text()[1]" a valid pattern and "para(//text())[1]" isn't?)

So I guess I'd have to use one of them in an apply-templates select attribute (instead of in match) but I'm stuck on how to combine that with the identity template. I could select "para(//text())[1]" but how would I select all the rest then (something like "para(//text())[position() > 1]" won't work).
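
One possible way to keep everything in a match pattern, offered as a sketch rather than a tested answer: the axis restriction applies to the steps of a pattern, not to its predicates, so the descendant axis can be moved inside a predicate. The example below assumes XSLT 1.0 and uses the common "@*|node()" identity template instead of the one in my stylesheet. It only wraps the matched text node in <first>; the real template body would call split-words instead.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:strip-space elements="*"/>

  <!-- identity template: copy every node and attribute unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Matches a text node at any depth inside a para, but only if it is
       the first descendant text node of its nearest para ancestor.
       The pattern steps use only child and "//"; the ancestor and
       descendant axes appear only inside the predicate. -->
  <xsl:template match="para//text()[generate-id() =
                       generate-id(ancestor::para[1]/descendant::text()[1])]">
    <first><xsl:value-of select="."/></first>
  </xsl:template>
</xsl:stylesheet>
```

Because the test is made per text node, the identity template keeps handling all the other nodes, so no separate select for "all the rest" is needed. For a para that starts with <em>This</em>, the text node that matches is the "This" inside the em, not the text that follows it.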

Input XML:

<para>A paragraph without any markup</para>
<para> Beware of leading whitespace </para>
<para>A paragraph with some <i>markup</i> inside</para>
<para>A paragraph with some <b><i>nested</i> markup</b></para>
<para><em>This is a special case:</em> paragraph starts with markup</para>
<para><em>This</em> is difficult: only the first word has markup</para>

The goal is to isolate the first three words of each paragraph. Desired output:

<para><first>A paragraph without </first>any markup</para>
<para><first>Beware of leading </first>whitespace</para>
<para><first>A paragraph with </first>some <i>markup</i> inside</para>
<para><first>A paragraph with </first>some <b><i>nested</i> markup</b></para>
<para><em><first>This is a </first>special case:</em> paragraph starts with markup</para>
<para><em><first>This</first></em> is difficult: only the first word has markup</para>

The last one is especially difficult; ideally it would be:
<para><first><em>This</em> is difficult:</first> only the first word has markup</para>

Stylesheet so far:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:param name="split" select="3"/>

   <!-- identity template: copy all elements -->
   <xsl:template match="*">
       <xsl:copy>
           <xsl:copy-of select="@*"/>
           <xsl:apply-templates/>
       </xsl:copy>
   </xsl:template>

   <xsl:template match="para/text()[1]">  <!--  <  <  <  -->
       <xsl:call-template name="split-words"/>
   </xsl:template>

   <!-- recursively moves words from str2 to str1 until $split words are collected -->
   <xsl:template name="split-words">
       <xsl:param name="i" select="0"/>
       <xsl:param name="str1" select="''"/>
       <xsl:param name="str2" select="normalize-space(.)"/>
       <xsl:choose>
           <xsl:when test="$i = $split">
               <first><xsl:value-of select="$str1"/></first>
               <xsl:value-of select="$str2"/>
           </xsl:when>
           <xsl:when test="contains($str2,' ')">
               <xsl:call-template name="split-words">
                   <xsl:with-param name="i" select="$i+1"/>
                   <xsl:with-param name="str1" select="concat($str1,substring-before($str2,' '),' ')"/>
                   <xsl:with-param name="str2" select="substring-after($str2,' ')"/>
               </xsl:call-template>
           </xsl:when>
           <xsl:otherwise>
               <!-- fewer than $split words left: put everything in <first> -->
               <xsl:call-template name="split-words">
                   <xsl:with-param name="i" select="$split"/>
                   <xsl:with-param name="str1" select="concat($str1,$str2)"/>
                   <xsl:with-param name="str2" select="''"/>
               </xsl:call-template>
           </xsl:otherwise>
       </xsl:choose>
   </xsl:template>

</xsl:stylesheet>


Output: correct except for the last two paras

<para><first>A paragraph without </first>any markup</para>
<para><first>Beware of leading </first>whitespace</para>
<para><first>A paragraph with </first>some<i>markup</i> inside</para>
<para><first>A paragraph with </first>some<b><i>nested</i> markup</b></para>
<para><em>This is a special case:</em><first>paragraph starts with </first>markup</para>
<para><em>This</em><first>is difficult: only </first>the first word has markup</para>

