Re: [xsl] Re: Re: Using XSLT to add markup to a document

Subject: Re: [xsl] Re: Re: Using XSLT to add markup to a document
From: Jeni Tennison <jeni@xxxxxxxxxxxxxxxx>
Date: Tue, 8 Jul 2003 10:56:05 +0100
Dimitre wrote:
> Another problem with this solution is that it finds the strings not
> strictly from left to right (when we search for words as opposed to
> generally strings this may not be a problem -- my knowledge of
> English does not allow me to make a strong conclusion).

All Dimitre's observations about the inadequacy, in the general case,
of the solution David and I were discussing are correct. Flexible,
general solutions to marking up a string using XSLT 1.0 are not

It's interesting to see what the regular expression processing in XSLT
2.0 can do to help here. With the words hard-coded into the
stylesheet, it would look like:

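  <xsl:analyze-string select="$text" regex="relation|core">
    <xsl:matching-substring>
      <special><xsl:value-of select="." /></special>
    </xsl:matching-substring>
    <xsl:non-matching-substring><xsl:value-of select="." /></xsl:non-matching-substring>
  </xsl:analyze-string>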

<xsl:analyze-string> is defined such that the first matching substring
gets picked up, so if you have:

  There is a strong corelation...

then you get:

  There is a strong <special>core</special>lation...

There's no definition in the spec about what happens if you have
overlapping matching substrings, for example:

  <xsl:analyze-string select="$text" regex="relation|core|corelation">

(Saxon 7 picks the one that appears first in the regex.) I think that
this is a bug in the spec, and I'll raise it as an issue; I think
probably it should select the longest match.

You can generate the regular expression that's used for the string
dynamically, with an attribute value template. So for example, you
could have:

<xsl:template name="markup" as="item()*">
  <xsl:param name="text" as="xs:string" />
  <xsl:param name="replacements" as="item()*" />
  <xsl:variable name="regex" as="xs:string">
    <xsl:value-of select="$replacements" separator="|" />
  </xsl:variable>
  <xsl:analyze-string select="$text" regex="{$regex}">
    ...
  </xsl:analyze-string>
</xsl:template>

in which case the markup template can be called with:

  <xsl:call-template name="markup">
    <xsl:with-param name="text"
                    select="'There is a strong corelation...'" />
    <xsl:with-param name="replacements"
                    select="('core', 'relation', 'corelation')" />
  </xsl:call-template>

though to be thorough, you'd need to make sure that you escaped any
regex-significant characters in the replacement strings.
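
A minimal sketch of that escaping, assuming the XPath 2.0 replace()
function and its regex syntax. The my: namespace and the function names
my:escape-regex and my:alternation are hypothetical, chosen only for
illustration, and the sketch is untested:

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:my="http://example.com/my"
                exclude-result-prefixes="xs my">

  <!-- Hypothetical helper (not from the thread): put a backslash in
       front of every character that is special in an XPath 2.0 regex. -->
  <xsl:function name="my:escape-regex" as="xs:string">
    <xsl:param name="s" as="xs:string" />
    <xsl:sequence
        select="replace($s, '[\\|.?*+(){}\[\]\^\$\-]', '\\$0')" />
  </xsl:function>

  <!-- Hypothetical helper: join the escaped strings into an alternation,
       e.g. ('a.b', 'c+') gives a\.b|c\+ -->
  <xsl:function name="my:alternation" as="xs:string">
    <xsl:param name="replacements" as="xs:string*" />
    <xsl:sequence
        select="string-join(for $r in $replacements
                            return my:escape-regex($r), '|')" />
  </xsl:function>

</xsl:stylesheet>
```

The result of my:alternation($replacements) could then be used as the
value of the regex variable in the markup template.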

To get the longest match first, at least using Saxon 7, you can sort
the replacements by length, with the longer ones first (or
alphabetically in reverse order will give you the same result):

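  <xsl:variable name="regex">
    <!-- No as="xs:string" here: the body yields several text nodes,
         which only concatenate into one string in a temporary tree. -->
    <xsl:for-each select="$replacements">
      <xsl:sort select="string-length(.)" order="descending" />
      <xsl:value-of select="." />
      <xsl:if test="position() != last()">|</xsl:if>
    </xsl:for-each>
  </xsl:variable>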
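
Putting the pieces together, a self-contained sketch might look like the
following. The main entry template and the <p> wrapper are assumptions,
added only so that it can be run without a source document (how to start
at a named template depends on the processor). The regex variable is
declared without an as attribute because its body produces several text
nodes, which are only concatenated into one string when the variable
holds a temporary tree.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs">

  <xsl:output method="xml" omit-xml-declaration="yes" />

  <!-- Entry point; "main" and the <p> wrapper are illustrative only. -->
  <xsl:template name="main">
    <p>
      <xsl:call-template name="markup">
        <xsl:with-param name="text"
                        select="'There is a strong corelation...'" />
        <xsl:with-param name="replacements"
                        select="('core', 'relation', 'corelation')" />
      </xsl:call-template>
    </p>
  </xsl:template>

  <xsl:template name="markup" as="item()*">
    <xsl:param name="text" as="xs:string" />
    <xsl:param name="replacements" as="xs:string*" />
    <!-- Longest alternative first: corelation|relation|core -->
    <xsl:variable name="regex">
      <xsl:for-each select="$replacements">
        <xsl:sort select="string-length(.)" order="descending" />
        <xsl:value-of select="." />
        <xsl:if test="position() != last()">|</xsl:if>
      </xsl:for-each>
    </xsl:variable>
    <xsl:analyze-string select="$text" regex="{$regex}">
      <xsl:matching-substring>
        <special><xsl:value-of select="." /></special>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="." />
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

On a processor that picks the first alternative that matches at a given
position, this should mark up the whole of "corelation" as a single
<special> element rather than just "core".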

Getting whole-word-only matches is much more complicated. In fact, I
can't think of a good approach right now, but perhaps someone else


Jeni Tennison
