Re: [xsl] newbie: searching for web links in text

Subject: Re: [xsl] newbie: searching for web links in text
From: Jeni Tennison <jeni@xxxxxxxxxxxxxxxx>
Date: Tue, 13 Nov 2001 23:23:33 +0000

Hi Bryan,

> I want to search a block of text in an XML document, and if there is
> a http:// ... or a www. ... string, to grab it and convert it into
> an HTML link in a transformation.

OK. I guess that there might be more than one such string within your
document, so you need a recursive template to work through the string
looking for those URLs. Let's call it "addLinks"; it needs a single
parameter, the string that you want it to search:

<xsl:template name="addLinks">
  <xsl:param name="string" />
  ...
</xsl:template>
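
For orientation, here is a minimal sketch of how you might call it from the
template that handles your text. The element name `para` and the wrapping
<p> are placeholders of mine; I don't know what your markup actually looks
like:

```xml
<!-- Hypothetical: 'para' stands in for whatever element holds your text -->
<xsl:template match="para">
  <p>
    <xsl:call-template name="addLinks">
      <!-- pass the element's text content as the string to search -->
      <xsl:with-param name="string" select="string(.)" />
    </xsl:call-template>
  </p>
</xsl:template>
```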

You only need to bother doing something in the string if it contains
the string 'www.' or the string 'http://'. Otherwise, you just need to
return the string as it is:

<xsl:template name="addLinks">
  <xsl:param name="string" />
  <xsl:choose>
    <xsl:when test="contains($string, 'www.')">
      ...
    </xsl:when>
    <xsl:when test="contains($string, 'http://')">
      ...
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$string" />
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

Now, if the string contains a 'www.' then you first need to call the
template on the part of the string before the 'www.', just in case it
contains a 'http://'. Then you need to create the link, using the
string from the 'www.' up to the next space after the 'www.' (or the
end of the string if it doesn't contain a space) as the content of the
link (and the basis of the URL). Then you call the template on
whatever's left of the string. So something like:

<xsl:template name="addLinks">
  <xsl:param name="string" />
  <xsl:choose>
    <xsl:when test="contains($string, 'www.')">
      <xsl:call-template name="addLinks">
        <xsl:with-param name="string"
                        select="substring-before($string, 'www.')" />
      </xsl:call-template>
      <xsl:variable name="rest"
        select="concat('www.', substring-after($string, 'www.'))" />
      <xsl:choose>
        <xsl:when test="contains($rest, ' ')">
          <xsl:variable name="url"
                        select="substring-before($rest, ' ')" />
          <a href="http://{$url}">
            <xsl:value-of select="$url" />
          </a>
          <xsl:call-template name="addLinks">
            <xsl:with-param name="string"
              select="concat(' ', substring-after($rest, ' '))" />
          </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
          <a href="http://{$rest}">
            <xsl:value-of select="$rest" />
          </a>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:when>
    <xsl:when test="contains($string, 'http://')">
      ...
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$string" />
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

The xsl:when for strings that contain 'http://' follows much the same
pattern.
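
A sketch of what that branch might look like, as a fragment that sits
inside the xsl:choose of addLinks rather than a complete stylesheet. The
only real difference is that the URL already starts with 'http://', so the
href doesn't need the prefix added:

```xml
<!-- Sketch of the 'http://' branch of addLinks -->
<xsl:when test="contains($string, 'http://')">
  <!-- the text before the URL might still hold a 'www.' link -->
  <xsl:call-template name="addLinks">
    <xsl:with-param name="string"
                    select="substring-before($string, 'http://')" />
  </xsl:call-template>
  <xsl:variable name="rest"
    select="concat('http://', substring-after($string, 'http://'))" />
  <xsl:choose>
    <xsl:when test="contains($rest, ' ')">
      <xsl:variable name="url"
                    select="substring-before($rest, ' ')" />
      <!-- $url already begins with 'http://', so use it as-is -->
      <a href="{$url}">
        <xsl:value-of select="$url" />
      </a>
      <xsl:call-template name="addLinks">
        <xsl:with-param name="string"
          select="concat(' ', substring-after($rest, ' '))" />
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <a href="{$rest}">
        <xsl:value-of select="$rest" />
      </a>
    </xsl:otherwise>
  </xsl:choose>
</xsl:when>
```

One thing to watch: xsl:choose takes the first xsl:when whose test is true,
so if the 'www.' test comes first, a URL like http://www.msnbc.com gets
split at the 'www.' and the leftover 'http://' is handled separately.
Putting the 'http://' xsl:when before the 'www.' one should avoid that,
since the whole URL is then picked up in one go.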

As an added complication, you need to think about situations where the
character after the URL isn't a space. For example, in your second
example the URL http://www.msnbc.com was immediately followed by a
full stop. To spot that kind of thing you'll need to do some clever
processing to work out exactly what the URL is, and then use the
string-length() of that URL with the substring() function to work out
what the "rest of the string" is on which to recurse.
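
One possible way to do that, as a rough sketch: a small recursive template
that strips trailing punctuation from a candidate URL. The name "trimUrl"
and the particular set of characters are my own choices, not anything
standard, so adjust them to suit your text:

```xml
<!-- Hypothetical helper: drop trailing punctuation from a candidate URL -->
<xsl:template name="trimUrl">
  <xsl:param name="url" />
  <!-- the last character of $url (empty if $url is empty) -->
  <xsl:variable name="last"
                select="substring($url, string-length($url))" />
  <xsl:choose>
    <!-- guard against the empty string, which contains() always finds -->
    <xsl:when test="$last != '' and contains('.,;:!?)', $last)">
      <xsl:call-template name="trimUrl">
        <xsl:with-param name="url"
          select="substring($url, 1, string-length($url) - 1)" />
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$url" />
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
```

Inside the 'www.' branch you could then use it along these lines, where
$candidate (again, a name I've made up) is the text up to the next space,
or the whole of $rest if there is no space:

```xml
<xsl:variable name="url">
  <xsl:call-template name="trimUrl">
    <xsl:with-param name="url" select="$candidate" />
  </xsl:call-template>
</xsl:variable>
<a href="http://{$url}">
  <xsl:value-of select="$url" />
</a>
<!-- recurse on everything after the trimmed URL, punctuation included -->
<xsl:call-template name="addLinks">
  <xsl:with-param name="string"
    select="substring($rest, string-length($url) + 1)" />
</xsl:call-template>
```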

Unfortunately, adding content markup to plain text isn't something
that XPath/XSLT is particularly good at. It will probably be easier in
XPath 2.0 if we get regular expressions (as currently seems likely).
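
To give a feel for the difference, here is a sketch that assumes an XSLT 2.0
processor and its xsl:analyze-string instruction. It won't run under XSLT
1.0, and the regular expression is only my rough guess at what counts as a
URL (anything starting 'http://' or 'www.', up to the next whitespace, not
ending in punctuation):

```xml
<!-- Sketch only: needs version="2.0" on xsl:stylesheet -->
<xsl:template name="addLinks">
  <xsl:param name="string" />
  <xsl:analyze-string select="$string"
      regex="(http://|www\.)\S*[^\s.,;:!?)]">
    <xsl:matching-substring>
      <!-- add the 'http://' prefix only for bare 'www.' matches -->
      <a href="{if (starts-with(., 'www.'))
                then concat('http://', .) else .}">
        <xsl:value-of select="." />
      </a>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <xsl:value-of select="." />
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>
```

No recursion is needed here, because xsl:analyze-string walks through all
the matches in the string itself.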

I hope that helps,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list