Re: Q.) Encode URL inside HTML Anchor Tag.

Subject: Re: Q.) Encode URL inside HTML Anchor Tag.
From: Mike Brown <mike@xxxxxxxx>
Date: Fri, 17 Nov 2000 17:41:12 -0700 (MST)

Stephen Cunliffe wrote:
> Is there a function/transformation available in XSL that will allow me
> to encode URLs in HTML files?

No, there is no built-in function for this purpose.

If your XSLT processor supports extension functions and is Java based,
as I believe yours is, you can simply invoke the encode() method of
java.net.URLEncoder, passing it the string to encode.

The example I have below is for James Clark's XT, and also works as-is
with SAXON. I'm sure by just adjusting the 'url' namespace declaration
it will work with Cocoon/Xalan equally well:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   version="1.0"
   xmlns:url="http://www.jclark.com/xt/java/java.net.URLEncoder"
   exclude-result-prefixes="url">

  <xsl:output method="html" indent="yes"/>
  
  <xsl:template match="/">
    <xsl:variable name="x" select="'encode me #1 superstar?'"/>
    <xsl:if test="function-available('url:encode')">
      <a href="http://www.skew.org/printenv?foo={url:encode($x)}">
        <xsl:value-of select="$x"/> 
      </a>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
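
For reference, the url:encode($x) call in the stylesheet above is bound to the static encode() method of java.net.URLEncoder. Here is a minimal sketch of the same call made directly from Java. The class name UrlEncoderDemo and the helper encodeValue are hypothetical, and the two-argument overload with an explicit "UTF-8" charset is an assumption on my part; the stylesheet passes only the string.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class; only java.net.URLEncoder comes from the post.
class UrlEncoderDemo {

    // Encodes a single query-string value.
    // Assumption: the two-argument overload with an explicit charset is used
    // here, whereas the stylesheet calls encode() with one string argument.
    static String encodeValue(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String x = "encode me #1 superstar?";
        // Encode only the parameter value, then splice it into the URI.
        System.out.println("http://www.skew.org/printenv?foo=" + encodeValue(x));
        // prints http://www.skew.org/printenv?foo=encode+me+%231+superstar%3F
    }
}
```

Note that only the value of foo is passed through the encoder; the rest of the URI is left alone.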

Some XSLT processors (namely SAXON) will automatically URL-encode the
values of href attributes. Even though you can turn this off in SAXON,
there are reasons why this is undesirable in general and I don't
recommend that other processors adopt this practice.

The only alternative is to stick with pure XSLT and write a
stylesheet that does the encoding via substring lookups and tail
recursion. This is a rather daunting task, considering there is no
easy way to determine the UTF-8 sequence for a given character.
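
To give an idea of what such a stylesheet would have to reproduce, here is a rough Java sketch of the arithmetic that turns one Unicode code point into its UTF-8 octets, written out as '%xx' pairs. The class and method names are hypothetical, and the sketch assumes a valid code point (it does not reject surrogates). XSLT 1.0 has no bit operators, so a stylesheet would need division, mod, and lookup strings in place of the shifts and masks below.

```java
// Hypothetical helper, for illustration only.
class Utf8Sketch {

    // Returns the UTF-8 octets of one code point (0 to 0x10FFFF) as %XX pairs.
    // Assumption: cp is a valid scalar value; surrogates are not checked.
    static String utf8Hex(int cp) {
        int[] octets;
        if (cp < 0x80) {
            // 1 octet: 0xxxxxxx
            octets = new int[] { cp };
        } else if (cp < 0x800) {
            // 2 octets: 110xxxxx 10xxxxxx
            octets = new int[] { 0xC0 | (cp >> 6), 0x80 | (cp & 0x3F) };
        } else if (cp < 0x10000) {
            // 3 octets: 1110xxxx 10xxxxxx 10xxxxxx
            octets = new int[] {
                0xE0 | (cp >> 12),
                0x80 | ((cp >> 6) & 0x3F),
                0x80 | (cp & 0x3F)
            };
        } else {
            // 4 octets: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            octets = new int[] {
                0xF0 | (cp >> 18),
                0x80 | ((cp >> 12) & 0x3F),
                0x80 | ((cp >> 6) & 0x3F),
                0x80 | (cp & 0x3F)
            };
        }
        StringBuilder sb = new StringBuilder();
        for (int octet : octets) {
            sb.append('%').append(String.format("%02X", octet));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex(0x23));   // '#'          -> %23
        System.out.println(utf8Hex(0xE9));   // e with acute -> %C3%A9
        System.out.println(utf8Hex(0x20AC)); // euro sign    -> %E2%82%AC
    }
}
```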

(Roughly, the URL-encoding algorithm is: replace certain reserved
characters with their UTF-8 sequences, expressed as '%xx' for
each octet, where xx is the hexadecimal representation of the
octet; '+' may optionally be used instead of '%20' for spaces.
As you seem to already understand, this translation applies only to
certain parts of the URI, and it is applied while the URI is being
constructed, not afterward, which is one of the reasons why SAXON's
behavior is not desirable.)
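
The algorithm described above can be sketched in Java roughly as follows. This is an illustration under stated assumptions, not what java.net.URLEncoder does internally: the class name, the method name, and the set of characters left untouched (ASCII letters, digits, and "-_.~") are my own choices, and a real encoder may keep a slightly different set.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the rough algorithm; not the URLEncoder source.
class PercentEncodeSketch {

    // Assumption: these characters are passed through unchanged.
    private static final String UNRESERVED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~";

    // Encodes one URI component (for example a query parameter value).
    // plusForSpace selects '+' instead of '%20' for spaces.
    static String encode(String component, boolean plusForSpace) {
        StringBuilder out = new StringBuilder();
        // Work on the UTF-8 octets, one '%xx' per octet that needs escaping.
        for (byte b : component.getBytes(StandardCharsets.UTF_8)) {
            int octet = b & 0xFF;
            if (octet < 0x80 && UNRESERVED.indexOf(octet) >= 0) {
                out.append((char) octet);
            } else if (octet == ' ' && plusForSpace) {
                out.append('+');
            } else {
                out.append('%').append(String.format("%02X", octet));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Encode the component first, then build the URI around it.
        String value = encode("encode me #1 superstar?", true);
        System.out.println("http://www.skew.org/printenv?foo=" + value);
        // prints http://www.skew.org/printenv?foo=encode+me+%231+superstar%3F
    }
}
```

Running the same routine over the finished URI instead would also escape the ':', '/', '?' and '=' that give the URI its structure, which is why the encoding has to happen per component.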

I attempted to make such a stylesheet for the ASCII and Latin-1 range and
it was not very pretty.

   - Mike
____________________________________________________________________
Mike J. Brown, software engineer at         My XML/XSL resources:
webb.net in Denver, Colorado, USA           http://www.skew.org/xml/


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

