Re: [xsl] Including URL-encoded query string in XHTML document

Subject: Re: [xsl] Including URL-encoded query string in XHTML document
From: Jeni Tennison <mail@xxxxxxxxxxxxxxxx>
Date: Thu, 11 Jan 2001 18:07:52 +0000

Hi Yelena,

> I'm trying to process an XML data feed that contains URL-encoded query
> strings, like the following:
>
>         <item url="research.exe?ticker=GS&type=1" date="01/01/2000">goldman
> sachs</item>    

This isn't well-formed XML, and the XML parser that you're using should
complain when it sees it. In XML, it's illegal to have a '&' character
that doesn't mark the start of an entity or character reference. The
XML you need to use is:

<item url="research.exe?ticker=GS&amp;type=1"
      date="01/01/2000">goldman sachs</item>
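
If you want to check this yourself, any conforming XML parser will do.
Here is a minimal sketch in Python using the standard library's
xml.etree.ElementTree. Python is just my choice for illustration, not
something from your setup, and the helper name is_well_formed is made
up for the example:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The item as it appears in the feed, with a bare '&' in the attribute.
raw_item = '<item url="research.exe?ticker=GS&type=1" date="01/01/2000">goldman sachs</item>'
# The same item with the '&' written as an entity reference.
escaped_item = '<item url="research.exe?ticker=GS&amp;type=1" date="01/01/2000">goldman sachs</item>'

raw_ok = is_well_formed(raw_item)          # the parser rejects this one
escaped_ok = is_well_formed(escaped_item)  # the parser accepts this one
```

The first string fails because '&type' is never closed with a ';', so
the parser cannot read it as a reference.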

The XML that you see in a file is just a *serialisation* of a node
tree. In the node tree, entity references are replaced by whatever
they reference. So the node tree for the above looks like:

+- (element) item
   | +- (attribute) url = research.exe?ticker=GS&type=1
   | +- (attribute) date = 01/01/2000
   +- (text) goldman sachs

Note the url attribute has a value with the character '&' in it rather
than the entity reference.
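
The difference between the node tree and its serialisation can be seen
by parsing the item and then writing it back out. This is only a
sketch, again assuming Python's xml.etree.ElementTree as the parser and
serialiser; an XSLT processor builds its own tree, but the principle is
the same:

```python
import xml.etree.ElementTree as ET

source = '<item url="research.exe?ticker=GS&amp;type=1" date="01/01/2000">goldman sachs</item>'

# Parsing builds the node tree; the entity reference is resolved here.
item = ET.fromstring(source)
url_in_tree = item.get("url")  # holds a plain '&' character

# Serialising the tree escapes the '&' again on the way out.
reserialised = ET.tostring(item, encoding="unicode")
```

Here url_in_tree is research.exe?ticker=GS&type=1, while reserialised
contains GS&amp;type=1 once more.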

> Any advice on what is the best way to pass a URL-encoded string through the
> XSLT transformation?
> I substituted "&" with "&amp;" in the original data, but then the output
> XSLT document also contains &amp; and there seems to be no way to print "&"
> as it is.
> Using <xsl:output method="html" > or "disable-output-escape" directives did
> not seem to help. 

When you create some output with XSLT, if it's creating XML it sticks
to XML rules.  So because XML doesn't allow a '&' that isn't the start
of an entity reference, the XSLT processor outputs '&amp;' instead.

When you tell it to output in HTML with <xsl:output method="html" />,
it still sticks with this rule because you can have entity references
in HTML as well, and you need to know when an '&' is an ampersand
character and when it's the start of an entity reference.  Almost
always, an '&' in an HTML node tree will be serialised as '&amp;' when
it's written to a file.
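
To illustrate that the output method doesn't change this, here is a
small sketch that serialises the same link both ways. It uses Python's
xml.etree.ElementTree serialiser as a stand-in for the XSLT processor's
output stage, which is an assumption on my part; your processor's
serialiser is a different piece of code, though it should follow the
same escaping rule:

```python
import xml.etree.ElementTree as ET

# Build the result element directly; the attribute value in the tree
# contains a plain '&' character.
link = ET.Element("a", {"href": "research.exe?ticker=GS&type=1"})
link.text = "goldman sachs"

# Serialise the same tree with the XML and the HTML output methods.
as_xml = ET.tostring(link, encoding="unicode", method="xml")
as_html = ET.tostring(link, encoding="unicode", method="html")
```

Both as_xml and as_html write the attribute with '&amp;' in it.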

But this shouldn't be a problem. Whatever program looks at the HTML
and reads it should interpret the '&amp;' correctly and Do The Right
Thing. You shouldn't have to worry about it. Obviously it is causing
you a problem though - is it really the case that if you create an
HTML document with the following links in it:

<p>
  <a href="research.exe?ticker=GS&amp;type=1">goldman sachs
  (entity)</a>;
  <a href="research.exe?ticker=GS&type=1">goldman sachs
  (character)</a>;
</p>

that the second works and the first doesn't?  If so, you've got a
dodgy browser.
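
As a rough check of what a well-behaved HTML consumer does with those
two links, here is a sketch that reads them with Python's
html.parser.HTMLParser. This is an assumption standing in for a
browser, and the class name HrefCollector is made up for the example:

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlsplit


class HrefCollector(HTMLParser):
    """Collect the decoded href value of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.hrefs.append(value)


page = """<p>
  <a href="research.exe?ticker=GS&amp;type=1">goldman sachs (entity)</a>;
  <a href="research.exe?ticker=GS&type=1">goldman sachs (character)</a>;
</p>"""

collector = HrefCollector()
collector.feed(page)
collector.close()

# The parser hands back attribute values with references already decoded.
entity_href, character_href = collector.hrefs
query = parse_qs(urlsplit(entity_href).query)
```

With this parser both links come out as the same URL, and the query
string splits into ticker and type as intended.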

> and use the stylesheet below to construct an href tag for each item
> element:
>
>         <xsl:template match="item">
>                 <a>
>                         <xsl:attribute name="href">
>                                 <xsl:value-of select="@url" />
>                         </xsl:attribute>
>                         <xsl:value-of select="." />
>                 </a>
>         </xsl:template>

It's not directly relevant, but this is equivalent to:

<xsl:template match="item">
   <a href="{@url}"><xsl:value-of select="." /></a>
</xsl:template>

I hope that helps,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

