Subject: Re: [xsl] (Re-)Escaping entities in input text
From: "Andrew Welch" <andrew.j.welch@xxxxxxxxx>
Date: Wed, 20 Aug 2008 16:00:04 +0100
> Now assume I have this piece of code (I'm writing it on the run, please be lenient :-) ):
> <xsl:template match="foo">
>  <xsl:variable name="my_xml">
>    <xsl:text><bar></xsl:text>
>    <xsl:value-of select="." />
>    <xsl:text></bar></xsl:text>
>  </xsl:variable>
>  <xsl:value-of select="java_class:function($my_xml)" />
> </xsl:template>

I think your mailer is parsing &lt into <

For the above I expect you mean:

<xsl:text>&lt bar &gt </xsl:text>  (with the ; left off for that reason)

Again this is A Bad Thing and not needed.  Start and end tags only
exist in the serialized form of XML.  When the XML is parsed those
start and end tags become a single node in the input tree.  XSLT
operates on the input tree and adds nodes to the result tree, and then
the serializer operates on the result tree generating start and end
tags for each node.  Hopefully you can see why trying to add start and
end tags to the result tree doesn't make sense...
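
The parse/serialize round trip can be seen with nothing but the JDK's own DOM and JAXP classes. Below is a minimal sketch; the class and method names are made up for illustration. Parsing turns <bar>a &lt; b</bar> into one element node whose text is "a < b". Building the element as a node, with no hand-written tags, lets the serializer write both the tags and the escaping.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses only the JDK's built-in DOM and JAXP classes.
class TreeVsTags {

    // Parsing turns the serialized start/end tags into a single element node;
    // "&lt;" in the markup becomes a plain '<' character in the text node.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed: " + xml, e);
        }
    }

    // Serializing walks the tree and writes the tags, escaping '<' in text.
    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Build <bar> as a node (no hand-written tags) and let the serializer do the rest.
    static String buildBar(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element bar = doc.createElement("bar");
            bar.appendChild(doc.createTextNode(text));
            doc.appendChild(bar);
            return serialize(doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element root = parse("<bar>a &lt; b</bar>").getDocumentElement();
        System.out.println(root.getNodeName());    // bar
        System.out.println(root.getTextContent()); // a < b
        System.out.println(buildBar("a < b"));     // <bar>a &lt; b</bar>
    }
}
```

The "&lt;" in the output comes from the serializer. Nothing you put in the tree produces it.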

If you post the problem you are trying to solve with small complete
examples of input and output then it's likely you'll get a response
showing how to solve it.


> The point of this template is to create a pseudo XML file in a string (my_xml), and pass it on to a java function (java_class:function) which will process it. However, doing it this way, my_xml will have the following content:
> <bar>a < b</bar>
> which is not well-formed, and hence couldn't be parsed by an XML parser in my java class.
>
> So what I'm looking for is a way of outputting, *in my internal string*, "a &lt; b" instead of "a < b".
>
> I don't think this is bad practice, is it? I mean, definitely there are some cases where XSLT just cannot handle everything, and the processing of a piece of XML has to be handed over to some other processor :-).
>

XSLT 1.0 does occasionally need the help of extensions (have
a look at exslt.org), but they are often abused: the most trivial of
tasks get offloaded to Java, which makes maintenance a real pain.  My
advice is to use extensions as a last resort.  Or, best of all, use XSLT
2.0, where you won't need extensions...
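
You may not need an extension at all if the Java processing can happen after the transformation rather than in the middle of it. That is an assumption about your setup. In that case, let the stylesheet output <bar> as a literal result element and hand the serialized result to Java. Below is a rough sketch using the JDK's built-in XSLT 1.0 processor (not Saxon). The class and method names are made up, and isWellFormed only stands in for whatever your Java class does with the XML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;

// Hypothetical demo class: build the XML in XSLT, then hand the string to Java.
class BuildThenHandOff {

    // <bar> is a literal result element, so the processor adds an element node
    // to the result tree; the serializer writes the tags and escapes '<'.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='foo'><bar><xsl:value-of select='.'/></bar></xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String inputXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(inputXml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Stand-in for the consuming Java code (hypothetical): it can only
    // work with the string if an XML parser accepts it.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String xml = transform("<foo>a &lt; b</foo>");
        System.out.println(xml);                              // <bar>a &lt; b</bar>
        System.out.println(isWellFormed(xml));                // true
        System.out.println(isWellFormed("<bar>a < b</bar>")); // false
    }
}
```

The stylesheet itself is plain XSLT 1.0, so it should run unchanged under Saxon. Only the TransformerFactory found on the classpath would differ.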


> On a related note: could it be that Saxon uses ISO-8859-1 instead of UTF-8 internally?? My source file is definitely UTF-8, but when I pass a string containing special characters (in that case german umlauts) to my Java class, I'm getting '?' (question marks) instead of the 2-byte codepoints... Any idea why this is happening, or how to avoid that??
>

There are many possible causes: anywhere a byte-to-character
conversion (or the reverse) takes place.  Set the system property
-Dfile.encoding=utf-8 first, and if you're still having problems post
back with more details.
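
Java strings are UTF-16 internally. Characters therefore usually get lost where text is turned into bytes or back: a file, the console, a socket, or a String built from a byte array. The small sketch below (names made up) shows one such boundary. Encoding to a charset that cannot represent a character silently substitutes '?'. ISO-8859-1, by contrast, does contain the German umlauts and ß. This assumes your '?' comes from a conversion that falls back to an ASCII-like platform default, which is only one possibility.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: what happens at a char/byte boundary.
class EncodingDemo {

    // Encode with one charset and decode with the same one. String.getBytes(Charset)
    // replaces characters the charset cannot represent with its replacement byte,
    // which for US-ASCII is '?'.
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // G, r, u-umlaut, sharp s, e (escaped to avoid source-encoding issues)
        String word = "Gr\u00fc\u00dfe";
        // The default used by APIs called without an explicit charset
        // (typically derived from file.encoding).
        System.out.println(Charset.defaultCharset());
        System.out.println(roundTrip(word, StandardCharsets.US_ASCII));                // Gr??e
        System.out.println(roundTrip(word, StandardCharsets.ISO_8859_1).equals(word)); // true
        System.out.println(roundTrip(word, StandardCharsets.UTF_8).equals(word));      // true
        System.out.println(word.getBytes(StandardCharsets.UTF_8).length);              // 7
    }
}
```

Naming the charset at each boundary, e.g. new String(bytes, StandardCharsets.UTF_8) or new InputStreamReader(in, StandardCharsets.UTF_8), removes the dependency on the platform default.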


-- 
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/
