RE: [xsl] Escaped characters being duplicated

From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Tue, 11 Dec 2007 23:22:58 -0000

Perplexing indeed.

I'd be less surprised if the output came out as "&amp;lt;" rather than
"&lt;&lt;". That's much more common, and could be caused by processing text
twice when it should only be processed once. 
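
To make the distinction concrete, here is a minimal Java sketch of that more common failure (which is not the one you are seeing). The class and method names are made up for illustration, and escapeXmlText is a simplified stand-in for what a real serializer does, not Saxon's code.

```java
final class DoubleEscapeDemo {
    /** Simplified XML text escaping; '&' has to be replaced first. */
    static String escapeXmlText(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    public static void main(String[] args) {
        String raw = "<";
        String once = escapeXmlText(raw);    // what a single serialization pass produces
        String twice = escapeXmlText(once);  // what you get if already-escaped text is escaped again
        System.out.println(once);
        System.out.println(twice);
    }
}
```

Escaping twice turns the ampersand of the first pass into "&amp;"; it does not repeat the whole entity. That is why "&lt;&lt;" looks like the text itself being emitted twice rather than being escaped twice.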

The conversion from "<" to "&lt;" is done by the XML serializer. The fact
that you're using the Saxon XSLT processor doesn't necessarily mean that
you're using the Saxon serializer (the Saxon output could be sent to a DOM
which is then serialized using the DOM serializer); it would be a good idea
to find out what serializer is actually being used. The easiest way to find
out is to see whether the serialization is affected by xsl:output
declarations in the stylesheet.
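
As a rough sketch of that test, the following Java program runs a tiny probe stylesheet two ways through JAXP: once straight to a stream, and once into a DOM that is then serialized by a separate identity transformer. The class name, the stylesheet, and the choice of omit-xml-declaration as the property to watch are all my assumptions for illustration; in your setup you would add a similar xsl:output declaration to the real stylesheet and look at the real output.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class SerializerProbe {
    // Hypothetical probe stylesheet: asks the serializer to omit the XML declaration.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'><out>a &lt; b</out></xsl:template>"
      + "</xsl:stylesheet>";

    private static Transformer probe(TransformerFactory tf) throws TransformerException {
        return tf.newTransformer(new StreamSource(new StringReader(XSL)));
    }

    /** The XSLT processor serializes its own result, so xsl:output applies. */
    static String direct() {
        try {
            StringWriter out = new StringWriter();
            probe(TransformerFactory.newInstance())
                .transform(new StreamSource(new StringReader("<doc/>")), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    /** The result goes to a DOM first; a second, unrelated step serializes it. */
    static String viaDom() {
        try {
            TransformerFactory tf = TransformerFactory.newInstance();
            DOMResult dom = new DOMResult();
            probe(tf).transform(new StreamSource(new StringReader("<doc/>")), dom);
            StringWriter out = new StringWriter();
            // Identity transformer: knows nothing about the stylesheet's xsl:output.
            tf.newTransformer().transform(new DOMSource(dom.getNode()), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("direct : " + direct());
        System.out.println("via DOM: " + viaDom());
    }
}
```

With a typical JAXP setup I would expect the direct output to have no XML declaration, and the DOM route to have one, because the second serializer never saw the xsl:output declaration. If changing xsl:output in the DITA stylesheet makes no difference to the files the web application produces, something other than the XSLT processor's serializer is writing them.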

How did you satisfy yourself that both the successful and the unsuccessful
runs are using Saxon 6.5.5? JAXP is a wonderful beast, and ensures that many
people are running a different XSLT processor from the one they thought they
were using.
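
One way to check, sketched in Java below: ask JAXP which TransformerFactory class it actually loaded, and ask the processor to identify itself through the standard XSLT function system-property('xsl:vendor'). The class name and the embedded stylesheet are illustrative assumptions; the point is to run the same check both inside the Tomcat web application and from the command line, and compare.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class WhichProcessor {
    // Hypothetical one-template stylesheet that prints the processor's vendor string.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:value-of select=\"system-property('xsl:vendor')\"/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Name of the factory class JAXP resolved in this JVM and classloader. */
    static String factoryClass() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    /** Vendor string as reported by the processor that runs the stylesheet. */
    static String vendor() {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader("<doc/>")), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("TransformerFactory: " + factoryClass());
        System.out.println("xsl:vendor        : " + vendor());
    }
}
```

The same xsl:value-of can also be dropped into one of the toolkit's stylesheets temporarily (for example inside an xsl:message or a comment in the output), which avoids any doubt about whether the check ran in the same environment as the failing transformation.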

Michael Kay

> -----Original Message-----
> From: Anderson, Paul [mailto:Paul.Anderson@xxxxxxxxxxxxx] 
> Sent: 11 December 2007 23:07
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] Escaped characters being duplicated
> 
> Greetings All,
> We have a bunch of DITA XML content and we're using the 
> open-source DITA Open Toolkit to transform it into a variety 
> of outputs. The DITA Open Toolkit is a collection of Java 
> classes, XSL stylesheets, and ANT scripts that transform the 
> content and create the output. 
> To shield our users from the command-line invocation of the 
> publishing scripts, we deployed a simple web application 
> running on Tomcat 5.5 that takes input from a JSP page and 
> invokes the necessary ANT script to generate the desired 
> output for the user. This methodology has been working quite 
> nicely for nearly a year.
> Over that time, a few of our users have had a problem where 
> characters escaped in the XML content (for example, angle 
> brackets and ampersands) are duplicated in the output. For 
> example, in the place of one angle-bracket (&lt;), we end up 
> with two or sometimes four escaped angle brackets (&lt;&lt;&lt;&lt;).
> I've been troubleshooting the problem and the duplication 
> always appears in the output files generated by one of the 
> XSL stylesheets in the DITA Open Toolkit. If the input file 
> contained an escaped character, the output file contains two 
> of those escaped characters. The most interesting discovery 
> so far is this: For each user that has the problem, the 
> problem goes away if they invoke the ANT script via the 
> command line; the duplication only occurs when the ANT script 
> is invoked from the JSP page running on Tomcat 5.5. Having 
> said that, the problem only exists for a few users; most 
> users never see this problem when they use the JSP page to 
> invoke the ANT script and publish the exact same XML content.
> Perplexing.
> Given all this background, my plea to this list is simple: 
> What sort of conditions cause an XSL transformation to 
> duplicate an escaped character? 
> Would the system locale have an impact?
> Would the Java version (1.5 versus 1.6) have an impact?
> All source files use UTF-8 encoding.
> All users are using the same XSL processor: Saxon 6.5.5.
> I don't think the problem is in the XSL stylesheet or any 
> other part of the DITA Open Toolkit because all users are 
> using the same code and it works for most users.
> Any ideas about this issue are appreciated.
> Best regards,
> Paul Anderson
> Information Developer - Codex Administrator
> Compuware Corporation
> 
> The contents of this e-mail are intended for the named 
> addressee only. It contains information that may be 
> confidential. Unless you are the named addressee or an 
> authorized designee, you may not copy or use it, or disclose 
> it to anyone else. If you received it in error please notify 
> us immediately and then destroy it.
