[xsl] Escaped characters being duplicated

Subject: [xsl] Escaped characters being duplicated
From: "Anderson, Paul" <Paul.Anderson@xxxxxxxxxxxxx>
Date: Tue, 11 Dec 2007 18:06:40 -0500

Greetings All,

We have a bunch of DITA XML content and we're using the open-source DITA
Open Toolkit to transform it into a variety of outputs. The DITA Open
Toolkit is a collection of Java classes, XSL stylesheets, and ANT
scripts that transform the content and create the output.

To shield our users from the command-line invocation of the publishing
scripts, we deployed a simple web application running on Tomcat 5.5 that
takes input from a JSP page and invokes the necessary ANT script to
generate the desired output for the user. This methodology has been
working quite nicely for nearly a year.

Over that time, a few of our users have run into a problem where characters
escaped in the XML content (for example, angle brackets and ampersands)
are duplicated in the output. For instance, in place of one escaped
angle bracket (&lt;), we end up with two or sometimes four escaped angle
brackets (&lt;&lt;&lt;&lt;).

I've been troubleshooting the problem, and the duplication always appears
in the output files generated by one of the XSL stylesheets in the DITA
Open Toolkit. If the input file contains an escaped character, the
output file contains two of those escaped characters. The most
interesting discovery so far is this: for each user who has the
problem, the problem goes away if they invoke the ANT script from the
command line; the duplication only occurs when the ANT script is invoked
from the JSP page running on Tomcat 5.5. Even so, the problem
only affects a few users; most users never see it when they
use the JSP page to invoke the ANT script and publish the exact same XML
content.

Perplexing.

Given all this background, my plea to this list is simple: What sort of
conditions cause an XSL transformation to duplicate an escaped
character?

Would the system locale have an impact?
Would the Java version (1.5 versus 1.6) have an impact?
All source files use UTF-8 encoding.
All users are using the same XSL processor: Saxon 6.5.5.
I don't think the problem is in the XSL stylesheet or any other part of
the DITA Open Toolkit because all users are using the same code and it
works for most users.

Any ideas about this issue are appreciated.

Best regards,

Paul Anderson
Information Developer - Codex Administrator
Compuware Corporation
