Re: [xsl] DOCTYPE public and system fields run together in generated output

Subject: Re: [xsl] DOCTYPE public and system fields run together in generated output
From: Abel Braaksma <abel.online@xxxxxxxxx>
Date: Wed, 17 Oct 2007 20:03:48 +0200
Michael Tracey Zellmann wrote:
I have not been able to find an answer in the archives.

I am generating HTML with an XSLT 1.0 style-sheet transforming XML
directly through a Transformer using the standard resources in Java
JDK 1.5.0_11

My resulting HTML page has this DOCTYPE line

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN""http://www.w3.org/TR/html4/loose.dtd">

The problem is that the two quoted fields are run-together without any
white space. The resulting web-page renders successfully, but fails
W3C validation.

I use this statement in my style-sheet

<xsl:output doctype-public="-//W3C//DTD HTML 4.01 Transitional//EN"
doctype-system="http://www.w3.org/TR/html4/loose.dtd " method="html"
indent="yes" encoding="ISO-8859-1" media-type="text/html"/>

I have been able to avoid this problem by using XSLT 2.0 along with
the saxon8.jar to employ the net.sf.saxon.TransformerFactoryImpl

However, my user would very much like to solve this staying within
the normal JDK resources and staying with XSLT 1.0

What might I do to solve this?

The easiest? Convince your user to use XSLT 2.0 with Saxon or Gestalt, now that (s)he has a good reason to do so. Why use an 8-year-old (though proven) technique when the new one is (also proven) much more versatile and promises a shorter time to market?


But if you are stuck with using Xalan, here's what you should do:

1. Check your serialization code. Xalan is not likely to make this mistake, at least not in any recent version.
2. Check your Xalan version with the template below. If it shows nothing, you are using an old version; in that case, remove everything in the XPath after 'checkEnvironment()' and investigate the output.
3. If you can't solve it with (1) or (2), use the xsl:text hack with disable-output-escaping to write the correct DOCTYPE string yourself.
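
For reference, option (3) could look roughly like the sketch below. It drops doctype-public and doctype-system from xsl:output and writes the DOCTYPE by hand through xsl:text with disable-output-escaping. Keep in mind that disable-output-escaping is an optional feature in XSLT 1.0, so a processor may ignore it, and it only has an effect when the transformer itself serializes the result (here via a StreamResult). The class name DoctypeHack, the helper transform and the tiny HTML body are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class DoctypeHack {
    // No doctype-* attributes on xsl:output: the DOCTYPE is emitted by hand.
    static final String XSL =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
      + "<xsl:output method='html' encoding='ISO-8859-1' media-type='text/html'/>"
      + "<xsl:template match='/'>"
      // The escaped markup is written unescaped if the processor honours d-o-e.
      + "<xsl:text disable-output-escaping='yes'>"
      + "&lt;!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" "
      + "\"http://www.w3.org/TR/html4/loose.dtd\"&gt;&#10;"
      + "</xsl:text>"
      + "<html><head><title>Test</title></head><body><p>ok</p></body></html>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Transforms the given XML with the stylesheet above and returns the serialized result. */
    public static String transform(String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Wrap checked JAXP exceptions to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<doc/>"));
    }
}
```

If the first line of the printed result still shows &lt; and &gt; instead of a real DOCTYPE, the processor in use does not honour disable-output-escaping and this workaround will not help.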


But you shouldn't need (3). Here's the template that I tried with Xalan 2.7.0 (including a copy and paste of your code above). Not the newest, I believe, but new enough. Below it, you'll find the output.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xalan="http://xml.apache.org/xalan"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output doctype-public="-//W3C//DTD HTML 4.01 Transitional//EN"
      doctype-system="http://www.w3.org/TR/html4/loose.dtd" method="html"
      indent="yes" encoding="ISO-8859-1" media-type="text/html"/>
  <xsl:template match="/">
    <html>
      <head>
        <title>Show Xalan version</title>
      </head>
      <body>
        <p>
          <xsl:copy-of select="
              xalan:checkEnvironment()
              /*/*/*[@key = 'version.xalan2x']/text()"/> </p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>



The output from the above stylesheet, when run with Xalan, is surprisingly not indented (most other processors indent when you ask them to), but otherwise I can find nothing odd. Note that the quoted strings in the DOCTYPE are separated by a space:


<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html xmlns:xalan="http://xml.apache.org/xalan">
<head>
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>Show Xalan version</title>
</head>
<body>
<p>Xalan Java 2.7.0</p>
</body>
</html>



The only difference from your setup was the JDK version: I used 1.5.0_02 for this test. It is possible that the bug was introduced only in a later version (of the JDK, Xalan, Xerces, JAXP or SAX), but I find that quite unlikely.
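
To compare two setups, it may help to print which TransformerFactory implementation JAXP actually loads, and to test the serialized DOCTYPE from code instead of by eye. The following is only a sketch under assumptions: it reuses the xsl:output settings from this thread with a minimal HTML body, and the class DoctypeCheck and the helper literalsRunTogether are hypothetical names, not part of any library.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class DoctypeCheck {
    // Same xsl:output settings as in this thread, with a minimal body.
    static final String XSL =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
      + "<xsl:output doctype-public='-//W3C//DTD HTML 4.01 Transitional//EN'"
      + " doctype-system='http://www.w3.org/TR/html4/loose.dtd' method='html'"
      + " indent='yes' encoding='ISO-8859-1' media-type='text/html'/>"
      + "<xsl:template match='/'>"
      + "<html><head><title>Check</title></head><body><p>test</p></body></html>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /**
     * Returns true if the DOCTYPE declaration contains a closing quote directly
     * followed by an opening quote, i.e. two literals with no white space between
     * them. (An empty literal "" would also match; good enough for a quick check.)
     */
    public static boolean literalsRunTogether(String html) {
        int start = html.indexOf("<!DOCTYPE");
        if (start < 0) {
            return false; // no DOCTYPE at all
        }
        int end = html.indexOf('>', start);
        if (end < 0) {
            return false; // unterminated declaration, nothing to check
        }
        return html.substring(start, end).contains("\"\"");
    }

    /** Runs the stylesheet through the given factory and returns the serialized result. */
    public static String transform(TransformerFactory factory) {
        try {
            StringWriter out = new StringWriter();
            factory.newTransformer(new StreamSource(new StringReader(XSL)))
                   .transform(new StreamSource(new StringReader("<doc/>")), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Wrap checked JAXP exceptions to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        TransformerFactory factory = TransformerFactory.newInstance();
        // Shows whether the JDK's built-in processor, Xalan or Saxon was picked up.
        System.out.println("Factory in use: " + factory.getClass().getName());
        String result = transform(factory);
        System.out.println(result);
        System.out.println("DOCTYPE literals run together: " + literalsRunTogether(result));
    }
}
```

Running this once with the plain JDK and once with xalan.jar on the classpath should show whether the two runs really use the same processor, and which of them produces the broken DOCTYPE.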


Hope this brings you closer to a solution,

Cheers,
-- Abel Braaksma
