RE: ISO-8859-1 encoding and XmlDecl omision (was Re: [xsl] Looking up keys in a separate xml file)

Subject: RE: ISO-8859-1 encoding and XmlDecl omision (was Re: [xsl] Looking up keys in a separate xml file)
From: "G. Ken Holman" <gkholman@xxxxxxxxxxxxxxxxxxxx>
Date: Wed, 07 Jan 2004 10:14:39 -0500
At 2004-01-07 14:40 +0000, Andrew Welch wrote:
I currently have the situation where the result of my transform is
included as part of a html page using jsp.

Then use <xsl:output method="html" encoding="iso-8859-1"/>


I use encoding="ascii" to ensure all of my character refs remain as
character refs through to the output,

The encoding doesn't preserve the named HTML entity references; the output method does.


but for this reason (as I've just
found out) the omit-xml-declaration="yes" is ignored - which means in
the middle of my output I have the xml declaration.

HTML being as forgiving as it is, this isn't a problem, but I would like
it gone - what's the solution here?

I'm not sure it is wise for standards bodies to provide too many solutions to non-problems ... the problem is, I think, with your expectations for a processor to produce something invalid just to accommodate your specialized non-standard situation.


I would have thought that as ascii is a subset of utf-8,

ASCII (7-bit) happens to be a subset of UTF-8. But you are talking about "character refs", by which I assume you mean the named HTML character entities that HTML browsers recognize without the need of entity declarations. The characters those stand for are *not* in ASCII, they are in ISO-8859-1, and as has been repeatedly mentioned, ISO-8859-1 is *not* a subset of UTF-8, nor is UTF-8 a subset of ISO-8859-1.
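
To make the subset relationship concrete, here is a minimal Python sketch (Python is my choice for illustration only; it is not part of the original exchange and says nothing about how an XSLT processor is implemented). It checks that pure ASCII text yields the same bytes in ASCII, UTF-8 and ISO-8859-1, that e-acute cannot be written in ASCII at all, and that an ASCII-only output has to fall back to a numeric character reference. The variable names are hypothetical.

```python
# Pure ASCII text produces identical bytes in all three encodings.
text = "Hello World"
same_everywhere = (
    text.encode("ascii") == text.encode("utf-8") == text.encode("iso-8859-1")
)

# e-acute (U+00E9) is outside the 7-bit ASCII range.
e_acute = "\u00e9"
try:
    e_acute.encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False

# The only way to carry it in ASCII-only output is a character reference.
numeric_ref = e_acute.encode("ascii", errors="xmlcharrefreplace")

print(same_everywhere, fits_in_ascii, numeric_ref)
```

Note that the fallback here is a numeric reference, not a named HTML entity; names such as eacute come from HTML, not from the choice of encoding.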


the processor
could happily leave the declaration out knowing that any future parsing
of the document would use utf-8 (by default) and could correctly read
the file.

But it wouldn't *be* correct. The UTF-8 encoding of a high-bit ISO-8859-1 character requires two bytes, whereas the ISO-8859-1 encoding of the same character requires only one byte.
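
The byte counts can be checked with a short Python sketch (again only an illustration with hypothetical variable names, not anything from the original thread). It encodes e-acute both ways and then tries to read the ISO-8859-1 byte as if it were UTF-8, which is what a parser would do if the declaration were missing and it assumed the default.

```python
e_acute = "\u00e9"

latin1_bytes = e_acute.encode("iso-8859-1")  # a single byte, 0xE9
utf8_bytes = e_acute.encode("utf-8")         # two bytes, 0xC3 0xA9

# A consumer that assumes UTF-8 cannot decode the ISO-8859-1 byte:
# 0xE9 starts a multi-byte UTF-8 sequence that is never completed.
try:
    latin1_bytes.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

print(len(latin1_bytes), len(utf8_bytes), decodes_as_utf8)
```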


Since I believe what you are asking for is the HTML encoding of ISO-8859-1 characters as named HTML character entities, specify exactly that:

<xsl:output method="html" encoding="iso-8859-1"/>

or probably just:

<xsl:output method="html"/>

An example is below ... note how there is no XML declaration and you get the HTML character entity reference.

I hope this helps.

............................. Ken

T:\ftemp>type andrew.xsl
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

<xsl:template match="/">
  Hello &#xe9; World
</xsl:template>

<xsl:output method="html"/>

</xsl:stylesheet>
T:\ftemp>saxon andrew.xsl andrew.xsl

Hello &eacute; World

T:\ftemp>
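
For readers who want to see the character-to-name mapping outside of an XSLT processor, the following Python sketch is a rough analogy of what the transcript above shows. It assumes nothing about how Saxon is implemented; the function name to_html_entities is hypothetical, and the lookup table is the standard library's html.entities.codepoint2name. Markup characters such as & and < are deliberately not escaped, to keep the sketch short.

```python
from html.entities import codepoint2name


def to_html_entities(text: str) -> str:
    """Replace non-ASCII characters with named HTML entities where a name exists."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                        # plain ASCII passes through
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # e.g. U+00E9 -> &eacute;
        else:
            out.append(f"&#{cp};")                # no HTML name: numeric reference
    return "".join(out)


print(to_html_entities("Hello \u00e9 World"))
```

The names come from the HTML vocabulary, which is why it is the html output method, rather than the encoding, that produces them.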



--
North America (Washington, DC): 3-day XSLT/2-day XSL-FO 2004-02-09
Instructor-led on-site corporate, government & user group training
for XSLT and XSL-FO world-wide:  please contact us for the details

G. Ken Holman                 mailto:gkholman@xxxxxxxxxxxxxxxxxxxx
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/s/
Box 266, Kars, Ontario CANADA K0A-2E0    +1(613)489-0999 (F:-0995)
ISBN 0-13-065196-6                       Definitive XSLT and XPath
ISBN 0-13-140374-5                               Definitive XSL-FO
ISBN 1-894049-08-X   Practical Transformation Using XSLT and XPath
ISBN 1-894049-11-X               Practical Formatting Using XSL-FO
Member of the XML Guild of Practitioners:     http://XMLGuild.info
Male Breast Cancer Awareness  http://www.CraneSoftwrights.com/s/bc


XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list


