RE: ISO-8859-1 encoding and XmlDecl omision (was Re: [xsl] Looking up keys in a separate xml file)

Subject: RE: ISO-8859-1 encoding and XmlDecl omision (was Re: [xsl] Looking up keys in a separate xml file)
From: "John Meyer" <jmeyer@xxxxxxxxxxx>
Date: Tue, 6 Jan 2004 11:55:51 -0500
David,

   "yes, but a general parsed entity in ISO-8859-1 encoding must have an
   encoding declaration to be well formed."

Not always, though. If the encoding is specified by the transport that
carries the XML, the entity is still considered to be well-formed. I was
just very concerned that this implied that the processor should ignore the
omit-xml-declaration directive. That would concern me, since I could do
something like the following.

 stream.write( '<?xml version="1.0" encoding="iso-8859-1"?>' );
 stream.write( '<MyRootElement>' );
 stylesheet1.transform( input1, xslParams1, stream );
 stylesheet2.transform( input2, xslParams2, stream );
 stream.write( '</MyRootElement>' );

If the processor ignored the omit-xml-declaration, then this would cause
the xml stream to not be well-formed since the (duplicate)
xml-declarations do not occur at the beginning of the stream.
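
To make the failure concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The fragments, the wrap() helper, and the is_well_formed() helper are hypothetical stand-ins for the output of the two transforms above; they are not part of any XSLT processor's API.

```python
import xml.etree.ElementTree as ET

DECL = b'<?xml version="1.0" encoding="iso-8859-1"?>'

def wrap(fragments):
    # Mimic the stream writes above: one declaration, one root, then the fragments.
    return DECL + b"<MyRootElement>" + b"".join(fragments) + b"</MyRootElement>"

def is_well_formed(data):
    # True if the parser accepts the bytes as a well-formed document.
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

# Hypothetical transform results, emitted with omit-xml-declaration="yes".
clean = [b"<a>caf\xe9</a>", b"<b/>"]
# The same results if the processor ignored the directive.
with_decl = [DECL + fragment for fragment in clean]

print(is_well_formed(wrap(clean)))
print(is_well_formed(wrap(with_decl)))
```

The second call fails because a declaration is only allowed at the very start of an entity.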

It's an interesting problem since it lays the burden of maintaining
encoding information on the program or user running the transformation.
It's also why I said you should not use the omit-xml-declaration unless
you are certain it should be omitted.
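
As a rough illustration of that burden, the sketch below (Python, standard library only) parses ISO-8859-1 bytes that carry no declaration. The payload and the parse() helper are made up for the example; the transport_encoding argument stands in for whatever the transport (an HTTP charset parameter, say) reports.

```python
import xml.etree.ElementTree as ET

# ISO-8859-1 bytes with no XML declaration (hypothetical transform output).
payload = b"<name>Ren\xe9e</name>"

def parse(data, transport_encoding=None):
    # The encoding passed to XMLParser overrides whatever the entity declares,
    # which is roughly the role of encoding information supplied by a transport.
    parser = ET.XMLParser(encoding=transport_encoding)
    return ET.fromstring(data, parser=parser)

# The caller remembered the encoding, so the bytes parse.
print(parse(payload, "iso-8859-1").text)

# The caller forgot it: the parser assumes UTF-8 and rejects byte 0xE9.
try:
    parse(payload)
except ET.ParseError as err:
    print("not well-formed:", err)
```

If the program loses track of the encoding between the transform and the parse, nothing in the bytes themselves can recover it.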

John Meyer
Senior Software Engineer
Clinician Support Technology
1 Wells Avenue, Suite 201
Newton, MA 02459
www.cstlink.com

-----Original Message-----
From: David Carlisle [mailto:davidc@xxxxxxxxx] 
Sent: Tuesday, January 06, 2004 11:02 AM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: ISO-8859-1 encoding and XmlDecl omision (was Re: [xsl]
Looking up keys in a separate xml file)


  is invalid. ISO-8859-1 is a subset of UTF-8 and should cause no problems
  since most parsers default to UTF-8 if the XML declaration is omitted.

All parsers default to UTF-8 in the absence of a declaration and
byte-order mark; that is specified by the XML Rec.
However, ISO-8859-1 is not a subset of UTF-8. The first 128 (ASCII) slots
are the same, but the upper half of Latin-1 is encoded in ISO-8859-1
as single bytes, which will cause fatal errors if interpreted as UTF-8,
where those characters require two bytes in the encoding.
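
A small Python sketch of the byte-level difference, using "é" (U+00E9) as an arbitrary example from the upper half of Latin-1; the decodes_as_utf8() helper is just for illustration:

```python
# "é" (U+00E9) stands in for any character in the upper half of Latin-1.
ch = "\u00e9"
latin1_bytes = ch.encode("iso-8859-1")  # one byte: 0xE9
utf8_bytes = ch.encode("utf-8")         # two bytes: 0xC3 0xA9

def decodes_as_utf8(data):
    # True if the bytes form a valid UTF-8 sequence.
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(len(latin1_bytes), len(utf8_bytes))
# The ASCII range is byte-for-byte identical in both encodings.
print(decodes_as_utf8("abc".encode("iso-8859-1")))
# The lone 0xE9 byte is not a valid UTF-8 sequence.
print(decodes_as_utf8(latin1_bytes))
```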

  I believe the only constraint when using
  the XML output method is that the result must be a general parsed
  entity.

yes, but a general parsed entity in ISO-8859-1 encoding must have an
encoding declaration to be well formed. (XSLT does not distinguish
between an xml declaration and a text declaration; the only difference
is the standalone attribute anyway.)


David

-- 
http://www.dcarlisle.demon.co.uk/matthew


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


