Re: [xsl] Wrong encoding value: "Content is not allowed in prolog." ?

Subject: Re: [xsl] Wrong encoding value: "Content is not allowed in prolog." ?
From: Michael Kay <mike@xxxxxxxxxxxx>
Date: Thu, 29 Jul 2010 09:42:30 +0100

On 28/07/2010 15:39, Ben Stover wrote:
Assume I have an XML doc file which starts with:

<?xml version="1.0" encoding="UTF-16"?>
<foobar>....

But the XML doc file is NOT actually UTF-16 encoded; it is ANSI or ISO-8859-1 or whatever.
Does it matter?

I mean, does an XSLT processor like Saxon (or any other) treat this as nice-to-have info and rely on the real encoding instead?


XSLT processors don't care: they pass the work off to an XML parser. That is why, when a failure occurs, Saxon is careful to tell you that the error comes from the XML parser, not from Saxon itself:
Error on line 1 column 40 of in.xml:
   SXXP0003: Error reported by XML parser: Content is not allowed in prolog.
Transformation failed: Run-time errors were reported

That's another way of saying: you can choose from a wide range of parsers to run with Saxon, and if you choose one that has poor error messages, that's your problem, not mine. (The one I generally recommend is the Xerces parser from Apache, but the one that most people use is the Xerces derivative contained in the Sun/Oracle JDK; Sun's main contribution was to add bugs.)

In practice, "Content is not allowed in prolog" is a very generic way of reporting that the parser can't make sense of the bytes at the start of the file. An encoding declaration that doesn't match the actual bytes is one possible reason for that failure.
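
To check whether a mismatched encoding is the likely culprit, one option is to compare the first bytes of the file with the encoding named in the XML declaration before handing the file to the parser. The following is only a minimal sketch of that idea, loosely following the byte patterns listed in Appendix F of the XML 1.0 specification. The class name EncodingSniffer and its methods are hypothetical, not part of Saxon or Xerces, and the check deliberately ignores UCS-4/UTF-32 and EBCDIC.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Saxon or Xerces.
public class EncodingSniffer {

    // Matches the encoding pseudo-attribute of an XML declaration.
    private static final Pattern ENCODING =
        Pattern.compile("encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // True if b begins with the given unsigned byte values.
    private static boolean startsWith(byte[] b, int... prefix) {
        if (b.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((b[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    // Guess the encoding family from the leading bytes.
    // Simplified: UCS-4/UTF-32 and EBCDIC patterns are not handled.
    static String sniff(byte[] b) {
        if (startsWith(b, 0xEF, 0xBB, 0xBF)) return "UTF-8 (BOM)";
        if (startsWith(b, 0xFE, 0xFF)) return "UTF-16BE (BOM)";
        if (startsWith(b, 0xFF, 0xFE)) return "UTF-16LE (BOM)";
        if (startsWith(b, 0x00, 0x3C, 0x00, 0x3F)) return "UTF-16BE (no BOM)";
        if (startsWith(b, 0x3C, 0x00, 0x3F, 0x00)) return "UTF-16LE (no BOM)";
        if (startsWith(b, 0x3C, 0x3F, 0x78, 0x6D)) return "ASCII-compatible";
        return "unknown";
    }

    // Read the declared encoding from the first 200 bytes, or null if none is found.
    static String declaredEncoding(byte[] b) {
        String family = sniff(b);
        Charset cs = family.startsWith("UTF-16BE") ? StandardCharsets.UTF_16BE
                   : family.startsWith("UTF-16LE") ? StandardCharsets.UTF_16LE
                   : StandardCharsets.ISO_8859_1;
        String head = new String(b, 0, Math.min(b.length, 200), cs);
        Matcher m = ENCODING.matcher(head);
        return m.find() ? m.group(1) : null;
    }

    // False when the declaration says UTF-16 but the bytes do not look like it,
    // or the other way round. Other mismatches are not detected.
    static boolean looksConsistent(byte[] b) {
        String declared = declaredEncoding(b);
        if (declared == null) return true; // nothing to contradict
        boolean declaredUtf16 = declared.toUpperCase().startsWith("UTF-16");
        boolean bytesUtf16 = sniff(b).startsWith("UTF-16");
        return declaredUtf16 == bytesUtf16;
    }

    public static void main(String[] args) {
        // The situation from the question: declared UTF-16, stored as ISO-8859-1.
        byte[] doc = "<?xml version=\"1.0\" encoding=\"UTF-16\"?>\n<foobar/>"
            .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(sniff(doc));            // ASCII-compatible
        System.out.println(declaredEncoding(doc)); // UTF-16
        System.out.println(looksConsistent(doc));  // false
    }
}
```

A result of false here only suggests that the declaration and the bytes disagree; the parser remains the authority on whether the file is well-formed.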

Michael Kay
Saxonica
