Re: [xsl] Encoding problem or what else?

Subject: Re: [xsl] Encoding problem or what else?
From: "FC" <flavio@xxxxxx>
Date: Wed, 7 Dec 2005 23:25:01 +0100
----- Original Message -----
From: "Geert Josten" <Geert.Josten@xxxxxxxxxxx>
To: <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Wednesday, December 07, 2005 7:22 PM
Subject: Re: [xsl] Encoding problem or what else?

Hi Flavio,

I expected this from your first post. The three bytes are the (optional) UTF-8 Byte Order Mark (BOM). The XML Parser that is used by your XSL processor does not consume them as it should, resulting in character data in the prolog, which is obviously not allowed.

It is typical of Microsoft products to use this BOM. WordPad adds it when saving and consumes it when reading, so you will never see it in that editor. You could switch to a different (XML) parser, get rid of the BOM in your data (can you influence how it is created?), or patch the reading process to consume the BOM.

The second option is perhaps the easiest.


This is interesting to know.
What do you mean by "patch"?
Do you mean perhaps that I should write something that strips out the 3 bytes from the beginning of the file?
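
Stripping those three bytes before the parser sees the file could look roughly like the sketch below. It is written in Python purely for illustration (nothing in this thread uses Python), the function names strip_utf8_bom and strip_bom_from_file are hypothetical, and it assumes the whole file fits in memory.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_from_file(src_path: str, dst_path: str) -> bool:
    """Copy src_path to dst_path without a leading UTF-8 BOM.

    Returns True if a BOM was found and removed, False otherwise.
    Assumption: the file is small enough to be read in one go.
    """
    with open(src_path, "rb") as src:
        data = src.read()
    cleaned = strip_utf8_bom(data)
    with open(dst_path, "wb") as dst:
        dst.write(cleaned)
    return len(cleaned) != len(data)
```

Working on raw bytes rather than decoded text leaves the rest of the file untouched, so the encoding declaration in the XML prolog stays valid. Files without a BOM pass through unchanged.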

I think the easiest solution is to ask the people who deliver this file to switch to ISO-8859-1, since there is no real need for Unicode in these files: they are not going to contain any text with exotic characters.

I am bound to this XSL processor for the simple reason that it is the best of the bunch from a performance standpoint (thanks, Michael Kay!).
I had been struggling for days with the Altova XSLT 2005 engine and Oracle's internal processor, and it was a nightmare.
I had a 32 MB XML file that took *hours* to process with those two processors, until I tried Saxon, which crunched it in less than one minute!
So, as you can easily guess, I am not going to willingly dump Saxon just for those three funny bytes.

Hey Michael, what do you think about this?
Is there any hope that Xerces will "consume" this UTF-8 marker in the near future?

