Re: [xsl] Encoding problem or what else?

Subject: Re: [xsl] Encoding problem or what else?
From: Geert Josten <Geert.Josten@xxxxxxxxxxx>
Date: Thu, 08 Dec 2005 08:43:54 +0100
Geert,
this is interesting to know.
What do you mean by "patch"?
Do you mean perhaps that I should write something that strips out the 3 bytes from the beginning of the file?

Yup, I meant writing (Java?) code. Not in the sense of writing a separate app that bites ;) the three bytes off the head of the document, but rather adjusting the reading process in the existing code, provided you have access to it. When using the XSLT processor in a larger framework (Cocoon perhaps), you can often do this fairly easily. When using it from the command line, typically not.
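
To make "adjusting the reading process" a bit more concrete, here is a minimal sketch, assuming the three bytes are the UTF-8 byte order mark (EF BB BF) and that your code opens the input stream itself before handing it to the parser. The class and method names (BomSkipper, skipUtf8Bom) are made up for illustration; they are not part of Saxon, Xerces, or Cocoon.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

final class BomSkipper {
    private BomSkipper() {}

    /**
     * Hypothetical helper: returns a stream positioned just after a leading
     * UTF-8 BOM (bytes EF BB BF) if one is present, otherwise a stream that
     * yields exactly the original content.
     */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // A single read() may return fewer than 3 bytes, so loop until
        // we have 3 bytes or hit end of stream.
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            // Not a BOM: push the bytes back so the parser sees them.
            pin.unread(head, 0, n);
        }
        return pin;
    }
}
```

You would then wrap the file stream, for example new StreamSource(BomSkipper.skipUtf8Bom(new FileInputStream(file))), before passing it to the transformer. Files without the marker pass through unchanged.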


I think the easiest solution is to ask the people who deliver this file to switch to ISO-8859-1, as there is no real need to use Unicode for these files: there is not going to be any text containing exotic characters in there.

Yup, that is in line with my second suggestion. But perhaps they can use a different creation tool. This problem is most often heard of when people edit XML documents with a text editor.


I am bound to use this XSLT processor for the simple reason that it's the best of the bunch from a performance standpoint (thanks Michael Kay!).
I've been struggling for days with the Altova XSLT 2005 engine and Oracle's internal processor, and it was a nightmare.
I had a 32 MB XML file that took *hours* to process with these two processors, until I tried out Saxon, which crunched it in less than one minute!

Nice..


So, as you can easily guess, I am not going to willingly dump Saxon just for those three funny bytes.

No, you shouldn't. But perhaps someone knows a way to configure Saxon such that it uses a different XML parser front end?
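
One route that needs only standard JAXP, when the transformation is driven from Java code rather than the command line, is to create the XMLReader yourself and hand it to the transformer wrapped in a SAXSource. A rough sketch follows; the class name ExplicitParserTransform and its transform method are made up for illustration, and whether this fits your particular setup is an assumption on my part. Which parser SAXParserFactory actually returns depends on your classpath and the usual JAXP lookup rules.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

final class ExplicitParserTransform {
    private ExplicitParserTransform() {}

    /**
     * Hypothetical helper: runs the stylesheet over the given input,
     * parsing the input with an XMLReader that we create ourselves
     * instead of letting the XSLT processor pick one.
     */
    static String transform(InputSource xml, String stylesheet) throws Exception {
        // Obtain the parser explicitly; swap in another factory or
        // XMLReader implementation here if one is on the classpath.
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // XSLT needs namespace-aware parsing
        XMLReader reader = spf.newSAXParser().getXMLReader();

        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));

        StringWriter out = new StringWriter();
        // The SAXSource tells the transformer which parser to use.
        t.transform(new SAXSource(reader, xml), new StreamResult(out));
        return out.toString();
    }
}
```

The InputSource can wrap a byte stream or a Reader, so this is also a convenient place to pre-process the input before the parser sees it.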


Hey Michael, what do you think about this?
Is there any hope that Xerces will "consume" this UTF-8 marker in the near future?

Cheers..

