Re: [xsl] Encoding issues with document() function

Subject: Re: [xsl] Encoding issues with document() function
From: Nic James Ferrier <nferrier@xxxxxxxxxxxxxxxxxxxx>
Date: Sat, 04 Nov 2006 09:41:35 +0000
"Pankaj Bishnoi" <pankaj.bishnoi@xxxxxxxxxxx> writes:

> Hi All
>         I have an XSL stylesheet in which I use the XSLT document() function. The
> problem I am facing is that the XML file I am trying to read with the
> document() function contains some Unicode characters, and the exception
> thrown at transformation time is:
> SystemId Unknown; Line #133;Column #104; Can not load requested doc: An
> invalid XML character(Unicode: 0x2) was found in the element content of the
> document
> The source XML file has encoding UTF-8. I searched the web for
> this issue, and one alternative suggested is to replace those '0x2' characters.
> But other characters might come up in other scenarios,
> such as 0x1, 0x13, etc. So my question is: is there any encoding that
> supports all these characters?
> Is there any way around this issue? Any help will be highly
> appreciated.

You don't mention what processor you're using...

But document() can only do the simplest thing, which is to presume that
the entity it's been asked to read is encoded correctly.

It sounds very likely that your entity is NOT correctly UTF-8
encoded. It happens, even with big websites (I spent ages debugging
O'Reilly's XML RSS feeds once because they were full of encoding errors).
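One way to tell whether it really is an encoding problem is to check the file directly. The character in your error, 0x2 (U+0002), is worth singling out: XML 1.0's Char production excludes every control character below U+0020 except tab, LF and CR, so a file can be perfectly valid UTF-8 and still be rejected, and no choice of encoding makes U+0002 legal. Below is a minimal sketch, assuming Java and a file that is supposed to be UTF-8; the class and method names (XmlInputCheck, decodeStrictUtf8, forbiddenChars) are made up for illustration. It decodes the bytes strictly first, then lists any code points XML 1.0 forbids.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical diagnostic: "bad UTF-8" versus "valid UTF-8 that XML forbids".
public class XmlInputCheck {

    // XML 1.0 (Fifth Edition) Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Decode strictly: malformed UTF-8 raises CharacterCodingException
    // instead of being silently replaced with U+FFFD.
    static String decodeStrictUtf8(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // List code points that decoded fine but that XML 1.0 does not allow,
    // with their char offset in the decoded text.
    static List<String> forbiddenChars(String text) {
        List<String> problems = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!isXmlChar(cp)) {
                problems.add(String.format(Locale.ROOT, "U+%04X at offset %d", cp, i));
            }
            i += Character.charCount(cp);
        }
        return problems;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java XmlInputCheck file.xml");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        try {
            List<String> problems = forbiddenChars(decodeStrictUtf8(bytes));
            if (problems.isEmpty()) {
                System.out.println("Valid UTF-8, no characters forbidden by XML 1.0");
            } else {
                System.out.println("Valid UTF-8, but XML 1.0 forbids:");
                problems.forEach(p -> System.out.println("  " + p));
            }
        } catch (CharacterCodingException e) {
            System.out.println("Not valid UTF-8: " + e);
        }
    }
}
```

If it reports malformed UTF-8, the fix belongs with whoever produces the file; if it reports characters such as U+0002, those characters have to be removed or replaced, since re-encoding will not help.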

There are 2 alternatives:

1. ask the people who control the entity to fix the encoding.

2. write a new document() function which fixes arbitrary encoding
   problems and make it available in your processor (e.g. as an
   extension function).
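A lighter-weight variant of option 2, if your processor is a JAXP one (the "Can not load requested doc" wording looks like Xalan-J, but that is a guess), is to leave document() alone and install a custom javax.xml.transform.URIResolver, which the Transformer consults when document() loads a URI. This swaps the extension-function approach for a resolver. The sketch below is hypothetical (CleaningUriResolver and clean are illustrative names) and assumes the entity is meant to be UTF-8: it reads the bytes, drops code points XML 1.0 forbids, and hands the parser the cleaned text.

```java
import java.io.InputStream;
import java.io.StringReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

// Hypothetical resolver: loads documents for document() and strips
// characters that XML 1.0 forbids before the parser ever sees them.
public class CleaningUriResolver implements URIResolver {

    // XML 1.0 (Fifth Edition) Char production.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Replace each forbidden code point with `replacement` ("" drops it).
    static String clean(String text, String replacement) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isXmlChar(cp)) {
                sb.appendCodePoint(cp);
            } else {
                sb.append(replacement);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    @Override
    public Source resolve(String href, String base) throws TransformerException {
        try {
            // Resolve href against the base URI of the calling stylesheet.
            URI uri = (base == null || base.isEmpty())
                    ? new URI(href)
                    : new URI(base).resolve(href);
            byte[] bytes;
            try (InputStream in = uri.toURL().openStream()) {
                bytes = in.readAllBytes();
            }
            // Assumption: the entity is meant to be UTF-8. Malformed byte
            // sequences become U+FFFD, which XML does allow.
            String text = new String(bytes, StandardCharsets.UTF_8);
            // Drop a leading byte order mark; a Reader-based source should not start with one.
            if (text.startsWith("\uFEFF")) {
                text = text.substring(1);
            }
            StreamSource source = new StreamSource(new StringReader(clean(text, "")));
            source.setSystemId(uri.toString()); // keeps relative URIs inside the document working
            return source;
        } catch (Exception e) {
            throw new TransformerException("Could not load " + href, e);
        }
    }
}
```

It would be hooked up with `transformer.setURIResolver(new CleaningUriResolver())` before calling `transform()`. Dropping characters silently loses data, so a visible replacement such as "?" may be preferable while debugging, and getting the source fixed (option 1) is still the better outcome.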

Nic Ferrier   for all your tapsell ferrier needs
