Re: [xsl] Encoding issues with document() function

Subject: Re: [xsl] Encoding issues with document() function
From: "Joe Fawcett" <joefawcett@xxxxxxxxxxx>
Date: Sat, 04 Nov 2006 12:30:05 +0000
The encoding doesn't matter. XML 1.0 cannot contain the characters 0xb, 0xc, 0xe or 0xf in any encoding; 0xa (line feed) and 0xd (carriage return) are allowed.
If the data is part of an element's content, you can Base64-encode it before passing it to the XML parser, or replace the offending characters with allowed ones and post-process the data later to re-insert them.


Joe


From: "Pankaj Bishnoi" <pankaj.bishnoi@xxxxxxxxxxx>
Reply-To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
To: <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [xsl] Encoding issues with document() function
Date: Sat, 4 Nov 2006 17:53:11 +0530

Thanks for your help, Michael. I am now replacing the offending Unicode characters.

The encoding is UTF-8 now.

For 0x2 I can use replace('\u0002','')

but what should the replacement be for the following characters:

0xa,0xb,0xc,0xd,0xe,0xf
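
For reference, the six code points listed above can be checked against the Char production of XML 1.0. The sketch below is an illustration only; the function name is made up for this example.

```python
def is_xml10_char(cp):
    """Return True if the code point is allowed by the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# Check the code points 0xa through 0xf.
verdicts = {hex(cp): is_xml10_char(cp) for cp in range(0xA, 0x10)}
# verdicts == {'0xa': True, '0xb': False, '0xc': False,
#              '0xd': True, '0xe': False, '0xf': False}
```

So 0xa and 0xd can stay as they are, and only 0xb, 0xc, 0xe and 0xf need the same treatment as 0x2.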


Thanks Pankaj

----- Original Message -----
From: "Michael Kay" <mike@xxxxxxxxxxxx>
To: <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Saturday, November 04, 2006 3:08 PM
Subject: RE: [xsl] Encoding issues with document() function


> If the document really does contain the Unicode character with codepoint
> 0x02, then it's not a well-formed XML document, and you won't be able to
> read it from XSLT or from anything else that's designed to process XML.
> You need to correct the program that created the document so that it
> outputs well-formed XML.
>
> The other possibility is that the document contains some other character
> which is being misread as codepoint 0x02 because the parser is using the
> wrong encoding, for example because the XML declaration is incorrect.
>
> Michael Kay
> http://www.saxonica.com/
>
> > -----Original Message-----
> > From: Pankaj Bishnoi [mailto:pankaj.bishnoi@xxxxxxxxxxx]
> > Sent: 04 November 2006 09:24
> > To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> > Subject: [xsl] Encoding issues with document() function
> >
> > Hi All,
> > I have an XSL stylesheet in which I use the XSLT document()
> > function. The problem I am facing is that the XML file I am
> > trying to read with document() contains some Unicode
> > characters, and the exception thrown at transformation time is:
> >
> > SystemId Unknown; Line #133;Column #104; Can not load
> > requested doc: An invalid XML character(Unicode: 0x2) was
> > found in the element content of the document
> >
> > The source XML file is encoded as UTF-8. I searched the web
> > for this issue, and one suggested workaround is to replace
> > those '0x2' characters.
> > Other characters, such as 0x1 or 0x13, might also turn up in
> > other scenarios. So my question is: is there any encoding
> > that supports all these characters?
> >
> > Is there any way around this issue? Any help will be highly
> > appreciated.
> >
> > Thanks
> > Pankaj
