Subject: RE: [xsl] unparsed-text() and illegal characters
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Thu, 27 Jul 2006 20:21:40 +0100

The spec is very strict that characters not allowed in XML cause an error.
This is a change since the book was written.

However, the spec is very loose about how URIs are resolved. So a conformant
product could take the URI

thing.txt?substitute-illegal-chars=FFFD

as a reference to "the document formed by taking thing.txt and substituting
illegal characters with xFFFD."

Perhaps I'll do that.

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Abel Braaksma Online [mailto:abel.online@xxxxxxxxx]
> Sent: 27 July 2006 19:10
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] unparsed-text() and illegal characters
>
> Dear List,
>
> Trying to "import" a non-XML file of an undefined encoding, I
> received the following error when using Saxon8: "The unparsed
> text file contains a character illegal in XML (line=1
> column=4 value=hex 11)". I only found one reference about
> this error
> (http://www.stylusstudio.com/xsllist/200510/post90470.html),
> which is actually a post about illegal characters inside the
> XSLT document.
>
> Michael Kay points out in that post that this error is merged
> into XTDE1190 (see
> http://www.w3.org/TR/xslt20/#err-XTDE1190). It is claimed in
> the specs that non-understood characters or byte sequences
> should result in this non-recoverable dynamic error.
>
> In his indispensable book, the XSLT 2.0 Programmer's
> Reference, he states the following:
> "Some processors will provide configuration options that pass
> this choice on the user. If the file contains characters that
> are invalid in XML (this applies to most control characters
> in the range x00 to x1F under XML 1.0, but only to the null
> character x00 under XML 1.1) then the invalid characters are
> substituted by the special Unicode character xFFFD, which is
> specifically intended for such purposes."
>
> I understand that the book was written before XSLT 2.0 was
> finalized (it is still a Candidate), but I wonder if a
> treatment like above is still possible somehow. The contents
> of the file is ISO-8859-1, apart from the start and end
> header, which contain control characters. I only need the
> part that is parsable as text, the rest can be dismissed.
>
> Am I asking too much from XSLT, or is this somehow possible?
> It would really add to the possibilities, and it means I
> don't need some extra filter or preparse step.
>
> Cheers,
> Abel Braaksma
> www.nuntia.nl
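
Until a processor offers something like the substitution described above, the "extra filter or preparse step" has to live outside the stylesheet. The following is only a minimal sketch of such a filter, assuming the input really is ISO-8859-1 and that replacing every character outside the XML 1.0 Char production with xFFFD is acceptable. The names substitute_illegal_chars and read_text_for_xml are hypothetical and are not part of Saxon or of any XSLT API.

```python
import re

# Matches any character NOT allowed by the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML10 = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def substitute_illegal_chars(text, replacement="\uFFFD"):
    """Replace each character that is illegal in XML 1.0 (hypothetical helper)."""
    return _ILLEGAL_XML10.sub(replacement, text)


def read_text_for_xml(path, encoding="iso-8859-1"):
    """Read a file and return text that is safe to embed in XML 1.0.

    The encoding default is an assumption taken from the original question;
    newline="" keeps the line endings exactly as they are in the file.
    """
    with open(path, encoding=encoding, newline="") as handle:
        return substitute_illegal_chars(handle.read())
```

The filtered text could then be written to a new file and read with unparsed-text() as usual. Passing an empty string as the replacement drops the control characters instead, which may suit the case where only the readable part of the file is wanted.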