Re: [xsl] 0x19 is not a legal XML character

Andrew Welch wrote:

> On 6/28/07, Abel Braaksma <abel.online@xxxxxxxxx> wrote:
>
>> This may work and will remove all offending U+0019 chars.
>
> The "offending" U+0019 characters could well be good content that's
> being written/read in the wrong encoding.

True, but if I remember correctly, all ISO-646 characters (the old 7-bit ASCII ones, below 0x80) are encoded as the same single bytes in UTF-8, all ISO-8859-x encodings, the CPxxx Windows/DOS code pages, TIS-620, Shift-JIS, GB2312, etc. The only notable exceptions, I believe, are the IBM EBCDIC encodings (but IBM500, the one most often used, also has End Of Medium right at 0x19). None of these encodings, not even the EBCDIC ones, use 0x19 for a diacritic.

Just trying to say: I think it is very unlikely that the encoding alone (on read or write) is the culprit here, although it often is for characters above 0x7F.
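The byte-level argument above can be checked directly. The Python sketch below decodes the single byte 0x19 with a handful of Python's built-in codecs and reports the resulting code point. The list of encodings is an illustrative choice, not an exhaustive one; `cp500` is Python's name for IBM500 EBCDIC.

```python
# Decode the single byte 0x19 under several encodings and report the
# resulting code point. If every codec yields U+0019, then reading the
# byte with the "wrong" one of these encodings does not change it.
ENCODINGS = [
    "utf-8", "iso-8859-1", "iso-8859-15", "cp1252", "cp437",
    "tis-620", "shift_jis", "gb2312",
    "cp500",  # IBM500 (EBCDIC), where 0x19 is also End Of Medium
]

def decode_0x19(encodings):
    """Return a dict mapping encoding name -> code point of b'\\x19'."""
    return {enc: ord(b"\x19".decode(enc)) for enc in encodings}

if __name__ == "__main__":
    for enc, cp in decode_0x19(ENCODINGS).items():
        print(f"{enc:12} U+{cp:04X}")
```

With these codecs every line should report U+0019, which fits the point that a mismatch among them cannot by itself turn another character into 0x19.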

Of course, it may be valid content, in which case the documents should be handled as XML 1.1, where U+0019 is allowed, but only as a character reference (&#x19;), never as a literal character.
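A minimal Python sketch of the two character rules involved, with hypothetical helper names: the XML 1.0 `Char` production, which excludes U+0019 even when written as `&#x19;`, and the XML 1.1 `RestrictedChar` set, whose members are legal but must be written as character references. It also checks that Python's expat-based `ElementTree` parser, which implements XML 1.0 only, rejects the character in both forms. The escaping function assumes you write XML 1.1 text yourself rather than through a serializer.

```python
import xml.etree.ElementTree as ET

def is_xml10_char(cp: int) -> bool:
    """XML 1.0 (Fifth Edition) Char production."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)

def is_xml11_char(cp: int) -> bool:
    """XML 1.1 Char: everything except U+0000, surrogates, FFFE/FFFF."""
    return (0x1 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)

def is_xml11_restricted(cp: int) -> bool:
    """XML 1.1 RestrictedChar: legal, but only as a character reference."""
    return (0x1 <= cp <= 0x8 or 0xB <= cp <= 0xC or 0xE <= cp <= 0x1F
            or 0x7F <= cp <= 0x84 or 0x86 <= cp <= 0x9F)

def escape_text_xml11(s: str) -> str:
    """Hypothetical helper: escape s for use as XML 1.1 element content."""
    out = []
    for ch in s:
        cp = ord(ch)
        if not is_xml11_char(cp):
            raise ValueError(f"U+{cp:04X} cannot appear in XML 1.1 at all")
        if ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == ">":
            out.append("&gt;")
        # Restricted controls must be references; CR, NEL and LINE SEPARATOR
        # are escaped too so that line-end normalization does not alter them.
        elif is_xml11_restricted(cp) or cp in (0xD, 0x85, 0x2028):
            out.append(f"&#x{cp:X};")
        else:
            out.append(ch)
    return "".join(out)

def parses_as_xml10(text: str) -> bool:
    """Python's ElementTree uses expat, which implements XML 1.0 only."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

if __name__ == "__main__":
    body = escape_text_xml11("x\x19y")
    print('<?xml version="1.1"?><a>' + body + "</a>")
    print(parses_as_xml10("<a>&#x19;</a>"))
```

The XML 1.1 output still needs an XML 1.1 parser on the reading side; the expat check only shows that an XML 1.0 parser cannot accept it.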


> Simply stripping them out probably isn't the best approach; you need
> to work out why they're there, what put them there and then fix that.
> Patching it up afterwards is never a good idea.

Agreed, I just wanted to show how it can be done in XSLT, in case you (the OP) felt a need for it.
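The XSLT itself is not quoted in this message. Purely as an illustration in a different language (Python, not XSLT), and bearing in mind the objection that stripping should be a last resort, a strip step might look like the sketch below. It removes every character outside the XML 1.0 `Char` production from the raw text before it reaches a parser, since a conforming XML 1.0 parser rejects the document before any stylesheet runs. `strip_illegal_xml10` is a hypothetical name.

```python
import re

# Matches any character outside the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_NOT_XML10_CHAR = re.compile(
    "[^\t\n\r\u0020-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]"
)

def strip_illegal_xml10(text: str) -> tuple[str, int]:
    """Hypothetical helper: drop characters that XML 1.0 does not allow.

    Returns the cleaned text and the number of characters removed.
    """
    return _NOT_XML10_CHAR.subn("", text)

if __name__ == "__main__":
    raw = "<doc>Caf\u00e9\x19 menu\x19</doc>"
    cleaned, removed = strip_illegal_xml10(raw)
    print(cleaned, removed)
```

Returning the count means the removal can at least be logged instead of happening silently, which leaves a trace for working out where the characters came from.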


> Imagine explaining your process to someone else in a year's time:
> "this step is where we remove the U+0019 characters".

:D :D
Good design starts at the sources.

cheers,
Abel

Current Thread

Re: [xsl] 0x19 is not a legal XML character, (continued)
- Abel Braaksma - Thu, 28 Jun 2007 11:56:56 +0200
- Mulberry Technologies List Owner - Thu, 28 Jun 2007 06:07:21 -0400
  - Abel Braaksma - Thu, 28 Jun 2007 12:21:00 +0200
    - Andrew Welch - Thu, 28 Jun 2007 11:33:32 +0100
    - Abel Braaksma - Thu, 28 Jun 2007 12:46:10 +0200 (this message)
    - Michael Kay - Thu, 28 Jun 2007 12:06:44 +0100
    - Abel Braaksma - Thu, 28 Jun 2007 13:28:28 +0200
