Re: [xsl] Safe-guarding codepoints-to-string() from wrong input

Subject: Re: [xsl] Safe-guarding codepoints-to-string() from wrong input
From: "Andrew Welch" <andrew.j.welch@xxxxxxxxx>
Date: Wed, 20 Dec 2006 15:08:12 +0000
On 12/20/06, Abel Braaksma <abel.online@xxxxxxxxx> wrote:
> I know that control characters are not allowed and throw an "Invalid XML
> character" error.

If you are receiving strings containing literal control characters,
then they're almost certainly encoded in Windows-1252. Just parse them
using that encoding and you'll be OK.
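
A minimal sketch of that first case, in Python rather than XSLT and
purely as an illustration: the helper name decode_windows_1252 and the
sample bytes are made up for this example. It assumes the "control
characters" are bytes in the 0x80-0x9F range that were read as
ISO-8859-1 instead of Windows-1252.

```python
def decode_windows_1252(raw: bytes) -> str:
    """Decode raw bytes as Windows-1252 (hypothetical helper).

    Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D
    undefined, so errors="replace" turns those into U+FFFD instead
    of raising UnicodeDecodeError.
    """
    return raw.decode("cp1252", errors="replace")


# Bytes 0x93 and 0x94 are curly quotes in Windows-1252.
sample = b"\x93quoted\x94"

# Read as ISO-8859-1, the same bytes become C1 control characters.
as_latin1 = sample.decode("latin-1")

# Read as Windows-1252, they become the intended punctuation.
as_cp1252 = decode_windows_1252(sample)
```

Whether undefined bytes should be replaced or rejected depends on the
data; replacing them is just the choice made in this sketch.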

If the string contains control characters as character references,
then it's a bit harder because the references get expanded using
Unicode codepoints, and not those specified in the Windows-1252
mappings...  So you need to parse/serialize the string to expand the
references (I personally use JTidy with the CharEncoding set to
Configuration.RAW, which forces Tidy to output the bytes instead of
a reference).
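
To illustrate the second case, here is a rough Python sketch. It is not
the JTidy approach described above but a swapped-in alternative: it
rewrites numeric character references in the 0x80-0x9F range before the
text reaches an XML parser. The names fix_c1_references and
remap_c1_reference are hypothetical, and the sketch assumes every
reference in that range was meant as a Windows-1252 byte.

```python
import re

# Matches decimal (&#147;) and hexadecimal (&#x93;) character references.
_CHAR_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def remap_c1_reference(match):
    """Return the Windows-1252 character for a C1-range reference."""
    hex_digits, dec_digits = match.groups()
    codepoint = int(hex_digits, 16) if hex_digits else int(dec_digits)
    if 0x80 <= codepoint <= 0x9F:
        try:
            # Treat the number as a Windows-1252 byte, not a codepoint.
            return bytes([codepoint]).decode("cp1252")
        except UnicodeDecodeError:
            # 0x81, 0x8D, 0x8F, 0x90, 0x9D have no mapping in Python's
            # cp1252 codec; this sketch substitutes U+FFFD for them.
            return "\ufffd"
    # Leave every other reference untouched.
    return match.group(0)


def fix_c1_references(text):
    """Rewrite C1-range character references in text (hypothetical helper)."""
    return _CHAR_REF.sub(remap_c1_reference, text)


fixed = fix_c1_references("&#147;hi&#148; &#x85; &#65;")
```

The replacement characters are emitted literally, which is safe here
because none of the Windows-1252 characters in that range are XML
markup characters.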

It's a pain...

cheers
andrew
