Re: [xsl] Need to remove unusual character in source

Subject: Re: [xsl] Need to remove unusual character in source
From: Mario Madunic <hajduk@xxxxxxxx>
Date: Tue, 26 Sep 2006 16:43:07 -0700
Figured it out a little while ago.

Off Topic

I'm using Ant:
<replaceregexp match="[\cX]" replace="" flags="g">

Thanks to Abel and Michael for your input.

Mario

Quoting Abel Braaksma <abel.online@xxxxxxxxx>:

> Mario Madunic wrote:
> > the character is a control character:
> >
> > 0x18 CAN
> >   
> 
> Unfortunately, that says it all. Most control characters (everything 
> below x20 except tab, line feed and carriage return) are not allowed in 
> XML 1.0, whatever the encoding, so the document is not well-formed.
> 
> > the error message I receive is
> > SXXP0003: Error reported by XML parser: Illegal XML character:  &#x18;.
> >   
> 
> This is indeed illegal. The other day I accidentally used &#x08;, which 
> is also illegal (I had it mistaken for a tab character, x09, which *is* 
> legal).
> 
> > I've tried using ANT to clean it out but with no luck using native2ascii
> > or escapeunicode
> >   
> 
> Won't help either: escaping does not make the character legal, since 
> even the character reference &#x18; is rejected in XML 1.0. But you are 
> on the right track: use a filter to remove this character, or replace it 
> with something useful. I use a filter to clean up files in Microsoft 
> *.msg format, which have some useful lines, but the rest are control 
> characters and other illegal data. Here's what it might look like when 
> you'd resort to using Ruby (you can call it from Ant if you like), see 
> www.ruby-lang.org.
> 
> (spoiler warning: this is off-topic and only marginally related to xslt)
> 
> 
> # create working dir
> if not FileTest::exist?('trimmed')
>   Dir.mkdir('trimmed')
> end
> 
> Dir.entries(".").each do |fn|
>   # only process files that end in the wanted extension
>   next unless fn =~ /\.yourextension\z/
> 
>   # read complete file contents in binary mode
>   data = File.open(fn, 'rb') { |file| file.read }
> 
>   # write everything except the x18 (CAN) characters; the block form
>   # closes the file, and write (unlike puts) adds no extra newlines
>   File.open("trimmed/#{fn}.txt", 'wb') do |newfile|
>     newfile.write(data.delete("\x18"))
>   end
> end
> 
> 
> Just replace "yourextension" with the extension of your file and replace 
> "trimmed" with an output dirname of your choice. Replace ".txt" with 
> whatever extension you would like yourself. It runs through the current 
> directory and copies all files to the "trimmed" directory, with one 
> change: the x18 character is removed.
> 
> Of course, you can use Perl, a DOS Batch file (takes some practice), 
> Bash, VBScript, PHP, Grep, Awk or any other tool you'd prefer.
> 
> HTH,
> 
> Cheers,
> Abel Braaksma
> http://abelleba.metacarpus.com
> 
> 
> 
> > Can this be done or do I need to ask the client to remove it from their
> > data, which might not be an option?
> >
> > Any help or insight would be greatly appreciated.
> >
> > Marijan Madunic
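
The filtering approach discussed in this thread removes only x18. If the
source may contain other characters that XML 1.0 forbids, the same idea can
be generalised. What follows is only a minimal sketch, not something posted
in the thread: the names ILLEGAL_XML_CHARS and strip_illegal_xml_chars are
hypothetical, and it assumes the input is a valid UTF-8 string.

```ruby
# Hypothetical helper (not from the thread): drop every character that
# XML 1.0 does not allow, instead of only x18.
# XML 1.0 allows: x09, x0A, x0D, x20-xD7FF, xE000-xFFFD, x10000-x10FFFF.
# Assumes the input string is valid UTF-8.
ILLEGAL_XML_CHARS =
  /[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]/

def strip_illegal_xml_chars(text)
  # gsub returns a new string; the original is left untouched
  text.gsub(ILLEGAL_XML_CHARS, '')
end

p strip_illegal_xml_chars("abc\u0018def")  # => "abcdef"
```

Tab, line feed and carriage return pass through unchanged, so line
structure in the cleaned file is preserved.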
