Subject: Re: [xsl] Need to remove unusual character in source
From: Mario Madunic <hajduk@xxxxxxxx>
Date: Tue, 26 Sep 2006 16:43:07 -0700
Figured it out a little while ago.
Off Topic
I'm using Ant:
<replaceregexp match="[\cX]" replace="" flags="g">
Thanks to Abel and Michael for your input.
Mario
Quoting Abel Braaksma <abel.online@xxxxxxxxx>:
> Mario Madunic wrote:
> > the character is a control character:
> >
> > 0x18 CAN
> >
>
> Unfortunately, that says it all. Most control characters, x18 included,
> are not allowed in XML 1.0, whatever the encoding; only tab (x09), line
> feed (x0A) and carriage return (x0D) are legal. A document containing
> x18 is therefore not well-formed.
>
> > the error message I receive is
> > SXXP0003: Error reported by XML parser: Illegal XML character: .
> >
>
> This is indeed illegal. The other day I accidentally used another control
> character, which is also illegal (I had mistaken it for a tab character,
> x09, which *is* legal).
>
> > I've tried using ANT to clean it out but with no luck using native2ascii or
> > escapeunicode
> >
>
> Escaping these characters will not help either. But you are on the right
> track: use a filter to remove this character, or replace it with
> something useful. I use such a filter on Microsoft *.msg files, which
> contain some useful lines among control characters and other illegal
> data. Here's what it might look like if you resort to Ruby (you can call
> it from Ant if you like), see www.ruby-lang.org.
>
> (spoiler warning: this is off-topic and only marginally related to xslt)
>
>
> # create the output directory if it does not exist yet
> Dir.mkdir('trimmed') unless File.directory?('trimmed')
>
> Dir.entries('.').each do |fn|
>   # only process files with the chosen extension
>   next unless fn =~ /\.yourextension\z/
>
>   # read the whole file in binary mode so no bytes are translated
>   data = File.binread(fn)
>
>   # write the contents back without the x18 (CAN) character;
>   # binary mode and write (not puts) avoid adding extra newlines
>   File.open("trimmed/#{fn}.txt", 'wb') do |newfile|
>     newfile.write(data.delete("\x18"))
>   end
> end
>
>
> Just replace "yourextension" with the extension of your files and replace
> "trimmed" with an output dirname of your choice. Replace ".txt" with
> whatever extension you would like. It runs through the current directory
> and copies every matching file to the "trimmed" directory, with one
> change: the x18 character is removed.
>
> Of course, you can use Perl, a DOS Batch file (takes some practice),
> Bash, VBScript, PHP, Grep, Awk or any other tool you'd prefer.
>
> HTH,
>
> Cheers,
> Abel Braaksma
> http://abelleba.metacarpus.com
>
>
>
> > Can this be done or do I need to ask the client to remove it from their data,
> > which might not be an option?
> >
> > Any help or insight would be greatly appreciated.
> >
> > Marijan Madunic
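
The thread above removes only x18. If the source data may contain other control characters that XML 1.0 rejects, the same filtering idea can be widened. The following is a minimal Ruby sketch, not something posted in the thread: the constant and method names are hypothetical, and it assumes that only the C0 control range (x00 to x1F) needs cleaning, keeping tab, line feed and carriage return.

```ruby
# Hypothetical helper, not from the original thread.
# Set of C0 control characters that XML 1.0 does not allow:
# x00-x08, x0B, x0C and x0E-x1F. Tab (x09), line feed (x0A) and
# carriage return (x0D) are legal and therefore left out of the set.
ILLEGAL_XML_CONTROLS = "\x00-\x08\x0B\x0C\x0E-\x1F"

# Returns a copy of text with every character in the set removed.
# String#delete accepts ranges such as "a-z" in its argument.
def strip_illegal_xml_controls(text)
  text.delete(ILLEGAL_XML_CONTROLS)
end

# x18 (CAN) is dropped, the tab is kept
strip_illegal_xml_controls("a\x18b\tc")  # => "ab\tc"
```

Run over each file before the transformation, this plays the same role as the Ant replaceregexp task or the Ruby script above, only with a wider character set.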