Subject: Re: [xsl] Need to remove unusual character in source
From: Mario Madunic <hajduk@xxxxxxxx>
Date: Tue, 26 Sep 2006 16:43:07 -0700
Figured it out a little while ago.

Off topic: I'm using Ant

<replaceregexp match="[\cX]" replace="" flags="g">

Thanks to Abel and Michael for your input.

Mario

Quoting Abel Braaksma <abel.online@xxxxxxxxx>:

> Mario Madunic wrote:
> > the character is and it's a control character
> >
> > 0x18 CAN
>
> Unfortunately, that says it all. Most control characters (everything
> below x20 except x09, x0A and x0D) are not allowed in XML 1.0, whatever
> the encoding, so a document that contains one is not well-formed.
>
> > the error message I receive is
> > SXXP0003: Error reported by XML parser: Illegal XML character: .
>
> This is indeed illegal. The other day I accidentally used , which
> is also illegal (I had it mistaken for a tab character, x09, which *is*
> legal).
>
> > I've tried using ANT to clean it out but with no luck using native2ascii
> > or escapeunicode
>
> Escaping these characters will not help either. But you are on the
> right track: use a filter to remove this character, or replace it
> with something useful. I use a filter to read Microsoft *.msg files,
> which have some useful lines, but the rest is control characters and
> other illegal data. Here's what it might look like if you resort to
> using Ruby (you can call it from Ant if you like), see www.ruby-lang.org.
>
> (spoiler warning: this is off-topic and only marginally related to xslt)
>
> # create the working dir if it does not exist yet
> Dir.mkdir('trimmed') unless FileTest.exist?('trimmed')
>
> Dir.entries('.').each do |fn|
>   next unless fn =~ /\.yourextension\z/
>   # read the complete file contents in binary mode
>   data = File.open(fn, 'rb') { |file| file.read }
>   # write every run of non-x18 bytes; print adds no extra newlines
>   File.open("trimmed/#{fn}.txt", 'wb') do |newfile|
>     data.scan(/[^\x18]+/m) { |found| newfile.print(found) }
>   end
> end
>
> Just replace "yourextension" with the extension of your files and replace
> "trimmed" with an output dirname of your choice. Replace ".txt" with
> whatever extension you would like. It runs through the current
> directory and copies all matching files to the "trimmed" directory, with
> one change: the x18 character is removed.
>
> Of course, you can use Perl, a DOS Batch file (takes some practice),
> Bash, VBScript, PHP, Grep, Awk or any other tool you'd prefer.
>
> HTH,
>
> Cheers,
> Abel Braaksma
> http://abelleba.metacarpus.com
>
> > Can this be done or do I need to ask the client to remove it from their
> > data, which might not be an option?
> >
> > Any help or insight would be greatly appreciated.
> >
> > Marijan Madunic
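
The filter discussed in this thread removes only x18. If the source data may contain other control characters that XML 1.0 rejects, the same idea can be widened. The following is a minimal sketch, not anything posted in the thread: the helper name strip_illegal_xml_controls is hypothetical, and it assumes the input is a string whose bytes below x20 are the only problem (it does not deal with invalid multi-byte sequences or other forbidden code points).

```ruby
# Hypothetical helper (an assumption, not from the thread): removes the
# control characters below 0x20 that XML 1.0 forbids, while keeping
# tab (0x09), line feed (0x0A) and carriage return (0x0D), which are legal.
def strip_illegal_xml_controls(data)
  data.gsub(/[\x00-\x08\x0B\x0C\x0E-\x1F]/, '')
end

sample  = "abc\x18def\tghi\n"
cleaned = strip_illegal_xml_controls(sample)
p cleaned  # => "abcdef\tghi\n"
```

In the Ruby script quoted above, this call could stand in for the scan over /[^\x18]+/m, again only under the assumption that dropping these characters silently is acceptable for the data at hand.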