RE: [xsl] possible workarounds to process files with invalid character encoding ...

Subject: RE: [xsl] possible workarounds to process files with invalid character encoding ...
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Fri, 12 Dec 2008 21:26:38 -0000
If you're capable of writing a Java Reader that will process this file into
a stream of characters, then you can get Saxon to use this Reader by
nominating a custom UnparsedTextURIResolver.
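
As a rough illustration of the Reader part, here is a minimal sketch that uses only the JDK. It assumes the custom encoding is single-byte and can be described by a 256-entry lookup table. The class name TableReader and the example mapping (ASCII unchanged, byte 0x80 mapped to the euro sign, everything else to U+FFFD) are hypothetical and stand in for the real mapping. Handing the Reader to Saxon through an UnparsedTextURIResolver is not shown, because the exact shape of that interface depends on the Saxon version in use.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.UncheckedIOException;

/** Reader that decodes a single-byte custom encoding through a lookup table. */
public class TableReader extends Reader {
    private final InputStream in;
    private final char[] table; // 256 entries: byte value -> Unicode character

    public TableReader(InputStream in, char[] table) {
        if (table.length != 256) {
            throw new IllegalArgumentException("table must have 256 entries");
        }
        this.in = in;
        this.table = table.clone();
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        byte[] raw = new byte[len];
        int n = in.read(raw, 0, len);
        if (n < 0) {
            return -1; // end of stream
        }
        for (int i = 0; i < n; i++) {
            cbuf[off + i] = table[raw[i] & 0xFF]; // one byte maps to one char
        }
        return n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    /** Hypothetical mapping, for illustration only. */
    static char[] exampleTable() {
        char[] t = new char[256];
        for (int i = 0; i < 256; i++) {
            t[i] = i < 0x80 ? (char) i : '\uFFFD';
        }
        t[0x80] = '\u20AC'; // assumed: byte 0x80 means EURO SIGN
        return t;
    }

    /** Convenience helper: decode a whole byte array with the example table. */
    static String decode(byte[] data) {
        try (Reader r = new TableReader(new ByteArrayInputStream(data), exampleTable())) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[64];
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode(new byte[] {'a', (byte) 0x80, 'b'}));
    }
}
```

A multi-byte or stateful encoding would need a real decoding loop in read() rather than a table lookup, but the contract towards the caller stays the same.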

Alternatively, I suspect you can do it at the Java level by registering an
encoding name for the encoding and associating it with a decoder for that
encoding - but I'm not familiar with the details.
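
For the second route, the JDK mechanism is presumably java.nio.charset.spi.CharsetProvider. The sketch below is one way it could look, again assuming a single-byte encoding. The charset name X-EXAMPLE-CUSTOM, the class names, and the byte mapping are all hypothetical. To make the name visible to Charset.forName(), the provider's fully qualified class name has to be listed in a file named META-INF/services/java.nio.charset.spi.CharsetProvider on the class path. Whether Saxon then accepts the name in the encoding argument of unparsed-text() is not verified here.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.spi.CharsetProvider;
import java.util.Collections;
import java.util.Iterator;

/** Provider exposing one decode-only custom charset (all names hypothetical). */
public class CustomCharsetProvider extends CharsetProvider {
    static final String NAME = "X-EXAMPLE-CUSTOM"; // assumed encoding name

    static final class CustomCharset extends Charset {
        CustomCharset() {
            super(NAME, null); // no aliases
        }

        @Override
        public boolean contains(Charset cs) {
            return cs instanceof CustomCharset;
        }

        @Override
        public boolean canEncode() {
            return false; // this sketch only decodes
        }

        @Override
        public CharsetEncoder newEncoder() {
            throw new UnsupportedOperationException("decode only");
        }

        @Override
        public CharsetDecoder newDecoder() {
            // One byte in, one char out.
            return new CharsetDecoder(this, 1.0f, 1.0f) {
                @Override
                protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
                    while (in.hasRemaining()) {
                        if (!out.hasRemaining()) {
                            return CoderResult.OVERFLOW;
                        }
                        int b = in.get() & 0xFF;
                        // Assumed mapping: ASCII as is, 0x80 -> EURO SIGN, rest -> U+FFFD
                        out.put(b < 0x80 ? (char) b : b == 0x80 ? '\u20AC' : '\uFFFD');
                    }
                    return CoderResult.UNDERFLOW;
                }
            };
        }
    }

    private final Charset charset = new CustomCharset();

    @Override
    public Iterator<Charset> charsets() {
        return Collections.singletonList(charset).iterator();
    }

    @Override
    public Charset charsetForName(String charsetName) {
        return NAME.equalsIgnoreCase(charsetName) ? charset : null;
    }

    /** Decode directly, without relying on service registration. */
    static String decode(byte[] data) {
        return new CustomCharset().decode(ByteBuffer.wrap(data)).toString();
    }

    public static void main(String[] args) {
        System.out.println(decode(new byte[] {'a', (byte) 0x80, 'b'}));
    }
}
```

Once the provider is registered, ordinary code such as new InputStreamReader(stream, "X-EXAMPLE-CUSTOM") should pick up the decoder, so no Saxon-specific class is needed on this route.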

Michael Kay
http://www.saxonica.com/ 

> -----Original Message-----
> From: Matthias Einbrodt [mailto:matthias.einbrodt@xxxxxxxxxxxxx] 
> Sent: 12 December 2008 21:14
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] possible workarounds to process files with 
> invalid character encoding ...
> 
> Hello,
> 
> I'm trying to transform a text file with XSLT using the 
> unparsed-text and tokenize functions. Unfortunately the text 
> file consists of characters which are encoded with a 
> non-Unicode-compliant encoding scheme. So as expected my Saxon 
> processor (version 9.1.0.3 Basic) shows me a 
> *MalformedInputException* when I want to parse the file.
> 
> Now my question is if there are any "workarounds" to make 
> Saxon process the file anyway. Maybe by:
> 
> (1) Writing a sort of plugin that lets Saxon also support 
> non-Unicode-compliant encodings;
> 
> (2) Adding in some way metadata to the input file which 
> Saxon or another XSLT processor can handle and that specifies a 
> mapping of the used character encodings to the appropriate 
> code points of a Unicode-compliant encoding.
> 
> And if there exists such a workaround, is it even worth trying 
> to implement it, or would someone be better off preprocessing 
> the file with a custom Java program or even trying to 
> modify the program that creates such text files in such a way 
> that it uses a Unicode-compliant encoding scheme rather than 
> its own custom one?
> 
> What are your opinions?
> 
> Best regards
> 
> Matthias Einbrodt
