[xsl] possible workarounds to process files with invalid character encoding ...

Subject: [xsl] possible workarounds to process files with invalid character encoding ...
From: Matthias Einbrodt <matthias.einbrodt@xxxxxxxxxxxxx>
Date: Fri, 12 Dec 2008 22:13:44 +0100
Hello,

I'm trying to transform a text file with XSLT using the unparsed-text and
tokenize functions. Unfortunately, the text file contains characters
that are encoded with a non-Unicode-compliant encoding scheme. So, as
expected, my Saxon processor (version 9.1.0.3 Basic) throws a
MalformedInputException when I try to read the file.

Now my question is if there are any "workarounds" to make Saxon process
the file anyway. Maybe by:

(1) Writing a sort of plugin that lets Saxon also support
non-Unicode-compliant encodings;

(2) Adding metadata to the input file in some way that Saxon or
another XSLT processor can handle, and that specifies a mapping from the
custom character encoding to the appropriate code points of a
Unicode-compliant encoding.
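
As a rough illustration of option (1), a "plugin" could take the form of
a custom java.nio.charset.Charset exposed through a CharsetProvider. The
sketch below is only that: the charset name X-CUSTOM and the byte mapping
(0x00-0x7F as ASCII, 0x80 as the euro sign, everything else replaced by
U+FFFD) are hypothetical placeholders, and whether a given Saxon release
resolves the encoding argument of unparsed-text through Charset.forName
is an assumption that would need checking.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.spi.CharsetProvider;
import java.util.Collections;
import java.util.Iterator;

// Decode-only charset for a hypothetical single-byte custom encoding.
class XCustomCharset extends Charset {
    XCustomCharset() {
        super("X-CUSTOM", null); // hypothetical name, no aliases
    }

    // Placeholder mapping: replace with the real table of the custom encoding.
    static char toChar(int unsignedByte) {
        if (unsignedByte < 0x80) return (char) unsignedByte; // assumed ASCII-compatible
        if (unsignedByte == 0x80) return '\u20AC';           // assumed: EURO SIGN
        return '\uFFFD';                                     // unknown byte
    }

    @Override
    public boolean contains(Charset cs) {
        return cs.equals(this);
    }

    @Override
    public boolean canEncode() {
        return false; // this sketch only decodes
    }

    @Override
    public CharsetDecoder newDecoder() {
        // One byte always yields exactly one char in this sketch.
        return new CharsetDecoder(this, 1.0f, 1.0f) {
            @Override
            protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
                while (in.hasRemaining()) {
                    if (!out.hasRemaining()) return CoderResult.OVERFLOW;
                    out.put(toChar(in.get() & 0xFF));
                }
                return CoderResult.UNDERFLOW;
            }
        };
    }

    @Override
    public CharsetEncoder newEncoder() {
        throw new UnsupportedOperationException("decode only");
    }
}

// To be found by Charset.forName, this class would have to be public, live
// in its own file, and be listed in
// META-INF/services/java.nio.charset.spi.CharsetProvider on the classpath.
class XCustomCharsetProvider extends CharsetProvider {
    private final Charset charset = new XCustomCharset();

    @Override
    public Iterator<Charset> charsets() {
        return Collections.singletonList(charset).iterator();
    }

    @Override
    public Charset charsetForName(String name) {
        return charset.name().equalsIgnoreCase(name) ? charset : null;
    }
}
```

If the provider were picked up, the stylesheet could then presumably
name the encoding directly, e.g. unparsed-text($uri, 'X-CUSTOM').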

And if such a workaround exists, is it even worth trying to implement
it, or would one be better off preprocessing the file with a custom
Java program, or even modifying the program that creates these text
files so that it uses a Unicode-compliant encoding scheme rather than
its own custom one?
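
To make the preprocessing alternative concrete, here is a minimal
sketch of such a Java program. It reads the raw bytes, maps each one to
a Unicode code point, and writes the result as UTF-8. The class name
CustomToUtf8 and the mapping table are hypothetical; the real table
would have to come from the program that produces the files.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

class CustomToUtf8 {
    // Hypothetical mapping from custom byte values to Unicode code points.
    private static final Map<Integer, Integer> HIGH_BYTES = new HashMap<>();
    static {
        HIGH_BYTES.put(0x80, 0x20AC); // assumed: EURO SIGN
        HIGH_BYTES.put(0x81, 0x00E4); // assumed: LATIN SMALL LETTER A WITH DIAERESIS
    }

    // Converts bytes in the custom encoding to a Java string.
    static String decode(byte[] input) {
        StringBuilder sb = new StringBuilder(input.length);
        for (byte b : input) {
            int value = b & 0xFF; // treat the byte as unsigned
            if (value < 0x80) {
                sb.append((char) value); // assumed ASCII-compatible range
            } else {
                // Unmapped bytes become U+FFFD instead of aborting the run.
                sb.appendCodePoint(HIGH_BYTES.getOrDefault(value, 0xFFFD));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java CustomToUtf8 <input> <output>");
            System.exit(1);
        }
        byte[] raw = Files.readAllBytes(Paths.get(args[0]));
        byte[] utf8 = decode(raw).getBytes(StandardCharsets.UTF_8);
        Files.write(Paths.get(args[1]), utf8);
    }
}
```

The converted file could then be read in the stylesheet with
unparsed-text($uri, 'UTF-8'), with no changes to Saxon itself.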

What are your opinions?

Best regards

Matthias Einbrodt
