Re: [xsl] output to iso-8859-1 of non-iso characters, what is required action

Subject: Re: [xsl] output to iso-8859-1 of non-iso characters, what is required action
From: "bryan rasmussen" <rasmussen.bryan@xxxxxxxxx>
Date: Wed, 7 May 2008 17:08:15 +0200
>  don't mix 'characters' with 'bytes'. iso-8859-1 is a codepage that assigns
> a number of characters to certain bytes in the range of 0-255.
>
>  In XML a character may be displayed in different ways, all perfectly
> legal:
> A, &#65;, &#x41;
>

Yes. Was there anything in the question that implied otherwise?

>  I seem to remember that it is totally up to the processor to select a
> method. If you use Saxon there are special options to control that
> behaviour
> (if you prefer native bytes, decimal or hex entities).
>
OK. But from reading the spec, it seems to me that if you don't specify a
method, the processor has to do it automatically for you when outputting
text nodes in an XML document (personally I think it should do the
same in comment nodes; I'm not sure why it was decided not to), but it
always has to fail on text output.
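
A minimal sketch of the two behaviours as I read them, written in Python
rather than XSLT because its codec error handlers happen to mirror them.
The function names serialize_as_xml_text and serialize_as_text are made up
for illustration, and escaping of < and & is left out.

```python
def serialize_as_xml_text(value: str) -> bytes:
    # Characters outside ISO 8859-1 become decimal character references,
    # similar to what the xml output method does for text nodes.
    return value.encode("iso-8859-1", errors="xmlcharrefreplace")


def serialize_as_text(value: str) -> bytes:
    # With no escape mechanism available, an unrepresentable character
    # raises UnicodeEncodeError, comparable to the text output method failing.
    return value.encode("iso-8859-1", errors="strict")
```

With a euro sign (U+20AC) in the input, the first function emits &#8364;
while the second raises an error.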

>  Dropping characters is never an option.
Why not? If I want to go from UTF-8 to ISO 8859-1 for some reason, the
low-level way would be to write something that goes through every byte,
checks whether it is in range, and removes it if not. With text
output from XML it would be nice if declaring the output encoding in
my stylesheet gave me this behavior. But it doesn't, so text
output in XSL 1 isn't useful, because translate() is a poor solution for
something that a declarative solution should handle well.
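
For what it's worth, the low-level filter could look roughly like this in
Python. This is only a sketch: utf8_to_latin1_dropping is a hypothetical
name, and strictly speaking the range check has to happen per decoded
character rather than per raw byte, since UTF-8 spreads one non-ASCII
character over several bytes.

```python
def utf8_to_latin1_dropping(data: bytes) -> bytes:
    # Decode first so that the check below sees whole characters.
    text = data.decode("utf-8")
    # ISO 8859-1 covers exactly the code points 0 to 255; drop the rest.
    kept = "".join(ch for ch in text if ord(ch) <= 0xFF)
    return kept.encode("iso-8859-1")
```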

I declare that I have something in encoding x and I want something in
encoding y. If I also declare that XML output is required, the processor
finds a solution for me. If I declare text output, it seems to think
there is no possible solution, whereas the common solutions are to
remove what isn't allowed or to
replace what isn't allowed.
I think that in that context failing isn't a very good answer.
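
To show what I mean by declaring the behaviour, here is a rough Python
sketch. The function encode_latin1 and its policy names "remove",
"replace" and "fail" are invented for illustration; they are not options
of any XSLT processor.

```python
def encode_latin1(value: str, on_unmappable: str = "remove") -> bytes:
    # Map the hypothetical policy names onto Python's codec error handlers.
    handlers = {"remove": "ignore", "replace": "replace", "fail": "strict"}
    return value.encode("iso-8859-1", errors=handlers[on_unmappable])
```

The caller only states the policy; with "replace", Python substitutes a
question mark for each character that does not fit.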





> If you want that you could easily
> filter using translate() to remove all unwanted characters from text nodes.
>

Given that a translate() (in XSL 1) of all non-ISO-8859-1 characters to an
empty string is easy, do you think you could send me one? :)




Cheers,
Bryan Rasmussen
