more encoded questions

Subject: more encoded questions
From: Josef Vosyka <Josef.Vosyka@xxxxxxxxxxxxxxxxxxx>
Date: Mon, 06 Nov 2000 19:01:31 -0800
Hi,

Characters are being rendered according to
        a) input encoding
        b) input form (escaped/non-escaped)
        c) output encoding
        d) output method/version (for example html/4.0).

Let's assume an input document in encoding ISO-8859-1 and output method/version
HTML40.


Then these are the possibilities:

- all characters which are recognized as valid (for the given output
encoding/method/version) are rendered as a symbolic reference:

        &#174;  ->      &reg;


- all characters which are not recognized as valid are rendered as a
numerical reference:

        &#777;  ->      &#777;

(because 777 is not a valid HTML40 character, according to
http://www.w3.org/TR/html4/sgml/entities.html)


- all characters which are valid, but do not have a symbolic reference, are
rendered as the character of the given value.



Finally the questions:

1) Is what I just stated above entirely true?

2) &#151; is rendered as the character of value 151. Why is that? My assumption
is that 151 is an invalid char in ISO-8859-1 (the input encoding) and is also
not a valid HTML40 char, and therefore it should be rendered as the numeric
entity &#151;

3) Conversely, &#8482; is a valid HTML40 char and should therefore be rendered
as &trade;, but it is being rendered as &#8482;

4) Is the output encoding relevant if the valid character range is specified by
the HTML40 specification (http://www.w3.org/TR/html4/sgml/entities.html)?


This is the stylesheet used:

--------------XSL--------------
<?xml version="1.0"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output encoding="ISO-8859-1" method="html" version="4.0" />

<xsl:template match="/">
 <xsl:copy-of select="." />
</xsl:template>

</xsl:stylesheet>
--------------/XSL-------------


XML data:

--------------XML--------------
<?xml version="1.0" encoding="ISO-8859-1"?>
<html><body>
&#174;
&#151;
&#777;
&#8482;
</body></html>
--------------/XML-------------
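
One experiment that might help with question 4 is to keep everything else fixed
and change only the output encoding. The stylesheet below is a sketch, not a
tested recipe: it assumes the processor supports US-ASCII as an output encoding
(XSLT 1.0 only requires UTF-8 and UTF-16) and that it falls back to a character
reference for anything the encoding cannot represent.

--------------XSL--------------
<?xml version="1.0"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<!-- assumption: the processor accepts US-ASCII as an output encoding -->
<xsl:output encoding="US-ASCII" method="html" version="4.0" />

<xsl:template match="/">
 <!-- same identity copy as above, so only the encoding differs -->
 <xsl:copy-of select="." />
</xsl:template>

</xsl:stylesheet>
--------------/XSL-------------

Under those assumptions none of the four characters can be written as a raw
byte, so any difference from the ISO-8859-1 run would point at the output
encoding rather than at the HTML40 entity list.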



I am also including my previous email below, in case someone feels like helping
me with that :-)


Thanks a lot,
--Josef


================================================================================

Hi,

I wonder whether someone can clear up some of the following encoding magic:

Let's assume this data:

--------------XML--------------
<?xml version="1.0"?>
<html><body>
&#174;
&#169;
&#153;
&#150;
&#151;
&#8482;
</body></html>
--------------/XML-------------

rendered with this stylesheet:

--------------XSL--------------
<?xml version="1.0"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output encoding="ISO-8859-1" method="html"/>

<xsl:template match="/">
 <xsl:copy-of select="." />
</xsl:template>

</xsl:stylesheet>
--------------/XSL-------------

produces this output:

--------------OUT--------------
<html><body>
&reg;
&copy;
~Y
~V
~W
&#8482;
</body></html>
--------------/OUT-------------

which is rendered by Netscape on Linux as:

--------------OUT--------------
® © ? - - [tm]
--------------/OUT-------------
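
For comparison, here is a variant of the data that uses the Unicode code points
for the last three symbols. It rests on an assumption about where 153, 150 and
151 came from: in the Windows-1252 code page those byte values are the trade
mark sign, en dash and em dash, whereas a numeric character reference in XML
refers to a Unicode code point, and in Unicode 128-159 are control characters.

--------------XML--------------
<?xml version="1.0"?>
<html><body>
&#174;
&#169;
<!-- trade mark sign, U+2122 (assumed to be what 153 was meant to be) -->
&#8482;
<!-- en dash, U+2013 (assumed to be what 150 was meant to be) -->
&#8211;
<!-- em dash, U+2014 (assumed to be what 151 was meant to be) -->
&#8212;
</body></html>
--------------/XML-------------

With an ISO-8859-1 output encoding none of the last three can be written as a
raw byte, so they would have to come out as references of some kind; whether
that is &trade; or &#8482; appears to be left to the processor.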


It is well known that XML predefines only 5 entities (gt lt amp quot apos). If
one wants to use other ones (like copy reg trade ...) in one's XML, they need to
be declared in the DOCTYPE declaration.
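
As an illustration of such a declaration, the sketch below declares four
entities in an internal DTD subset. The names and code points match the HTML 4
entity list, but which ones to declare is only an example choice. The parser
expands the references to the corresponding characters before the stylesheet
sees the document.

--------------XML--------------
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html [
  <!-- each entity expands to a single Unicode character -->
  <!ENTITY reg   "&#174;">
  <!ENTITY copy  "&#169;">
  <!ENTITY trade "&#8482;">
  <!ENTITY mdash "&#8212;">
]>
<html><body>
&reg;
&copy;
&trade;
&mdash;
</body></html>
--------------/XML-------------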

I assume that only 2 entities got rendered in my example as symbols ("reg" and
"copy") because they are the only ones from ISO-8859-1.


1) What do I have to do to render &#153; as &#153; (not as "~Y")?
2) Why did the Unicode value stay as &#8482;? (Is it because ISO-8859-1 is not
Unicode and therefore &#8482; is the only way to represent a Unicode char?)
3) Is &#153; currently the best way to represent the TM sign (if it is needed as
one char)?
4) Is there a commonly used DTD specifying all symbol entities, say from
ISO-8859-1?
5) Is there a commonly used DTD specifying frequently used symbols (like copy reg
...) that is encoding independent? (That is a stupid question; what I mean is
"is there a simple way to usually stay out of trouble with encodings?")
6) The only encoding I found "mdash" in is "ISO 8879:1986". What do I have to
specify in my XML if I want to use &mdash;?
6b) Also, what do I have to do if I want &mdash; to be the product of my
transformation? (I assume a certain encoding parameter in "<xsl:output>".)
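
For 6b, one approach that may be worth trying is disable-output-escaping rather
than an encoding parameter. This is a sketch under assumptions: support for
disable-output-escaping is optional in XSLT 1.0, so a given processor may ignore
it, and the <p> wrapper is only there to make a complete example.

--------------XSL--------------
<?xml version="1.0"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output encoding="ISO-8859-1" method="html"/>

<xsl:template match="/">
 <!-- &amp;mdash; is the literal text "&mdash;" in the stylesheet; with
      escaping disabled the serializer should write it out unchanged -->
 <p>before<xsl:text disable-output-escaping="yes">&amp;mdash;</xsl:text>after</p>
</xsl:template>

</xsl:stylesheet>
--------------/XSL-------------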



My understanding is that "encoding" has no relation to the "symbol" of the
entity (the symbol is not defined by the encoding definition). It only
determines the "value" of the char the "symbol" is converted to.

Despite this understanding I see one relation in my example: "reg" and "copy"
have been rendered as symbols, therefore the rendering engine has to know that
&#169; should be "copy".

7) Why?


I would really appreciate it if someone knows some answers or can at least
point me to some good references.

Thank you,
--Josef


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

