RE: [xsl] recognize character entities

Subject: RE: [xsl] recognize character entities
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Tue, 29 Aug 2006 15:00:16 +0100
> is there a way to recognize and filter only elements which
> text() begins with a character entity?
>   <Element>&Amp;This is filtered</Element>
>   <Element>&epsilon;This is filtered</Element>
>   <Element>&euro;This is filtered</Element>
>   <Element>This is *not* filtered</Element>
>   <Element>And this also not</Element>

Technically these are all "entity references", not "character entities". If
you wrote "&#x20ac;", that would be a "character reference".

You can't detect these in XSLT, because the XML parser expands the entity
reference before the XSLT processor gets to see it. If you really need to
distinguish a Euro sign written as &euro; from one written as a real Euro
character (or from one written as &#x20ac;, if that's the right code), then
you need to preprocess the XML to flag these so they survive the journey
through the XML parser. For example, you could use a Perl script that
replaces &euro; by <?ent euro?>.
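
A minimal sketch of such a preprocessing step, written in Python rather than
Perl purely for illustration. The function name flag_entity_refs and the
regular expression are assumptions, not part of any library. It treats the
input as plain text, so it assumes entity references occur only in element
content (not in attribute values, comments, CDATA sections or the DTD), and
it leaves the five predefined XML entities and all numeric character
references untouched.

```python
import re

# Named entity reference such as &euro; (numeric references like &#x20ac;
# do not match, because '#' cannot start the name).
ENTITY_REF = re.compile(r"&([A-Za-z_][A-Za-z0-9._-]*);")

# The five entities predefined by XML stay as they are, so that escaped
# markup characters keep working.
PREDEFINED = {"amp", "lt", "gt", "apos", "quot"}


def flag_entity_refs(xml_text: str) -> str:
    """Replace each named entity reference &name; by <?ent name?>."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in PREDEFINED:
            return match.group(0)  # keep &amp;, &lt; etc. unchanged
        return f"<?ent {name}?>"

    return ENTITY_REF.sub(replace, xml_text)


print(flag_entity_refs("<Element>&euro;This is filtered</Element>"))
# prints: <Element><?ent euro?>This is filtered</Element>
```

After this step the stylesheet sees a processing instruction named "ent" as
the first child of the element, so a match pattern along the lines of
Element[node()[1][self::processing-instruction('ent')]] should select the
elements that originally began with an entity reference.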

But this is against the spirit of XML: the entity reference is supposed to
be treated by the receiving application in exactly the same way as its
expansion would be treated.

Michael Kay

> in a template match like
>    <xsl:template match="Element[starts-with(text(),
> 'recognize_a_character_entity_here')]">
>        <NewElement>
>            <xsl:apply-templates select="@* | node()"/>
>        </NewElement>
>    </xsl:template>
> for mathematical xml-files we have a lot (around 2000)
> character entities to recognize and no chance to select them
> individually.
> thanks in advance
> for any help
> frank
> (thanks to mukul, abel and michael for answering my
> apostrophe/ quotation mark question)
> --
> Frank Marent
> emnemics ag
> Jungholzstrasse 43
> CH-8050 Zürich
> Tel   +41 44 307 32 71
> Fax   +41 44 307 32 75
> Mail  frank.marent@xxxxxxxxxxx
> Skype frank.marent
> Ein Unternehmen der Kalaidos Bildungsgruppe Schweiz
