Re: [xsl] Transform from UTF-8 symbol to character entity

Subject: Re: [xsl] Transform from UTF-8 symbol to character entity
From: David Carlisle <davidc@xxxxxxxxx>
Date: Sat, 16 Jan 2010 10:27:37 GMT
If you use html output then (most likely) this will happen
automatically. For xml output, are you sure you want to do this? It
will make your output not well formed unless you also reference a DTD
that defines the entities.
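
To illustrate the well-formedness point, here is a minimal sketch of what
the serialized result would need to look like for an XML parser to accept
named references. The root element name doc and the choice of an internal
DTD subset are assumptions made for the example; a reference to an
external DTD that declares the same entities would serve equally well.

```xml
<?xml version="1.0" encoding="US-ASCII"?>
<!-- Hypothetical output document: "doc" is an invented element name. -->
<!-- Without these declarations, &ldquo; etc. are undefined entities   -->
<!-- and the document is not well formed.                              -->
<!DOCTYPE doc [
  <!ENTITY ldquo "&#x201C;"> <!-- left double quotation mark  -->
  <!ENTITY rdquo "&#x201D;"> <!-- right double quotation mark -->
  <!ENTITY ndash "&#x2013;"> <!-- en dash -->
  <!ENTITY mdash "&#x2014;"> <!-- em dash -->
]>
<doc>&ldquo;quoted text&rdquo; 1&ndash;2 &mdash; and so on</doc>
```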

I assume your input is not exactly as you show, as you showed
ASCII " being converted to both ldquo and rdquo and ASCII - being
converted to both ndash and mdash.

Assuming you are using xslt 2, the simplest way to do this is to use a
character map. You appear to be using the standard iso/html entity names,
so I assume (despite the examples you gave) that you want the usual
definitions. A character map that does the right thing is available at

http://www.w3.org/2003/entities/2007/entitynamesmap.xsl


so you can use

<xsl:import
href="http://www.w3.org/2003/entities/2007/entitynamesmap.xsl"/>

<xsl:output use-character-maps="w3c-entity-names"/>

or better, take a local copy of the files in that directory and
reference the local copy.
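
If only a handful of characters matter, a character map can also be
written by hand instead of importing the full one. The following is a
sketch under assumptions, not the content of the W3C file: the map name
quotes-and-dashes is invented for the example, and the identity template
is only there to make the stylesheet complete.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Apply the hand-written map (hypothetical name) at serialization. -->
  <xsl:output method="xml" use-character-maps="quotes-and-dashes"/>

  <!-- Each listed character is written out as the given string.        -->
  <!-- &amp;ldquo; in the stylesheet becomes the literal text &ldquo;   -->
  <!-- in the serialized result.                                        -->
  <xsl:character-map name="quotes-and-dashes">
    <xsl:output-character character="&#x201C;" string="&amp;ldquo;"/>
    <xsl:output-character character="&#x201D;" string="&amp;rdquo;"/>
    <xsl:output-character character="&#x2013;" string="&amp;ndash;"/>
    <xsl:output-character character="&#x2014;" string="&amp;mdash;"/>
  </xsl:character-map>

  <!-- Identity transform: copy the input through unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Character maps act only when the result is serialized, so the named
references appear in the output file and not in the result tree itself.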


However, for most purposes it is better to use numeric character
references rather than entity names, in which case you just need to
specify an encoding that does not include these characters, and they
will be written as numeric references

<xsl:output encoding="US-ASCII"/>

for example.
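
As a complete stylesheet, the encoding approach might look like the
following sketch. The identity template is an assumption added to make it
self-contained; nothing here depends on XSLT 2 features.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- US-ASCII cannot represent curly quotes or dashes directly, so   -->
  <!-- the serializer has to write them as numeric character           -->
  <!-- references. Whether they come out in decimal or hexadecimal     -->
  <!-- form depends on the processor.                                  -->
  <xsl:output method="xml" encoding="US-ASCII"/>

  <!-- Identity transform: copy the input through unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The result stays well formed without any DTD, since numeric references
need no declaration.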

David
