Subject: Re: [xsl] resolve html entities
From: Maximilian Gärber <max@xxxxxxxxxx>
Date: Mon, 31 Oct 2005 10:42:44 +0100
Thanks, Max
I would suggest parsing the HTML with John Cowan's TagSoup parser. To the XSLT processor it looks just like an XML parser (it exposes a SAX interface), so you can probably plug it in directly, depending on which XSLT processor you are using.
Michael Kay
http://www.saxonica.com/
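For the entity-resolution step on its own, here is a minimal sketch in Python. It is not the TagSoup approach suggested above (TagSoup is a Java library); it only assumes that Python's standard `html.parser` and `xml.sax.saxutils` modules are available. The names `EntityResolver` and `resolve_entities` are hypothetical, chosen for illustration. The sketch replaces HTML entities in text with the corresponding Unicode characters while keeping `&`, `<` and `>` escaped, so the result can still be read by an XML parser. It does not repair unclosed tags such as a bare `<br>`; that is the part a tag-soup parser would handle.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr


class EntityResolver(HTMLParser):
    """Re-emit an HTML fragment with named entities turned into Unicode characters."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode entities such as
        # &uuml; before the text reaches handle_data().
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Attribute values arrive already decoded; quoteattr() re-escapes
        # them for XML and picks suitable quotes.
        rendered = "".join(
            f" {name}={quoteattr(value or '')}" for name, value in attrs
        )
        self.parts.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Re-escape only the characters that are special in XML (&, <, >).
        self.parts.append(escape(data))


def resolve_entities(fragment):
    """Return the fragment with HTML entities replaced by Unicode characters."""
    parser = EntityResolver()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    text = resolve_entities("<p>M&uuml;ller &amp; S&ouml;hne</p>")
    print(text)                  # <p>Müller &amp; Söhne</p>
    data = text.encode("utf-8")  # bytes ready to be written as UTF-8
```

The result is an ordinary Unicode string; whether it ends up as UTF-8 is decided only when it is written out, as in the last line. For well-formed input from a restricted HTML subset this may be enough, but for arbitrary HTML a real tag-soup parser remains the safer route.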
-----Original Message-----
From: Maximilian Gärber [mailto:max@xxxxxxxxxx]
Sent: 31 October 2005 08:40
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: [xsl] resolve html entities
Hi,
I know this is a common question, but I could not find a specific answer to it:
I am exporting texts from a database that contain HTML markup. Now I need to transform
the HTML into something usable in a DTP application.
The tags are not the problem, because I only allow a subset of HTML, but the HTML entities
(German umlauts, special characters) need to be converted to plain Unicode (UTF-8)
characters.
What is the best way to achieve this?
Thanks,
Max Gaerber