RE: [xsl] resolve html entities

Subject: RE: [xsl] resolve html entities
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Mon, 31 Oct 2005 09:11:08 -0000
I would suggest parsing the HTML using John Cowan's TagSoup parser. This
looks to the XSLT processor just like an XML parser, so you can probably
integrate it directly - depending on the XSLT processor that you are using.
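
A minimal sketch of that integration through the standard JAXP API, assuming
the TagSoup jar is on the classpath. The class name HtmlToXml and its two
methods are hypothetical names chosen for illustration. TagSoup's parser class,
org.ccil.cowan.tagsoup.Parser, is loaded reflectively only so that the sketch
compiles without the jar; with the jar present you could simply write
"new org.ccil.cowan.tagsoup.Parser()".

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class HtmlToXml {

    /** Instantiates TagSoup's SAX parser; fails if the jar is not on the classpath. */
    public static XMLReader tagSoupReader() throws Exception {
        return (XMLReader) Class.forName("org.ccil.cowan.tagsoup.Parser")
                .getDeclaredConstructor()
                .newInstance();
    }

    /**
     * Parses the markup with the given SAX reader and runs the stylesheet over
     * the resulting event stream. The XSLT processor only sees SAX events, so
     * it does not matter whether they come from an XML parser or from TagSoup.
     */
    public static String transform(XMLReader reader, String markup, String stylesheet)
            throws Exception {
        SAXSource source = new SAXSource(reader, new InputSource(new StringReader(markup)));
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        transformer.transform(source, new StreamResult(out));
        return out.toString();
    }
}
```

A call such as HtmlToXml.transform(HtmlToXml.tagSoupReader(), html, xslt) would
then hand the stylesheet a tree in which entity references such as &auml; have
already been resolved to characters by the parser. As far as I recall, TagSoup
places elements in the XHTML namespace by default, so the match patterns in the
stylesheet may need a matching namespace prefix.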

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Maximilian Gärber [mailto:max@xxxxxxxxxx]
> Sent: 31 October 2005 08:40
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] resolve html entities
>
> Hi,
>
> I know this is a common question, but I could not find a specific
> answer to this:
>
> I am exporting texts from a database that contains HTML markup. Now I
> need to transform the HTML into something usable in a DTP application.
>
> The tags are not the problem because I am only allowing a subset of
> HTML, but the HTML entities (German umlauts, special characters) would
> need to be transformed to plain Unicode (UTF-8) characters.
>
> What is the best way to achieve this?
>
> Thanks,
>
> Max Gaerber
