Re: [xsl] Tags in text content

For people who like to use XProc: in situations like this, XProc's <p:unescape-markup content-type="text/html"/> is my friend.

It may generate some extra HTML elements that you don't want. I remove them with <p:unwrap match="html:html | html:head | html:body" xmlns:html="http://www.w3.org/1999/xhtml"/>.
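
As a rough sketch, the two steps might be combined in an XProc 1.0 pipeline along these lines. The p:viewport wrapper and its match="p" pattern are assumptions based on the sample document quoted below (p:unescape-markup works on the document element, so each p is processed as its own document). Support for content-type="text/html" is implementation-defined, so check what your processor does with it.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:html="http://www.w3.org/1999/xhtml"
                version="1.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Assumption: the escaped markup lives in p elements (no namespace). -->
  <p:viewport match="p">
    <!-- Re-parse the string value of each p as HTML. -->
    <p:unescape-markup content-type="text/html"/>
  </p:viewport>

  <!-- Drop the wrapper elements the HTML parser may have added,
       keeping their children in place. -->
  <p:unwrap match="html:html | html:head | html:body"/>
</p:declare-step>
```

The parsed elements will typically still be in the XHTML namespace afterwards; whether that matters depends on what consumes the result.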

Pieter

On 4/11/24 10:34, Norm Tovey-Walsh ndw@xxxxxxxxxx wrote:

I have some xml where there are "tags" embedded in the text content.
As Martin suggested, parse-xml-fragment() will do the job as long as all of the escaped markup is well-formed. If what's been escaped is HTML that may have come from a process that produced not-well-formed markup, for example:
<root>
     <p>&lt;em&gt;End tags? We don't need no stinking end tags![1]</p>
</root>
then the problem is a little harder. What I've found successful in this case is the Validator.nu HTML parser. It will parse any input and produce a well-formed document that conforms to the parsing rules of HTML5. Some post-processing may be necessary to tidy it back up again (removing the HTML5 namespace, removing the <html> wrapper, etc.), but that's *much* easier once you have an XML fragment.
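
For the well-formed case, the parse-xml-fragment() approach might look roughly like this in XSLT 3.0. The match pattern p/text() is an assumption taken from the sample above, and the xsl:try/xsl:catch fallback, which leaves the text untouched when it does not parse, is only one possible policy.

```xml
<xsl:stylesheet version="3.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Copy everything through unchanged unless a template says otherwise. -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- Assumption: escaped markup appears in text nodes directly under p. -->
  <xsl:template match="p/text()">
    <xsl:try>
      <!-- Re-parse the text as an XML fragment; this raises a dynamic
           error if the text is not well-formed. -->
      <xsl:sequence select="parse-xml-fragment(.)"/>
      <xsl:catch errors="*">
        <!-- Fallback: keep the original text as-is. -->
        <xsl:value-of select="."/>
      </xsl:catch>
    </xsl:try>
  </xsl:template>

</xsl:stylesheet>
```

With the sample input above, the unclosed <em> makes the parse fail, so that paragraph would come through still escaped. That is the case where an HTML parser is needed instead.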
                                         Be seeing you,
                                           norm
[1] https://en.wikipedia.org/wiki/Stinking_badges
--
Norm Tovey-Walsh <ndw@xxxxxxxxxx>
https://norm.tovey-walsh.com/
There is a great difference between seeking how to raise a laugh from
everything, and seeking in everything what may justly be laughed
at.--Lord Shaftesbury
