Subject: Re: [xsl] Tags in text content
From: "Norm Tovey-Walsh ndw@xxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 11 Apr 2024 08:33:39 -0000

> I have some xml where there are "tags" embedded in the text content.
As Martin suggested, parse-xml-fragment() will do the job as long as all of
the escaped markup is well-formed. If what's been escaped is HTML that may
have come from a process that produced not-well-formed markup, for example:
<root>
<p><em>End tags? We don't need no stinking end tags![1]</p>
</root>
then the problem is a little harder. What I've found successful in this case
is the Validator.nu HTML parser. It will parse any input and produce a
well-formed document that conforms to the parsing rules of HTML5. Some
post-processing may be necessary to tidy it back up again (removing the HTML5
namespace, removing the <html> wrapper, etc.), but that's *much* easier once
you have an XML fragment.
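
For the well-formed case, a minimal sketch of the parse-xml-fragment() approach
might look like the following. It assumes an XSLT 3.0 processor, and it assumes
the escaped markup sits in the text of <p> elements (the element name is only an
illustration taken from the example above). The xsl:try/xsl:catch fallback is my
own addition: it leaves the text untouched when the fragment will not parse.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Copy everything through unchanged unless a template says otherwise. -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- Assumption: escaped markup lives in the text nodes of p elements. -->
  <xsl:template match="p/text()">
    <xsl:try>
      <!-- Succeeds only if the unescaped text is a well-formed fragment. -->
      <xsl:sequence select="parse-xml-fragment(.)"/>
      <xsl:catch>
        <!-- Not well-formed: keep the original text as it was. -->
        <xsl:value-of select="."/>
      </xsl:catch>
    </xsl:try>
  </xsl:template>

</xsl:stylesheet>
```

For the tidy-up after an HTML5 parser has run, something along these lines may
be enough. It assumes the parser's output places every element in the XHTML
namespace (http://www.w3.org/1999/xhtml) under an html/head/body wrapper; the
h prefix is arbitrary.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="h">

  <!-- Unwrap html and body: keep their content, drop the elements. -->
  <xsl:template match="h:html | h:body">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Drop the head the parser adds. -->
  <xsl:template match="h:head"/>

  <!-- Rebuild every other XHTML element in no namespace. -->
  <xsl:template match="h:*">
    <xsl:element name="{local-name()}">
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

Text nodes fall through to the built-in template rules, so the content itself is
preserved; only the wrapper and the namespace go away.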
Be seeing you,
norm
[1] https://en.wikipedia.org/wiki/Stinking_badges
--
Norm Tovey-Walsh <ndw@xxxxxxxxxx>
https://norm.tovey-walsh.com/
> There is a great difference between seeking how to raise a laugh from
> everything, and seeking in everything what may justly be laughed
> at.--Lord Shaftesbury