Re: [xsl] fault tolerant saxon:parse()

Subject: Re: [xsl] fault tolerant saxon:parse()
From: "Andrew Welch" <andrew.j.welch@xxxxxxxxx>
Date: Mon, 17 Nov 2008 11:58:39 +0000
2008/11/17 David Carlisle <davidc@xxxxxxxxx>:
>
>> I'm wondering if there's a standard approach for a fault tolerant
>> saxon:parse()   (or alternative equivalent)
>
> personally I've used tagsoup and htmlparse.xsl, but perhaps the nearest
> to a standard these days is http://about.validator.nu/ which implements
> the HTML5 parsing algorithm in Java and exposes (so I'm told) SAX and
> DOM interfaces as if it were reading XML.

Thanks, but I'm looking more for a way of detecting when it's needed...

For example, in the nasty RSS feed for Transport for London's live
travel updates you can have:

<title> &lt;a href="/tfl/livetravelnews/realtime/tube/default.html"&gt;Today&lt;/a&gt;
</title>

and:
		
<title>Hammersmith &amp; City</title>

The former needs parsing if you want to process the escaped markup,
but if you do that with the latter you get an error (the parser treats
the bare ampersand as the start of an entity reference). It's the same
element, so both escaped and non-escaped markup need to be handled.

Maybe saxon:try / catch is the only option here...?
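
To make the try / catch idea concrete, here is a rough sketch in plain Java
rather than XSLT, using only the JDK's own DOM parser. It is not Saxon's API:
the class name TolerantParse, the tryParse() helper and the <wrapper>
element are all made up for illustration. The idea is simply to attempt a
parse of the unescaped title text and treat a failure as "this was plain
text after all".

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Saxon: parse if possible, else signal "plain text".
final class TolerantParse {

    /** Returns the parsed document, or null if the text is not well-formed markup. */
    static Document tryParse(String text) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler ignores warnings and rethrows fatal errors,
            // which stops the parser from reporting them on stderr.
            builder.setErrorHandler(new DefaultHandler());
            // Wrap the text so that mixed content has a single root element.
            String wrapped = "<wrapper>" + text + "</wrapper>";
            return builder.parse(new InputSource(new StringReader(wrapped)));
        } catch (Exception e) {
            // Not well-formed (e.g. a bare ampersand): caller keeps the string as text.
            return null;
        }
    }

    public static void main(String[] args) {
        // The two title values as they look after the feed itself has been parsed.
        String[] titles = {
            " <a href=\"/tfl/livetravelnews/realtime/tube/default.html\">Today</a>",
            "Hammersmith & City"
        };
        for (String title : titles) {
            Document doc = tryParse(title);
            System.out.println((doc != null ? "markup: " : "text:   ") + title);
        }
    }
}
```

One caveat with this approach: a plain-text title that happens to be
well-formed (no ampersands or angle brackets at all) also parses, but it
comes back as a wrapper holding a single text node, so the result is the
same either way.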


-- 
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/
