Subject: Re: [xsl] fault tolerant saxon:parse()
From: "Andrew Welch" <andrew.j.welch@xxxxxxxxx>
Date: Mon, 17 Nov 2008 11:58:39 +0000

2008/11/17 David Carlisle <davidc@xxxxxxxxx>:
>
>> I'm wondering if there's a standard approach for a fault tolerant
>> saxon:parse() (or alternative equivalent)
>
> personally I've used tagsoup and htmlparse.xsl, but perhaps the nearest
> to a standard these days is http://about.validator.nu/ which implements
> the HTML5 parsing algorithm in Java and exposes (so I'm told) sax and
> DOM interfaces as if it were reading XML.

Thanks, but I'm looking more for a way of detecting when it's needed...

For example, in the nasty RSS feed for Transport for London's live
travel updates you can have:

<title>&lt;a href="/tfl/livetravelnews/realtime/tube/default.html"&gt;Today&lt;/a&gt;</title>

and:

<title>Hammersmith &amp; City</title>

The former needs parsing if you want to process the escaped markup, but
if you do that with the latter you get an error (because the parser
takes the ampersand as the start of an entity reference). It's the same
element, so both escaped and non-escaped markup need to be handled.

Maybe saxon:try / catch is the only option here...?

-- 
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/
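
For illustration only, here is a minimal sketch of the "try to parse, fall back to plain text" idea. It assumes an XSLT 3.0 processor, which postdates this message: xsl:try / xsl:catch and parse-xml-fragment() are standard XSLT 3.0 and XPath 3.0 features rather than the saxon:try and saxon:parse() extensions discussed above. The function name f:parse-or-text and its namespace are hypothetical.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:local"
    exclude-result-prefixes="xs f">

  <!-- Hypothetical helper: return the parsed nodes when the string is a
       well-formed XML fragment, otherwise return it as a single text node. -->
  <xsl:function name="f:parse-or-text" as="node()*">
    <xsl:param name="s" as="xs:string"/>
    <xsl:try>
      <!-- Raises a dynamic error if $s is not well-formed, for example
           when it contains a bare ampersand. -->
      <xsl:sequence select="parse-xml-fragment($s)/node()"/>
      <!-- With no errors attribute, xsl:catch catches any error. -->
      <xsl:catch>
        <xsl:value-of select="$s"/>
      </xsl:catch>
    </xsl:try>
  </xsl:function>

  <!-- Apply the helper to the string value of each title element. -->
  <xsl:template match="title">
    <title>
      <xsl:sequence select="f:parse-or-text(string(.))"/>
    </title>
  </xsl:template>

</xsl:stylesheet>
```

Under these assumptions, the first title above would yield an a element and the second would come through as the text "Hammersmith & City". Escaped HTML that is not well-formed XML would also fall back to text, so a tolerant parser such as tagsoup would still be needed for that case.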