[xsl] up conversion using regex (was something else)

Subject: [xsl] up conversion using regex (was something else)
From: David Carlisle <davidc@xxxxxxxxx>
Date: Mon, 16 Aug 2004 11:03:46 +0100
Wendell wrote:

> Well, up-conversion has been going on as long as e-text has. But XML is only 
> now beginning to catch up to where Omnimark and even Perl were years ago. 

Yes, I think the XSLT2 functionality in this area is going to prove very
useful.

I wrote up some stream-of-consciousness thoughts on requirements in this
area a couple of years back:
http://lists.w3.org/Archives/Public/xsl-editors/2002JanMar/0083.html

The exact syntax that the WG came up with is rather different to the
suggestions in that mail, but I believe that it can cope with most, if
not all, of the requirements. For example, one of the harder cases that I
floated was:

  RE-6c: HTML Markup.
    As above but with HTML, in particular with implied end tags. In general
    this requires a DTD and knowledge of SGML omitted tag rules. To handle
    general HTML as it appears in the wild, arbitrarily complicated "tag
    soup" parsing heuristics as implemented in the browsers would be
    needed. However this appears to be a very common requirement often
    generated by storing HTML fragments as strings in a database. One may
    hope that specific simple cases may be handled for example:



The regexp support in XSLT2/Saxon 8 is enough to do this: see for example
http://www.dcarlisle.demon.co.uk/htmlparse.xsl
which does the above (parsing dubiously formed HTML, together with some
support for embedded XML+namespace syntax). This provides an
alternative answer to the FAQ on what to do with
<foo><![CDATA[...<a href="#x">click here</a> ...]]></foo>
as you can now handle that quite reasonably in XSLT2.
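
To give a flavour of the technique, here is a minimal sketch. It is not
the htmlparse.xsl stylesheet linked above, and it assumes much more than
that stylesheet does: the only markup in the string is a well-formed
<a href="..."> link with a double-quoted attribute and plain text content.
It uses xsl:analyze-string to turn each such link inside foo back into a
real element and leaves everything else as text. Implied end tags and
general tag soup need considerably more than this.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy everything that is not handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumption: markup stored as a string lives in text children of foo. -->
  <xsl:template match="foo/text()">
    <!-- Group 1 is the href value, group 2 is the link text. -->
    <xsl:analyze-string select="."
        regex="&lt;a\s+href=&quot;([^&quot;]*)&quot;\s*&gt;([^&lt;]*)&lt;/a&gt;">
      <xsl:matching-substring>
        <!-- Rebuild the link as a real element. -->
        <a href="{regex-group(1)}">
          <xsl:value-of select="regex-group(2)"/>
        </a>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Anything that is not a recognised link stays as text. -->
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

Run with an XSLT2 processor such as Saxon 8 against the foo example above,
this should produce a foo element containing the text "...", a real a
element with href="#x" and content "click here", and then the text " ...".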

David

