Subject: [xsl] cleaning up ill-structured html
From: Jim_Albright@xxxxxxxxxxxx
Date: Fri, 24 Jan 2003 13:41:10 -0500

With this input

<p>Some <i>stuff</i> that should be cleaned.<br/> More <b>stuff.</b> <p> Yet more.<br> </p> Stuff. </p>

I have this XML output that you can clean up with XSLT

<sample>
  <p>Some <emphasis>stuff</emphasis> that should be cleaned.</p>
  <paragraph>More <strong>stuff.</strong></paragraph>
  <p>Yet more.</p>
  <paragraph>Stuff.</paragraph>
</sample>

Using this XML control file:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE convert2xml SYSTEM "c:\d\xml\convert2xml.dtd" >
<!--
file: HTML-cleanup.ctl
Purpose: Control file for c2x program
Author: jaa
Date: 20020124
Clean up dirty HTML and make it into good XML
-->
<convert2xml>
  <root-element name="sample"> </root-element>
  <recognize-element name="paragraph">
    <start-token>
      <pattern>\pp</pattern>
      <before>
</before>
    </start-token>
    <end-token>
      <pattern>
</p></pattern>
    </end-token>
    <allowed-child ref="emphasis"/>
    <allowed-child ref="strong"/>
  </recognize-element>
  <recognize-element name="p">
    <start-token>
      <pattern><p>
</pattern>
      <before>
</before>
    </start-token>
    <start-token>
      <pattern><p></pattern>
      <before>
</before>
    </start-token>
    <end-token>
      <pattern></p></pattern>
    </end-token>
    <end-token>
      <pattern><b>
</p></pattern>
    </end-token>
    <end-token>
      <pattern><br/>
</pattern>
      <parsed-after>\pp</parsed-after>
    </end-token>
    <end-token>
      <pattern><br/>
</p></pattern>
      <parsed-after>\pp</parsed-after>
    </end-token>
    <end-token>
      <pattern><br>
</p>
</pattern>
      <parsed-after>\pp</parsed-after>
    </end-token>
    <end-token>
      <pattern><br/></pattern>
      <parsed-after>\pp</parsed-after>
    </end-token>
    <end-token>
      <pattern><br></pattern>
    </end-token>
    <end-token>
      <pattern>
</p></pattern>
    </end-token>
    <allowed-child ref="emphasis"/>
    <allowed-child ref="strong"/>
  </recognize-element>
  <recognize-element name="emphasis">
    <start-token>
      <pattern><i></pattern>
    </start-token>
    <end-token>
      <pattern></i></pattern>
    </end-token>
    <end-token>
      <pattern></i>
</pattern>
      <after> </after>
    </end-token>
  </recognize-element>
  <recognize-element name="strong">
    <start-token>
      <pattern><b></pattern>
    </start-token>
    <end-token>
      <pattern></b></pattern>
    </end-token>
    <end-token>
      <pattern></b>
</pattern>
    </end-token>
  </recognize-element>
</convert2xml>

In a free program called C2X -- convert to XML.
Ask me off list if you want more info as C2X is off topic.

Date: Thu, 23 Jan 2003 21:54:43 +0100
From: Ole Sandum <osandum@xxxxxxxxxxx>
Subject: [xsl] cleaning up ill-structured html

Example:

<p>Some <i>stuff</i> that should be cleaned.<br/> More <b>stuff.</b> <p> Yet more.<br> </p> Stuff. </p>

Should become:

<p>Some <i>stuff</i> that should be cleaned.</p>
<p>More <b>stuff.</b></p>
<p>Yet more.</p>
<p>Stuff.</p>

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
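
For readers without C2X, the "Example" / "Should become" pair quoted above can be reproduced with a few lines of scripting. What follows is only a rough sketch in Python, not C2X and not XSLT, and it rests on an assumption about what the cleanup amounts to: every <p>, </p>, <br> and <br/> is treated as a paragraph boundary, and each non-empty chunk between boundaries is wrapped in its own <p>. The names BOUNDARY and split_paragraphs are made up for illustration.

```python
import re

# Assumption for this sketch: these four tag forms are the only paragraph
# boundaries. Tags with attributes (e.g. <p class="x">) are not matched.
BOUNDARY = re.compile(r"</?p\s*>|<br\s*/?>", re.IGNORECASE)


def split_paragraphs(html):
    """Split at <p>, </p>, <br>, <br/> and wrap each non-empty chunk in <p>."""
    chunks = (chunk.strip() for chunk in BOUNDARY.split(html))
    return ["<p>" + chunk + "</p>" for chunk in chunks if chunk]


# The ill-structured input from the original question.
dirty = (
    "<p>Some <i>stuff</i> that should be cleaned.<br/> More <b>stuff.</b> "
    "<p> Yet more.<br> </p> Stuff. </p>"
)

for paragraph in split_paragraphs(dirty):
    print(paragraph)
```

Inline markup such as <i> and <b> passes through untouched, so the result is the four flat paragraphs asked for. Input with attributes on the tags, or with other block-level elements, would need a real HTML parser rather than a regular expression.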