RE: Multiple pages of well formed HTML ---> XML

Subject: RE: Multiple pages of well formed HTML ---> XML
From: "Maxime Levesque" <maximel@xxxxxxxxxxxxxx>
Date: Tue, 3 Aug 1999 11:20:44 -0700
 As a *workaround* for the unimplemented document() function,
you could implement a 'composite parser' (or aggregating parser ...)
that calls back its org.xml.sax.DocumentHandler in a way that makes
the handler think it is handling a single document ...

 That will work if you are using a SAX-based XSL processor (e.g. XT).
If the processor is DOM-based, you can just 'glue' the trees together ...
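For the DOM case, here is a minimal sketch of the 'glue' step, assuming the JAXP DOM API that ships with a current JDK: each page is parsed into its own Document, and its root element is copied with importNode() under one fake root. The class name GlueTrees, its helper methods and the root name "YourFakeRoot" are illustrative only, and the pages are assumed to be well-formed (XHTML-style) markup.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper: glues several well-formed pages under one fake root element.
public class GlueTrees {

    static Document glue(DocumentBuilder builder, String rootName, String... pages) throws Exception {
        Document combined = builder.newDocument();
        Element root = combined.createElement(rootName);
        combined.appendChild(root);
        for (String page : pages) {
            Document doc = builder.parse(new InputSource(new StringReader(page)));
            // A node cannot be appended across documents, so copy the page's subtree in first.
            root.appendChild(combined.importNode(doc.getDocumentElement(), true));
        }
        return combined;
    }

    // Identity transform, only used here to show the glued tree as text.
    static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document combined = glue(builder, "YourFakeRoot",
                "<html><body><p>page 1</p></body></html>",
                "<html><body><p>page 2</p></body></html>");
        System.out.println(serialize(combined));
    }
}
```

The resulting Document has wp1 ... wpn as children of a single root, which is the shape an XSL processor can then be run over.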


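import java.io.IOException;
import java.util.Locale;

import org.xml.sax.AttributeList;
import org.xml.sax.DTDHandler;
import org.xml.sax.DocumentHandler;
import org.xml.sax.EntityResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.Parser;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributeListImpl;

public class CompositeParser implements DocumentHandler, Parser {

    private final InputSource[] inputSources_;

    // "... your favorite parser ...": passed in by the caller
    private final Parser aRealParser_;

    private DocumentHandler documentHandler_;

    public CompositeParser(Parser aRealParser, InputSource[] inputSources) {
        aRealParser_ = aRealParser;
        inputSources_ = inputSources;
    }

    public void setDocumentHandler(DocumentHandler handler) {
        documentHandler_ = handler;
    }

    public void parse(InputSource source) throws SAXException, IOException {

        // ignore source ...

        documentHandler_.startDocument(); // fake the start of the 'aggregated' doc

        // fake a root start
        documentHandler_.startElement("YourFakeRoot", new AttributeListImpl());

        // receive the callbacks from all the inputSources_ :
        aRealParser_.setDocumentHandler(this);
        for (int i = 0; i < inputSources_.length; i++) {
            aRealParser_.parse(inputSources_[i]);
        }

        // fake a root end
        documentHandler_.endElement("YourFakeRoot");

        documentHandler_.endDocument(); // fake the end of the 'aggregated' doc
    }

    public void parse(String systemId) throws SAXException, IOException {
        parse(new InputSource(systemId)); // the argument is ignored here as well
    }

    // ---- DocumentHandler: forward each page's events unchanged ----

    public void startElement(String name, AttributeList atts) throws SAXException {
        documentHandler_.startElement(name, atts);
    }

    public void endElement(String name) throws SAXException {
        documentHandler_.endElement(name);
    }

    public void characters(char[] ch, int start, int length) throws SAXException {
        documentHandler_.characters(ch, start, length);
    }

    public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
        documentHandler_.ignorableWhitespace(ch, start, length);
    }

    public void processingInstruction(String target, String data) throws SAXException {
        documentHandler_.processingInstruction(target, data);
    }

    public void startDocument() throws SAXException {} // silence these calls

    public void endDocument() throws SAXException {}   // silence these calls

    // silenced too: the real parser's locator points into one page, not the aggregate
    public void setDocumentLocator(Locator locator) {}

    // ---- the other methods of org.xml.sax.Parser, delegated to 'aRealParser_' ----

    public void setLocale(Locale locale) throws SAXException {
        aRealParser_.setLocale(locale);
    }

    public void setEntityResolver(EntityResolver resolver) {
        aRealParser_.setEntityResolver(resolver);
    }

    public void setDTDHandler(DTDHandler handler) {
        aRealParser_.setDTDHandler(handler);
    }

    public void setErrorHandler(ErrorHandler handler) {
        aRealParser_.setErrorHandler(handler);
    }
}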
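The same aggregating idea can be sketched against SAX 2, the API that later superseded DocumentHandler/Parser (this is a swapped-in API, not what the class above uses): an XMLFilterImpl subclass silences each page's startDocument()/endDocument() and wraps all pages in one fake root. AggregateDemo, AggregatingFilter and trace() are hypothetical names, and the pages are assumed to be well-formed.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class AggregateDemo {

    // Hypothetical SAX 2 filter: makes several pages look like one document downstream.
    static class AggregatingFilter extends XMLFilterImpl {
        private final String rootName;
        private final InputSource[] sources;

        AggregatingFilter(XMLReader parent, String rootName, InputSource... sources) {
            super(parent);
            this.rootName = rootName;
            this.sources = sources;
        }

        @Override
        public void parse(InputSource ignored) throws SAXException, IOException {
            ContentHandler out = getContentHandler();
            out.startDocument();                                        // fake start of the aggregate
            out.startElement("", rootName, rootName, new AttributesImpl());
            for (InputSource s : sources) {
                super.parse(s);                                         // page events flow through this filter
            }
            out.endElement("", rootName, rootName);
            out.endDocument();                                          // fake end of the aggregate
        }

        @Override public void startDocument() {}                        // silence each page's own start
        @Override public void endDocument() {}                          // silence each page's own end
    }

    // Runs the filter over the given pages and records the events the handler sees.
    static String trace(String rootName, String... pages) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        InputSource[] sources = new InputSource[pages.length];
        for (int i = 0; i < pages.length; i++) {
            sources[i] = new InputSource(new StringReader(pages[i]));
        }
        AggregatingFilter filter = new AggregatingFilter(reader, rootName, sources);
        StringBuilder out = new StringBuilder();
        filter.setContentHandler(new DefaultHandler() {
            @Override public void startDocument() { out.append("[doc "); }
            @Override public void endDocument() { out.append("doc]"); }
            @Override public void startElement(String uri, String local, String qName, Attributes atts) {
                out.append('<').append(qName).append('>');
            }
            @Override public void endElement(String uri, String local, String qName) {
                out.append("</").append(qName).append('>');
            }
            @Override public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }
        });
        filter.parse(new InputSource()); // argument ignored by AggregatingFilter
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(trace("YourFakeRoot", "<html><p>page 1</p></html>", "<html><p>page 2</p></html>"));
    }
}
```

Extending XMLFilterImpl gets the forwarding of every other event for free, which is the part the SAX 1 version has to write out by hand.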


 Maxime Levesque


> -----Original Message-----
> From: owner-xsl-list@xxxxxxxxxxxxxxxx
> [mailto:owner-xsl-list@xxxxxxxxxxxxxxxx]On Behalf Of McKisson, Shawn
> Sent: Tuesday, August 03, 1999 8:38 AM
> To: 'xsl-list@xxxxxxxxxxxxxxxx'
> Subject: Multiple pages of well formed HTML ---> XML
>
>
> Thanks to all those that helped me with the linear to deep xsl
> transformation - the information you gave was priceless to a beginner like
> myself. (see post XSL problem 8/2/1999)
> Special thanks to David Carlisle and Dave Pawson who went out of their way
> to help.
>
> Related to this, I now have the need to gather well formed HTML from
> multiple web pages and form it into a single XML document. It seems like
> the only trick here is to get each of the HTML trees to hang off of the
> root node of the DOM tree that XSL is going to manipulate,
> i.e.
>
> (wp = webpage)
>
>             DOM
>             root
>            / |  \
>           /  |   \
>          /   |    \
>         wp1 wp2..wpn
>
> With that accomplished, it seems that I could use XSL in the standard way to
> generate the XML.
> Does this sound like a reasonable solution to the problem? Any other
> suggestions? (I haven't looked into XLink, so I'm not sure exactly what it
> is or if it is relevant here)
>
> --shawn
>
>
>  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
>



