Re: document() for non-XML documents

Subject: Re: document() for non-XML documents
From: "Ingo Macherius" <macherius@xxxxxxxxxxxxxxxx>
Date: Mon, 27 Sep 1999 19:05:44 +0200
Elliotte Rusty Harold <xsl-list@xxxxxxxxxxxxxxxx> wrote at 24 Sep 99, 18:50:

We have implemented the document() function for XQL. The approach is 
rather pragmatic, but it works fine. This is the strategy:

1. If the document parameter is using HTTP, get the MIME type from the 
2. If another protocol is used (e.g. FTP), set the MIME type to 
3. Using predefined, data-type specific wrappers, map the included 
document to an XML-DOM. We currently support XML, HTML and RTF using 
the drivers included in Sun's Swing library. 
4. Pass the DOM nodes to the XQL (or, in your case, XPath) processor. 
If the included document cannot be translated to an XML-DOM, fail 
silently by passing an empty document to the XQL processor.
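
Steps 3 and 4 could be sketched roughly as follows. This is only an 
illustration in Python, not the actual implementation: the names 
WRAPPERS, wrap_xml, empty_document and load_as_dom are hypothetical, 
and the MIME type is simply taken as an argument, so the lookup 
described in steps 1 and 2 is outside the sketch.

```python
from xml.dom import minidom

def empty_document():
    """Return a DOM document with no document element."""
    return minidom.getDOMImplementation().createDocument(None, None, None)

def wrap_xml(data):
    """Wrapper for XML input: parse the bytes into a DOM."""
    return minidom.parseString(data)

# Hypothetical registry: MIME type -> wrapper that returns a DOM document.
WRAPPERS = {
    "text/xml": wrap_xml,
    "application/xml": wrap_xml,
}

def load_as_dom(data, mime_type):
    """Map an included document to a DOM, failing silently.

    If no wrapper is registered for the MIME type, or the wrapper
    raises, an empty document is returned instead of an error.
    """
    wrapper = WRAPPERS.get(mime_type)
    if wrapper is None:
        return empty_document()
    try:
        return wrapper(data)
    except Exception:
        # Deliberately broad: any translation failure yields an empty document.
        return empty_document()
```

The query processor then only ever sees a DOM; for an include that 
could not be translated, that DOM has no nodes to match.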

Thus the main idea is: anything that can be mapped to a DOM can be 
included by document().

A very primitive wrapper for pure text could produce a generic 
container (say, <div>) holding all of the text in a single Text 
node. More sophisticated mappers, e.g. for database content or texts 
of known structure, can easily be defined. To recognize processable 
includes, two things are needed: (1) a mapping function myFormat->DOM 
and (2) a MIME type (e.g. text/X-myTextFormat).
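
As a minimal sketch of such mappers (Python, for illustration only): 
wrap_plain_text builds the <div> container with a single Text node, 
and wrap_lines stands in for a text of known structure, assuming a 
made-up format with one record per line. The function names, the 
MAPPERS table and the <lines>/<line> element names are all 
hypothetical.

```python
from xml.dom import minidom

def wrap_plain_text(text):
    """Put the whole text into one Text node inside a <div> container."""
    doc = minidom.getDOMImplementation().createDocument(None, "div", None)
    doc.documentElement.appendChild(doc.createTextNode(text))
    return doc

def wrap_lines(text):
    """Assumed format: one record per line; blank lines are skipped."""
    doc = minidom.getDOMImplementation().createDocument(None, "lines", None)
    for line in text.splitlines():
        if line.strip():
            element = doc.createElement("line")
            element.appendChild(doc.createTextNode(line))
            doc.documentElement.appendChild(element)
    return doc

# The two things needed per format: a MIME type and a mapping function.
MAPPERS = {
    "text/plain": wrap_plain_text,
    "text/X-myTextFormat": wrap_lines,
}
```

Markup characters in the text need no special treatment here, since 
the text is stored as character data in the DOM and never parsed.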

This approach may not scale web-wide, but for a controlled 
environment it works well.


> The document() function allows me to merge multiple XML input documents.
> However, what if I need to merge text and HTML documents (and possibly
> other formats) into my output documents? Is there some way to do this?

> However, suppose I want to insert the contents of a simple text file:
>   <include href="compositions.txt"/>
> A slightly more complicated case: suppose I want to insert the contents of
> a non-well-formed HTML file:
>   <include href="compositions.html"/>

Ingo Macherius//Dolivostrasse 15//D-64293 Darmstadt//+49-6151-869-882
GMD-IPSI German National Research Center for Information Technology
Information!=Knowledge!=Wisdom!=Truth!=Beauty!=Love!=Music==BEST (Zappa)
