RE: [xsl] document() as extension mechanism

Subject: RE: [xsl] document() as extension mechanism
From: Joerg Pietschmann <joerg.pietschmann@xxxxxx>
Date: Mon, 23 Jul 2001 11:41:04 +0200
> "Michael Kay" <mhkay@xxxxxxxxxxxx> wrote:
> One limitation of the approach is that the URIResolver is given very little
> context information.
You'll have to encode the necessary context into the URI. In some cases,
like reading the parameters of an HTTP request in a servlet, it has to
be handed to the resolver at creation time:
  public class MyServlet extends ...
    public void doGet(HttpServletRequest req,...
      transformer.setURIResolver(new MyURIResolver(req));
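
A minimal sketch of what such a resolver could look like. To keep it
self-contained, a plain Map stands in for the HttpServletRequest; the
"param:" scheme, the class name and the XML layout are all assumptions
made for illustration, not part of any processor's API.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.transform.Source;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

// Hypothetical resolver: the "param:" scheme and all names are assumptions.
class RequestParamResolver implements URIResolver {
    // Stands in for the HttpServletRequest handed over at creation time.
    private final Map<String, String> params;

    RequestParamResolver(Map<String, String> params) {
        this.params = params;
    }

    // Escapes the characters that would break element content or attributes.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    // Builds the small document served for document('param:NAME').
    static String toXml(String name, String value) {
        String v = (value == null) ? "" : value;
        return "<param name=\"" + escape(name) + "\">" + escape(v) + "</param>";
    }

    @Override
    public Source resolve(String href, String base) {
        if (!href.startsWith("param:")) {
            return null; // not ours: let the processor resolve the URI itself
        }
        String name = href.substring("param:".length());
        return new StreamSource(new StringReader(toXml(name, params.get(name))));
    }
}
```

With this in place a stylesheet could read a request parameter with
something like document('param:user')/param, provided the processor
hands the unmodified URI to the resolver.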

Also, I didn't think of this as a universal extension mechanism. While
you can use it to implement, for example, xx:string-tokenize()
(document(concat('tokenize:',$theString))) or even xx:node-set(), it
would probably suffer performance problems so drastic (compared to the
existing extension functions) that it would be unwise to try.
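
To make the overhead concrete, here is a rough sketch of a resolver for
the 'tokenize:' URIs mentioned above. The class name and the
<tokens>/<token> layout are assumptions, and it assumes the processor
passes the URI through without escaping it. Note that every call
serializes the tokens to XML text which the processor then has to parse
again, which is where much of the cost would come from.

```java
import java.io.StringReader;
import javax.xml.transform.Source;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

// Hypothetical resolver for document(concat('tokenize:', $theString)).
class TokenizeResolver implements URIResolver {
    // Escapes the characters that would break element content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;");
    }

    // Splits on whitespace and wraps each token in a <token> element.
    static String tokensXml(String text) {
        StringBuilder xml = new StringBuilder("<tokens>");
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                xml.append("<token>").append(escape(token)).append("</token>");
            }
        }
        return xml.append("</tokens>").toString();
    }

    @Override
    public Source resolve(String href, String base) {
        if (!href.startsWith("tokenize:")) {
            return null; // not ours: let the processor resolve the URI itself
        }
        String text = href.substring("tokenize:".length());
        // The result is handed back as text, so the processor parses it again.
        return new StreamSource(new StringReader(tokensXml(text)));
    }
}
```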

I rather thought of it as a convenient and hopefully efficient mechanism to
access various resources external to the processor, for example
- LDAP (using LDAP URIs)
- inquire the list of processes running on a host
- inquire extended information about a file (owner, access rights...)
- get extended info about a calendar date (day number, ISO week number,
  conversion into other calendars, whether it's a holiday in a certain
  region or culture...)
This may remind some of the UNIX strategy of giving nearly every data stream
a path in the file system (remember /dev/zero or /proc/NNNNN).
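
As an illustration of the calendar case, a resolver along these lines
could serve day number and ISO week for a date. This is only a sketch:
the "date:" scheme, the class name and the element names are invented
here, and it covers just the ISO calendar via java.time.

```java
import java.io.StringReader;
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.time.temporal.IsoFields;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

// Hypothetical resolver for URIs like 'date:2001-07-23'.
class DateInfoResolver implements URIResolver {
    // Builds a small document with a few derived calendar fields.
    static String dateInfoXml(String isoDate) {
        LocalDate d = LocalDate.parse(isoDate); // expects yyyy-MM-dd
        return "<date iso=\"" + d + "\">"
             + "<day-of-year>" + d.getDayOfYear() + "</day-of-year>"
             + "<iso-week>" + d.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR) + "</iso-week>"
             + "<day-of-week>" + d.getDayOfWeek() + "</day-of-week>"
             + "</date>";
    }

    @Override
    public Source resolve(String href, String base) throws TransformerException {
        if (!href.startsWith("date:")) {
            return null; // not ours: let the processor resolve the URI itself
        }
        try {
            String xml = dateInfoXml(href.substring("date:".length()));
            return new StreamSource(new StringReader(xml));
        } catch (DateTimeParseException e) {
            throw new TransformerException("Not an ISO date: " + href, e);
        }
    }
}
```

A stylesheet could then ask for
document('date:2001-07-23')/date/iso-week, again assuming the processor
passes the URI to the resolver unchanged.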

Note that everything listed above or in the original post could also be
achieved with HTTP URLs and a webserver with some extension mechanism
built into it (http://webserver/query-ldap?ldap-uri
or http://webserver/cgi-bin/ etc.). This may, of
course, lead to a chicken-and-egg problem if your intention is to
use XSL to serve these URLs...

> Another is that the XSLT processor is obliged to ensure
> that two calls on document() with the same URL return the same result each
> time.
This is an issue even with standard URLs, as files may be overwritten
at any time and URLs may serve dynamic content (as hopefully demonstrated
in the above paragraph). Of course, some people may be surprised that
their code like
  <xsl:template match="/">
    <xsl:variable name="starttime"
       select="document('time:')//time-in-ms"/>
    <xsl:apply-templates/>
    <xsl:message>Time used: <xsl:value-of
       select="document('time:')//time-in-ms - $starttime"/>
    </xsl:message>
  </xsl:template>
won't work. In the example above this could be fixed by using different
URIs like time:start and time:end, if the machinery ignores the extra
stuff. Note that the same problem arises if you use a webserver
and a URL like 'http:///get-current-time', and the same fix applies (add some
dummy query string like 'http:///get-current-time?start'). You may
argue that the processor is not obliged to read the second document
after having processed the apply-templates; in fact it may read both
documents, store the results and do the apply-templates afterwards,
so the measured processing time will be wrong. But then you are likely
to have exactly the same problem if you use processor-specific extension
functions to read the current time.
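
A sketch of the "machinery ignores the extra stuff" idea: a resolver
that serves the current time for any 'time:' URI, whatever follows the
scheme. The class name and the injected clock are assumptions (the
clock is only there to make the sketch testable); the <time-in-ms>
element follows the example above.

```java
import java.io.StringReader;
import java.util.function.LongSupplier;
import javax.xml.transform.Source;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

// Hypothetical resolver for 'time:start', 'time:end', 'time:anything'.
class TimeResolver implements URIResolver {
    private final LongSupplier clock;

    TimeResolver(LongSupplier clock) {
        this.clock = clock;
    }

    // Builds the document that //time-in-ms is selected from.
    static String timeXml(long millis) {
        return "<time><time-in-ms>" + millis + "</time-in-ms></time>";
    }

    @Override
    public Source resolve(String href, String base) {
        if (!href.startsWith("time:")) {
            return null; // not ours: let the processor resolve the URI itself
        }
        // Everything after "time:" is deliberately ignored; it only serves
        // to make the URIs differ so the processor sees distinct documents.
        return new StreamSource(new StringReader(timeXml(clock.getAsLong())));
    }
}
```

It would be installed with something like
transformer.setURIResolver(new TimeResolver(System::currentTimeMillis)).
This does nothing about the evaluation-order caveat: the processor
still decides when each document is actually read.
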
There is of course no way to make an example like
  <xsl:variable name="before" select="document('read:file')"/>
  <xsl:variable name="dummy" select="document('write:file?stuff')"/>
  <xsl:variable name="after" select="document('read:file')"/>
guaranteed to work as the author probably intended.


PS: Mike, are you interested in including some resolvers/XML readers
in the "examples" section of your processor distribution?
