Re: [xsl] transform optimization for a schema-constrained domain

Subject: Re: [xsl] transform optimization for a schema-constrained domain
From: Joerg Pietschmann <joerg.pietschmann@xxxxxx>
Date: Thu, 26 Jul 2001 08:43:28 +0200
> From: "Huebel, David" <dhuebel@xxxxxxxxxxxxxx>
>
> Hello,
> 
> Are there any XSLT processors that can use a schema for the input domain to
> improve performance?
[...]
> Has this been implemented anywhere,

As far as I know, no processor uses knowledge of a DTD or Schema attached
to the XML source for anything other than validation (and it's actually the
parser doing that).
In the case of DTDs, this probably has something to do with the lack of an
API for accessing the element definitions. Since schemas are themselves XML,
this is no longer an excuse, but schema support is not only horribly complex,
it is also still somewhat under development.

> and does anyone have any comments on its
> usefulness?

It would enable some optimizations. The most obvious example is a table,
derived from the DTD, recording which element types may have which
descendants. The processor could use it to skip subtrees while evaluating
expressions involving //, and perhaps while building the lookup tables for
key(). Of course, this implies that the source XML must be validated against
the DTD or Schema, and it is not clear whether the up-front cost pays off.
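A rough sketch of such a table, in Python purely for illustration. The flattened DTD (`CHILDREN`) and the helper names are hypothetical, and the pruning is only safe on a document that has been validated against that DTD:

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD, flattened to "element type -> element types allowed as
# children". A real processor would derive this from the content models.
CHILDREN = {
    "doc": {"head", "body"},
    "head": {"title"},
    "title": set(),
    "body": {"section"},
    "section": {"title", "para", "section"},
    "para": {"em"},
    "em": set(),
}


def descendant_table(children):
    """Map each element type to every element type that may occur below it."""
    table = {}
    for elem_type in children:
        seen, stack = set(), list(children[elem_type])
        while stack:
            name = stack.pop()
            if name not in seen:
                seen.add(name)
                stack.extend(children.get(name, ()))
        table[elem_type] = seen
    return table


def find_descendants(node, name, table, stats=None):
    """Evaluate node//name in document order, skipping every subtree whose
    element type cannot contain `name` according to the table."""
    result = []
    for child in node:
        if stats is not None:
            stats["visited"] += 1
        if child.tag == name:
            result.append(child)
        # Only descend where the DTD says `name` can occur further down.
        if name in table.get(child.tag, ()):
            result.extend(find_descendants(child, name, table, stats))
    return result
```

For `doc//em` the search never enters `<head>`, because the table says no `em` can occur there. The table is computed once per DTD, so its cost is independent of document size.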

There are further useful optimizations if elements are defined as a
sequence of child elements. For example, given <!ELEMENT e (a,b,c,d,e)>,
child elements could be looked up by index instead of scanning the
children for the node. In the case of <!ELEMENT e (a,b?,c?,d?,e?)>, you
could stop scanning the children for a c element as soon as a c, d or e
is found, since at most one c may occur and it must precede d and e.
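Both cases might look roughly like this; `MODEL`, `SLOT` and the two functions are made-up names for illustration, and as above the shortcuts assume a validated document:

```python
import xml.etree.ElementTree as ET

# Hypothetical content model, slots in declaration order. It serves both
# <!ELEMENT e (a,b,c,d,e)> (all required) and <!ELEMENT e (a,b?,c?,d?,e?)>.
MODEL = ["a", "b", "c", "d", "e"]
SLOT = {name: i for i, name in enumerate(MODEL)}


def child_by_index(parent, name):
    """(a,b,c,d,e): every slot is filled exactly once, so the wanted child
    sits at its slot position and no scan is needed."""
    return parent[SLOT[name]]


def child_by_scan(parent, name, stats=None):
    """(a,b?,c?,d?,e?): slots may be empty but their order is fixed, so the
    scan stops at the first child whose slot is at or past `name`'s slot."""
    target = SLOT[name]
    for child in parent:
        if stats is not None:
            stats["examined"] += 1
        pos = SLOT[child.tag]
        if pos == target:
            return child   # at most one occurrence is allowed
        if pos > target:
            return None    # a later slot is filled, so `name` cannot follow
    return None
```

In `<e><a/><d/><e/></e>` the search for `c` gives up after looking at `<a/>` and `<d/>`, without touching the rest of the children.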

The data type support of schemas may provide some more opportunities.
If an element or attribute is defined to be a number, the validation
step could just as well store the value in an internal numeric
representation, since it has to verify it anyway, and the expression
evaluation machinery could then use the numeric value directly instead
of converting it from the string representation every time. For values
constrained by patterns (regular expressions), some string processing
operations might be optimized as well.
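A small sketch of "parse once during validation, reuse during evaluation". The attribute-to-type mapping (`price` as something like xs:decimal, `qty` as xs:integer, mapped onto Python's `Decimal` and `int`) and the function names are assumptions for illustration only:

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation

# Hypothetical schema fragment: attribute name -> type of its value.
TYPED_ATTRS = {"price": Decimal, "qty": int}


def validate_and_type(root):
    """Validate typed attributes once and keep the parsed values, keyed by
    (element, attribute name), so later expressions skip the conversion."""
    cache = {}
    for elem in root.iter():
        for attr, conv in TYPED_ATTRS.items():
            raw = elem.get(attr)
            if raw is None:
                continue
            try:
                cache[(elem, attr)] = conv(raw.strip())
            except (ValueError, InvalidOperation):
                raise ValueError(
                    f"{elem.tag}/@{attr}: {raw!r} is not a valid {conv.__name__}")
    return cache


def order_total(root, cache):
    """Roughly sum(item/@price * item/@qty), using the cached numbers instead
    of converting the attribute strings again."""
    return sum(cache[(item, "price")] * cache[(item, "qty")]
               for item in root.iter("item"))
```

The conversion cost moves into the validation pass, which has to look at every value anyway.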

Note that the optimizations above are all peanuts. They do, however,
apply even to carefully crafted XSL code. There may be huge potential
for optimizing lazily built, otherwise quite inefficient style sheets.
If there is an <xsl:for-each select="//stuff"> while the stuff element
has already been removed from the DTD/Schema, getting rid of the for-each
could be a substantial win.
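Such dead code could be spotted with a reachability check on the DTD. This is only a sketch: it handles nothing but the literal `//name` form of select, and the flattened DTD, root element name and helper names are hypothetical:

```python
import re
import xml.etree.ElementTree as ET

XSL = "{http://www.w3.org/1999/XSL/Transform}"
SIMPLE_DESC = re.compile(r"^//([A-Za-z_][\w.-]*)$")  # only "//name" is handled


def reachable(children, root):
    """All element types that can appear in a document rooted at `root`."""
    seen, stack = set(), [root]
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(children.get(name, ()))
    return seen


def drop_dead_for_each(stylesheet, children, root):
    """Remove xsl:for-each elements whose select="//name" names an element
    type the DTD can never produce; return the names that were dropped."""
    live = reachable(children, root)
    parent_of = {c: p for p in stylesheet.iter() for c in p}
    removed = []
    # Materialize the list first, since we mutate the tree while looping.
    for fe in list(stylesheet.iter(XSL + "for-each")):
        m = SIMPLE_DESC.match(fe.get("select", ""))
        if m and m.group(1) not in live:
            parent_of[fe].remove(fe)
            removed.append(m.group(1))
    return removed
```

Since `//name` is an absolute path, the check does not depend on where in the stylesheet the for-each sits.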

This leads to my last point: a DTD/Schema-aware XSL processor could warn
me about misspellings and mismatches between the XML structure and the
XPath expressions in the XSL. For example, if I have <!ELEMENT e (name,stuff)>
and <xsl:value-of select="e/naem"/>, the processor could tell me that
I wrote something wrong. If I change the XML structure, for example
by removing the name child and defining name as an attribute, and
forget to change the XSL, the processor would also tell me. If such
a feature had been available, it would already have saved me an awful
lot of debugging time.
Waiting for implementation of this... :-)
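A toy version of that check, limited to plain child steps like e/naem; the flattened DTD and `check_path` are hypothetical, and the "did you mean" hint simply uses Python's `difflib`:

```python
import difflib

# Hypothetical DTD: <!ELEMENT e (name,stuff)>, flattened to allowed children.
CHILDREN = {"e": {"name", "stuff"}, "name": set(), "stuff": set()}


def check_path(path, children):
    """Check a path of plain names separated by '/' against the content
    models and return a list of warnings (empty if every step is allowed)."""
    steps = path.split("/")
    if steps[0] not in children:
        return [f"unknown element {steps[0]!r}"]
    warnings = []
    for parent, child in zip(steps, steps[1:]):
        allowed = children.get(parent, set())
        if child not in allowed:
            hint = difflib.get_close_matches(child, sorted(allowed), n=1)
            msg = f"{child!r} is not a child of {parent!r}"
            if hint:
                msg += f"; did you mean {hint[0]!r}?"
            warnings.append(msg)
            break  # later steps are meaningless once one step is wrong
    return warnings
```

After moving name into an attribute, the DTD entry for e would change and the same check would flag the stale e/name in the stylesheet.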

Regards
J.Pietschmann
--

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

