[xsl] Performance Question: Expensive Functions in Predicates

Subject: [xsl] Performance Question: Expensive Functions in Predicates
From: Eliot Kimber <ekimber@xxxxxxxxxxxxxxxxxxx>
Date: Thu, 27 May 2004 10:35:59 -0500
I have a question about predicting performance. I know that the best answer is "try it and see," but I'm wondering whether there is a general principle that can guide design in this particular case.

In most of the work we do, which is processing technical documents to generate various outputs, we have to run an applicability check on pretty much every element to see whether it applies to the current processing conditions (target output, national language, computer platform, etc.). These checks are fairly expensive computationally because they may need to investigate any number of properties of an element or of its ancestors, neighbors, etc. They may also require the use of external extension functions.

My question is: where, in general, is the best place to call these functions?

- In apply-templates specifications?

- In match specifications?

- As IF blocks within templates?

For example, I could do this:

<xsl:apply-templates select="*[util:is_applicable()]"/>

or

<xsl:template match="foo[util:is_applicable()]">

or

<xsl:template match="foo">
  <xsl:if test="util:is_applicable()">
    <!-- normal processing for an applicable foo goes here -->
  </xsl:if>
</xsl:template>

I think that the IF approach ensures the fewest calls but also makes the code more cluttered.

So I guess my question is: if I don't use the IF approach, is it better to put the check in the apply-templates select or in the match pattern? Does it matter, or is it entirely a function of how a given XSLT implementation does its optimization?

Another option, of course, is to do the applicability processing as a separate step, so that the base processing templates don't have to care about applicability. That would ensure that each element is checked for applicability only once, but it might introduce other performance or scalability issues, since one would have to generate either a new serialized instance or a new result tree reflecting the input document(s). It would be a cleaner engineering solution, since base template writers wouldn't have to know about the need to do applicability checks.
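
To make the separate-step option concrete, here is a minimal sketch of what such a filtering pass might look like. It assumes XSLT 1.0 and that util:is_applicable() is an extension function the processor has bound to the util prefix. The namespace URI below is a placeholder, not a real one, and how the function gets bound is processor-specific.

```xml
<?xml version="1.0"?>
<!-- Filtering pass (sketch): copy the input, dropping inapplicable elements. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:util="urn:example:applicability-util"
    exclude-result-prefixes="util">
  <!-- urn:example:applicability-util is a hypothetical namespace URI;
       substitute whatever URI the processor uses for the extension. -->

  <!-- Identity rule: copy every node and attribute through unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- An element that fails the check is dropped together with its subtree.
       The predicate gives this rule a higher default priority (0.5) than
       the identity rule (-0.5), so it wins for elements. -->
  <xsl:template match="*[not(util:is_applicable())]"/>

</xsl:stylesheet>
```

The base stylesheet would then run over the output of this pass and would need no applicability logic of its own. Whether the cost of building the intermediate tree outweighs the saved calls is exactly the part I can't predict.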

Thanks,

Eliot
--
W. Eliot Kimber
Professional Services
Innodata Isogen
9030 Research Blvd, #410
Austin, TX 78758
(512) 372-8122

eliot@xxxxxxxxxxxxxxxxxxx
www.innodata-isogen.com
