Subject: Re: [xsl] Performance Question: Expensive Functions in Predicates
From: "M. David Peterson" <m.david@xxxxxxxxxx>
Date: Wed, 2 Jun 2004 18:26:22 -0600

Hi Eliot,

I'm interested to know more about how you came to these conclusions while leaving out the option of using XPath within apply-templates/@select to do the bulk of your node selection. I definitely have my own opinions on this, but to make sure I wasn't completely off base I ran a quick test.

To me there are two priorities that need to be accounted for in every program, no matter what language it is written in: 1) total cycles to completion and 2) peak memory consumption during the process. There is obviously more to it than this, but to me these are the most important numbers for predicting the overall performance of any application or software solution.

If we look first at total cycles (defining a cycle as a logic step in the Trace output) and compare the following three scenarios, we get some interesting results. For the test I used an XML data set with a depth of three (3) child nodes and 96 total elements. A simple boolean test of the XPath showed that 20 of these nodes matched the expression "foo[@bar = '1']".

Scenario One: Xalan 2.5.1: 126 cycles

    <xsl:apply-templates select="foo[@bar = '1']"/>

    <xsl:template match="foo">
      ...
    </xsl:template>

Scenario Two: Xalan 2.5.1: 199 cycles

    <xsl:apply-templates select="foo"/>

    <xsl:template match="foo[@bar = '1']">
      ...
    </xsl:template>

    <xsl:template match="foo"/>

Scenario Three: Xalan 2.5.1: 329 cycles

    <xsl:apply-templates select="foo"/>

    <xsl:template match="foo">
      <xsl:if test="@bar = '1'">
        ...
      </xsl:if>
    </xsl:template>

All three scenarios output the exact same data. It seems fairly obvious to me which one performs best as far as minimum cycles to complete the transformation is concerned. In fact, scenarios one and two use the exact same boolean test, which selects the exact same subset of data.

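For anyone who wants to rerun the comparison, the scenario one fragment can be wrapped into a complete stylesheet along these lines. This is only a sketch: the `items` wrapper element, the sample input shown in the comment, and the text output are assumptions made for illustration, not the data set used for the numbers above.

```xml
<?xml version="1.0"?>
<!-- Scenario one as a complete XSLT 1.0 stylesheet.
     Assumed (hypothetical) input:
       <items><foo bar="1">a</foo><foo bar="2">b</foo></items> -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Filter at select time: only foo elements with bar='1' are dispatched. -->
  <xsl:template match="/items">
    <xsl:apply-templates select="foo[@bar = '1']"/>
  </xsl:template>

  <!-- No predicate here; every foo that arrives has already passed the test. -->
  <xsl:template match="foo">
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Because the predicate lives in the select expression, the second `foo` in the assumed input is never handed to any template, so no fallthrough template is needed.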
The difference in cycles comes, of course, when you add the fallthrough template you are referring to, which has to be put in place to catch all the elements that don't match the criteria in the match attribute of the template. When no template matches, the processor falls back on the built-in template rules: for an element it applies templates to the children, and for text and attribute nodes it copies the string value to the output, so the text content of an unmatched element ends up in the output and processing continues. While in both cases the select attribute of apply-templates determines which elements get passed on, the criteria in scenario one are refined further than in scenario two, so the subset in scenario one (which is then matched to its corresponding template) is much smaller than in scenario two, leaving fewer nodes to be tested against the pattern in the match attribute.

In the first two scenarios, cycles are saved by reducing the number of elements processed as far as possible, using an XPath expression that matches an attribute/value pair in either the @select of xsl:apply-templates or the @match of xsl:template. Scenario three adds one more step before it gets down to the attribute/value pair, which means there is one more step for every element that gets through @select and @match. (There is an interesting, loosely 1 : 2 : 3 progression in the total cycles across the three scenarios: 126, 199 and 329.)

While I am not suggesting there is no place for the conditional logic elements of XSL, I am suggesting that their use should be reserved for fine-tuning the results of your transformation and not for processing the bulk of your XPath statements. In fact, IMHO ;) there are very few cases where conditional logic elements should be used to process raw elements and attributes. Their best use is in taking the string value of an element or attribute and processing it further using a combination of conditional logic and string functions.

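The fallthrough behaviour described above can be seen in a complete version of scenario two. As before, this is a sketch under assumptions: the `items` wrapper and the text output are hypothetical, and the input is assumed to be `<items><foo bar="1">a</foo><foo bar="2">b</foo></items>`.

```xml
<?xml version="1.0"?>
<!-- Scenario two as a complete XSLT 1.0 stylesheet (hypothetical wrapper element). -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Every foo is dispatched; the filtering happens in the match patterns. -->
  <xsl:template match="/items">
    <xsl:apply-templates select="foo"/>
  </xsl:template>

  <!-- A pattern with a predicate has default priority 0.5. -->
  <xsl:template match="foo[@bar = '1']">
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Fallthrough: a bare element name has default priority 0, so this empty
       template only catches the foo elements that failed the predicate.
       Without it, the built-in rules would recurse into those elements and
       copy their text nodes to the output. -->
  <xsl:template match="foo"/>
</xsl:stylesheet>
```

Removing the empty `match="foo"` template is a quick way to observe the leak: with the assumed input, the text of the second `foo` would then reach the output through the built-in rules.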
In fact, as a general rule I believe that templates are for processing elements and attributes by matching their values or combinations of values, while the conditional elements xsl:if and xsl:choose (with xsl:when and xsl:otherwise) should be used to further process the non-XML values of the nodes that have been passed into the template by that template matching.

With all of this in mind, I see both scenario one and scenario two as necessary methods for bulk transformation of XML data, depending on one simple factor: will the node-set produced by the XPath in the select attribute of xsl:apply-templates contain elements that need to be matched to more than one template? If "yes", use scenario two; if "no", use scenario one. Actually, this question should be qualified further by asking whether the result of the XPath expression can match multiple cases of which only one needs to be transformed, or of which all need to be transformed in exactly the same way. If either is true, then every effort should be made to qualify the elements in the XPath in the select attribute and, if necessary, use unions in your match attribute to match multiple elements, element[values], element[@attribute values], or simply [@attribute values] to the same template for further processing.

Taking peak memory into consideration (I have no data to evaluate at this point, so I'm speculating), my guess is that scenario one will cause the smallest peak and scenario three the largest. This is based on the simple fact that the further you step into a logic tree, the more memory is required to store the data that got you in and the data needed to get you back out. I realize this takes nothing else into account, and there are many more things to consider, but without data to back me up I don't want to get too deep into speculation.

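The suggestion above, qualifying the nodes in the select expression and using a union in the match attribute so that several kinds of element share one template, might look roughly like this. The `baz` element, its `qux` attribute, and the `items` wrapper are hypothetical names introduced only for the example.

```xml
<?xml version="1.0"?>
<!-- Select-time filtering combined with a union match pattern (XSLT 1.0). -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Qualify as much as possible in the select expression. -->
  <xsl:template match="/items">
    <xsl:apply-templates select="foo[@bar = '1'] | baz[@qux = '2']"/>
  </xsl:template>

  <!-- One template handles both element types in exactly the same way. -->
  <xsl:template match="foo | baz">
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Since the predicates are applied in the select expression, the shared template needs no further tests, and the selected nodes are processed in document order.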
I hope that I have in no way caused you to take offense to any of my comments. My intention wasn't to drive down your comments but instead to attempt to qualify with data what is actually taking place inside our transformations. Actual data is the absolute most important thing we can have at our disposal when evaluating performance and I believe that the above numbers, from a general perspective, showcase quite well which solution is best for each particular development scenario.

If you have data that suggests anything contrary to what I am saying please let me/all of us know as, like you and everybody else in here, my ultimate goal is to write the best possible code for any given situation. And the more data there is that helps lean our code one way or another the better off we are all going to be in our development efforts.

Best regards,

<M:D/>

----- Original Message -----
From: "Eliot Kimber" <ekimber@xxxxxxxxxxxxxxxxxxx>
To: <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Wednesday, June 02, 2004 2:10 PM
Subject: Re: [xsl] Performance Question: Expensive Functions in Predicates

> >>My question is where, in general, is the best place to use these
> >>functions:
> >>
> >>- In apply-templates specifications?
> >>
> >>- In match specifications?
> >>
> >>- As IF blocks within templates?
>
> I just stumbled onto a subtle (at least to me) difference between these
> two nominally equivalent forms:
>
> <xsl:template match="foo[util:is_applicable()]">
>
> and
>
> <xsl:template match="foo">
>   <xsl:if test="util:is_applicable()">
>
>
> Which is that in the first case all *inapplicable* foo elements fall
> through to the default template, which if there's no explicit template
> for "foo", means that the content of foo will likely flow to the output,
> therefore failing to suppress inapplicable foo elements. Doh!
>
> Given that, it suggests that putting the check in the match= value is
> the least attractive as it requires at least a single separate template
> with a lower priority to catch all elements that fail the applicability
> check, while doing the check at select time ensures that only applicable
> elements will be processed at all.
>
> Cheers,
>
> Eliot
> --
> W. Eliot Kimber
> Professional Services
> Innodata Isogen
> 9030 Research Blvd, #410
> Austin, TX 78758
> (512) 372-8122
>
> eliot@xxxxxxxxxxxxxxxxxxx
> www.innodata-isogen.com
>
>
> --+------------------------------------------------------------------
> XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
> To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
> or e-mail: <mailto:xsl-list-unsubscribe@xxxxxxxxxxxxxxxxxxxxxx>
> --+--
>