[xsl] performance advice sought

Subject: [xsl] performance advice sought
From: Mike Castle <dalgoda@xxxxxxxxxxxxx>
Date: Thu, 11 Dec 2003 15:36:48 -0800
XSL newbie.

It's quite possible that this particular task is better suited to something
besides XSL, but since I'm trying to plug this into ant as part of our
build & test system, and ant comes with a handy dandy <xslt/> task, I
figured it was a good place to start.

The solution I have come up with works.  It's logically correct.  However,
it'd damned slow.

Anyway, the task:

Our test system generates four files for each test run: test log
proper, test engine log, server log, and the server wrapper log (see
http://wrapper.sf.net).  Now, the first 3 files all use the same route
to write out infomation using an XML like <LogRecord/>, while the latter
is a plain text file.

What I need to do is analyze an aggregation of those files for each run.

* If a test engine log or server log has a <LogRecord/> with a severity
level of FATAL or MFATAL, or if the string "Exception" shows up in the
text of any <LogRecord/> or the server wrapper file, the whole run is
a bust, and we count all of the associated tests as failures.

*  For all of the other runs, we look at the the <LogRecord/>s in the test
log proper, and if any of those lines has a severity level of FATAL or
MFATAL or the string "Exception" in the text, then a particular test is
determined to have failed. (There may be multiple FATAL|MFATAL|Exceptions
for a particular test.)

* Sum up all of the tests ran, tests failed due to system errors, and
individual test failures.  For extra credit, I've been adding number
of known tests to pass just to make sure all of the numbers add up.
I'm paranoid like that.

* For each test failing with a system error, print out what test suite
failed.

* For each individual test that failed, print out what specific test
failed.

So really it's less of a transformation and more of a summary report.
Which is why I'm not certain that XSLT is the right tool for this.

What I do is creat a big XML wrapper using lots of entities to wrap all
of the pseudo XML fragments (and the plain text file comes in as a big
CDATA section as a <LogRecord/> element).

When it's all said and done, the XML looks like this:

<?xml version="1.0"?>
<Log>
 <Test name="testsuite1">
  <TestLog>
   <LogRecord severity="STATUS">Beginning test1 pacakge Test Suite 1 script....</LogRecord>
   ...
   <LogRecord severity="STATUS">good stuff is happening here</LogRecord>
  </TestLog>
  <TEngine>
   <LogRecord severity="STATUS">Test Engine started...</LogRecord>
   ...
   <LogRecord severity="STATUS">Test Engine stopped.</LogRecord>
  </TEngine>
  <Server>
   <LogRecord severity="STATUS">Server started...</LogRecord>
   ...
   <LogRecord severity="STATUS">Server shut down.</LogRecord>
  </Server>
  <Wrapper>
   <LogRecord><![CDATA[several lines of text here]]></LogRecord>
 </Test>
 ...
</Log>

Now, since each of these logs is really essentially a flat file, I have
to mechanically determine logical break points in the logs.  The break
points are determined by a <LogRecord/> where the text starts with the
string "Beginning" and has the string "scripts..." in it (Yeah, I know,
it's ugly).

So the style sheet I'm currently using is:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [
<!ENTITY recordpredicate "@severity='MFATAL' or @severity='FATAL' or contains(text(),'Exception')">
<!ENTITY recordfailure "LogRecord[&recordpredicate;]">
<!ENTITY systemcheck "child::*[self::TEngine or self::Server or self::Wrapper]">
<!ENTITY systemfailure "&systemcheck;/&recordfailure;">
<!ENTITY idtestrecord "[contains(substring-after(text(),'Beginning'),'scripts...')]">
<!ENTITY testscriptsid "TestLog/LogRecord&idtestrecord;">
<!ENTITY testname "normalize-space(substring-before(substring-after(.,'Beginning'),'package'))">
]>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform";>
  <xsl:output method="html" encoding="UTF-8" indent="yes"/>
  <xsl:template match="/">
    <xsl:text>Test results:</xsl:text>
        <xsl:variable name="all-tests" select="/Log/Test/&testscriptsid;"/>
        <xsl:variable name="system-failed-tests" select="/Log/Test/&systemfailure;/../../&testscriptsid;"/>
        <xsl:variable name="not-system-failed-tests" select="$all-tests[count(.|$system-failed-tests)!=count($system-failed-tests)]"/>
        <xsl:variable name="individual-failed-tests" select="$not-system-failed-tests/../&recordfailure;/preceding-sibling::*&idtestrecord;[position()=1]"/>
        <xsl:variable name="not-individual-failed-tests" select="$not-system-failed-tests[count(.|$individual-failed-tests)!=count($individual-failed-tests)]"/>

        There were <xsl:number value="count($all-tests)"/> tests ran.
        There were <xsl:number value="count($system-failed-tests)"/> tests that failed at a system level failure.
        There were <xsl:number value="count($individual-failed-tests)"/> individual tests that failed.
        There were <xsl:number value="count($not-individual-failed-tests)"/> tests known to pass.
        <xsl:text>
System level failures:
</xsl:text>
        <xsl:for-each select="$system-failed-tests/../../@name">
          <xsl:value-of select="."/>
          <xsl:text>
</xsl:text>
        </xsl:for-each>
        <xsl:text>
 Which caused the following tests to fail:
</xsl:text>
        <xsl:for-each select="$system-failed-tests">
          <xsl:value-of select="../../@name"/> : <xsl:value-of select="&testname;"/>
          <xsl:text>
</xsl:text>
        </xsl:for-each>
        <xsl:text>
Individual failures:
</xsl:text>
        <xsl:for-each select="$individual-failed-tests">
          <xsl:value-of select="../../@name"/> : <xsl:value-of select="&testname;"/>
          <xsl:text>
</xsl:text>
        </xsl:for-each>
        <xsl:text>
Passed tests:
</xsl:text>
        <xsl:for-each select="$not-individual-failed-tests">
          <xsl:value-of select="../../@name"/> : <xsl:value-of select="&testname;"/>
          <xsl:text>
</xsl:text>
        </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

Ok, actually for each test with a system failure, I'm listing the
individual tests as well.  The whole "Beginning ... package ... scripts..."
thing is quite fragile, I know, and we wanted to make sure we had
everything typed in correctly in the test scripts.

On the smaller test runs, this style sheet performs quite well.  Takes
about 30 seconds to generate the report.  Everyone is happy.

However, on a larger set of runs, it takes over 1 hour to generate the
report.  Everyone is unhappy.

Well, tracking down the culprit seems to be this particular query:

<xsl:variable name="individual-failed-tests"
  select="$not-system-failed-tests/../&recordfailure;/preceding-sibling::*&idtestrecord;[position()=1]"/>

Which, in after thought, makes sense.  It's pretty much an O(MxN), as for
every FATAL|MFATAL|Exception it finds, it then scans backwards looking for
the magic strings.  And in the one particular file that's giving me issues,
there are 251789 LogRecords, 83 failing records and 102 tests.  So out of
250k of records, I'm really only interested in 200!

So, what can I do to speed up that process?

One process I considered was doing a two step process:  First would pull
out the records I'm interested in (essentially acting like a structured
grep) and then generating the summary against that.

I wish I knew how to do a previous-sibling to work against a smaller subset
rather than the whole tree.

Any advice on how to make this a bit more effecient?

Thanks!
mrc
-- 
     Mike Castle      dalgoda@xxxxxxxxxxxxx      www.netcom.com/~dalgoda/
    We are all of us living in the shadow of Manhattan.  -- Watchmen
fatal ("You are in a maze of twisty compiler features, all different"); -- gcc

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


Current Thread