Re: [xsl] Trying to Detect corrupt data

Subject: Re: [xsl] Trying to Detect corrupt data
From: "G. Ken Holman" <gkholman@xxxxxxxxxxxxxxxxxxxx>
Date: Thu, 25 Oct 2007 07:50:10 -0400
At 2007-10-25 07:03 +0100, Arthur Maloney wrote:
I'm trying to detect which row elements contain corrupt data.
Once I've detected the row is corrupt I'm OK printing it.

Depending on user choice, the XML file contains 500-50,000 row elements.

In the XML file each row element contains between 2 and 15 agent elements
(there is always more than 1). The agent name should be the same in all agent elements.

XSLT allows for easy comparison of the members of node sets, so in this case one can just check the set against itself. A node-set comparison starts out as false, and the processor checks combinations of one node from each operand until either any one comparison is true or all the combinations are exhausted (which can be slow for large node sets).
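As a minimal sketch of these "true if any pairing is true" semantics (not part of the original answer), the stylesheet below prints three comparisons for each row of the arthur.xml sample shown later in this message. The output labels any-equal, any-differ and none-equal are made up for illustration.

```xml
<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

<xsl:output method="text"/>

<xsl:template match="/">
  <xsl:for-each select="table/row">
    <xsl:value-of select="applicantNumber"/>
    <!-- true if ANY pairing of agents has equal values; every node
         equals itself, so this is true whenever the row has an agent -->
    <xsl:text> any-equal=</xsl:text>
    <xsl:value-of select="agent = agent"/>
    <!-- true if ANY pairing of agents has different values -->
    <xsl:text> any-differ=</xsl:text>
    <xsl:value-of select="agent != agent"/>
    <!-- NOT the same test as "!=": true only when NO pairing is equal -->
    <xsl:text> none-equal=</xsl:text>
    <xsl:value-of select="not(agent = agent)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```

For the three rows shown in the sample, any XSLT 1.0 processor should report any-equal=true and none-equal=false on every row, and any-differ=true only for 127789 and 16789345. That is why the test for corrupt data has to be written with "!=" rather than as not(... = ...).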


But there is no way to instruct the processor to do the comparisons one way or the other ... so when dealing with both operands as node sets, one cannot ensure that the minimum number of checks is done. For example, the following works:

<xsl:if test="agent != agent">corrupt</xsl:if>

... but there is a possibility the processor will happen to walk through the combinations of operands in an order that takes a very long time to produce a result. What if the processor first chose to compare each member of the first operand against the corresponding member of the second operand? That first pass through the entire set would compare every node with itself, so every comparison would find equal values and the "!=" test would still be unsatisfied. While that may not be likely, the stylesheet writer has no control over it.

The equivalent result can be obtained with:

<xsl:if test="agent[1] != agent[ position()>1 ]">corrupt</xsl:if>

... where I am comparing one node against a set of nodes, and I think this second way has no chance of a combination of operands "taking a long time" to come up with a result.

So I used the second approach in the answer below, rather than the obvious first answer above.
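The two tests agree because if any two agents in a row differ, they cannot both equal the first agent, so at least one of them differs from agent[1]. As a hedged sketch for checking this against the arthur.xml sample shown later in this message, the stylesheet below prints both results side by side; the labels set-vs-set and first-vs-rest are hypothetical names chosen for illustration.

```xml
<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

<xsl:output method="text"/>

<xsl:template match="/">
  <xsl:for-each select="table/row">
    <xsl:value-of select="applicantNumber"/>
    <!-- both operands are node sets: the pairing order is up to the processor -->
    <xsl:text> set-vs-set=</xsl:text>
    <xsl:value-of select="agent != agent"/>
    <!-- one node against the rest: at most one comparison per remaining agent -->
    <xsl:text> first-vs-rest=</xsl:text>
    <xsl:value-of select="agent[1] != agent[position() > 1]"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```

For the three sample rows both columns should read false for 56789 and true for 127789 and 16789345. The sketch only confirms that the results match; it says nothing about how many comparisons a given processor actually performs.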

Example of Xml file
1.  applicantNumber is always unique for each row element
2.  row 1 is not corrupt: all agent names are the same
3.  rows 2 and 3 are corrupt: they contain more than one agent name
(row 2 contains 4 names, row 3 contains 2 names).

I hope the explanation and answer below help.


. . . . . . . . . Ken

t:\ftemp>type arthur.xml
<table>
...
<row>
     <applicantNumber>56789</applicantNumber>
     <agent>John1</agent>
     <agent>John1</agent>
     <agent>John1</agent>
     <agent>John1</agent>
</row>
...
<row>
     <applicantNumber>127789</applicantNumber>
     <agent>John27</agent>
     <agent>John1</agent>
     <agent>Fred13</agent>
     <agent>John27</agent>
     <agent>John27</agent>
     <agent>John27</agent>
     <agent>Paul8</agent>
     <agent>John27</agent>
</row>
...
<row>
     <applicantNumber>16789345</applicantNumber>
     <agent>Fred9</agent>
     <agent>Fred9</agent>
     <agent>Fred9</agent>
     <agent>John1</agent>
     <agent>Fred9</agent>
     <agent>Fred9</agent>
     <agent>Fred9</agent>
     <agent>Fred9</agent>
     <agent>Fred9</agent>
</row>
...
</table>
t:\ftemp>type arthur.xsl
<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

<xsl:output method="text"/>

<xsl:template match="/">
  <xsl:for-each select="table/row">
    <xsl:value-of select="applicantNumber"/>: <xsl:text/>
    <xsl:if test="agent[1] != agent[ position()>1 ]">corrupt</xsl:if>
    <xsl:text>
</xsl:text>
  </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
t:\ftemp>xslt arthur.xml arthur.xsl con
56789:
127789: corrupt
16789345: corrupt

t:\ftemp>
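If only the corrupt rows are of interest, the same test can be moved into a predicate so that clean rows are never visited. The following is a sketch under one loud assumption that is mine, not Arthur's: the first agent in a row is treated as the reference name, so the agents listed are simply those that differ from agent[1]. If the first agent happens to be the wrong one, every other agent in the row would be listed instead.

```xml
<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

<xsl:output method="text"/>

<xsl:template match="/">
  <!-- keep only rows where some later agent differs from the first -->
  <xsl:for-each select="table/row[agent[1] != agent[position() > 1]]">
    <xsl:value-of select="applicantNumber"/>
    <xsl:text>:</xsl:text>
    <!-- ASSUMPTION: agent[1] is the reference name for the row;
         list each agent whose value differs from it -->
    <xsl:for-each select="agent[. != ../agent[1]]">
      <xsl:text> </xsl:text>
      <xsl:value-of select="."/>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
  </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```

Against the three sample rows this should print "127789: John1 Fred13 Paul8" and "16789345: John1", with no line at all for 56789.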


