Re: [xsl] To determine the distinct elements in a sequence of 46,656 elements takes 5 hours of XSLT processing

Subject: Re: [xsl] To determine the distinct elements in a sequence of 46,656 elements takes 5 hours of XSLT processing
From: "Imsieke, Gerrit, le-tex" <gerrit.imsieke@xxxxxxxxx>
Date: Mon, 20 Aug 2012 01:21:02 +0200
That's what I was about to post (using f:signature as what I called hash):

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform";
  xmlns:xs="http://www.w3.org/2001/XMLSchema";
  xmlns:f="f"
  version="2.0"
  exclude-result-prefixes="f xs"
  >

<xsl:output method="xml" indent="yes" />

<xsl:function name="f:signature">
<xsl:param name="map" as="element(map)" />
<xsl:variable name="sorted-map" as="element(singletonMap)*">
<xsl:perform-sort select="$map/*">
<xsl:sort select="@from"/>
<xsl:sort select="@to"/>
</xsl:perform-sort>
</xsl:variable>
<xsl:sequence select="string-join($sorted-map/concat(@from, '~', @to), '|')"/>
</xsl:function>


  <xsl:template match="maps">
    <xsl:for-each-group select="map" group-by="f:signature(.)">
      <xsl:sequence select="." />
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>

Input (5 maps):

<?xml version="1.0" encoding="utf-8"?>
<maps>
  <map id="Planes-Enroute-to-Airports">
    <singletonMap from="F16" to="DFW"/>
    <singletonMap from="B707" to="ORD"/>
    <singletonMap from="F35" to="MIA"/>
    <singletonMap from="S340" to="LAX"/>
    <singletonMap from="A320" to="SFO"/>
    <singletonMap from="MD90" to="DEN"/>
  </map>
  <map id="Planes-Enroute-to-Airports">
    <singletonMap from="F16" to="DFW"/>
    <singletonMap from="B707" to="BER"/>
    <singletonMap from="F35" to="MIA"/>
    <singletonMap from="S340" to="LAX"/>
    <singletonMap from="A320" to="SFO"/>
    <singletonMap from="MD90" to="DEN"/>
  </map>
  <map id="Planes-Enroute-to-Airports">
    <singletonMap from="F16" to="DFW"/>
    <singletonMap from="B707" to="ORD"/>
    <singletonMap from="F35" to="MIA"/>
    <singletonMap from="S340" to="LAX"/>
    <singletonMap from="A320" to="SFO"/>
    <singletonMap from="MD90" to="DEN"/>
  </map>
  <map id="Planes-Enroute-to-Airports">
    <singletonMap from="F16" to="DFW"/>
    <singletonMap from="F35" to="MIA"/>
    <singletonMap from="S340" to="LAX"/>
    <singletonMap from="A320" to="SFO"/>
    <singletonMap from="MD90" to="DEN"/>
  </map>
  <map id="Planes-Enroute-to-Airports">
    <singletonMap from="F16" to="DFW"/>
    <singletonMap from="B767" to="ORD"/>
    <singletonMap from="F35" to="MIA"/>
    <singletonMap from="S340" to="LAX"/>
    <singletonMap from="A320" to="SFO"/>
    <singletonMap from="MD90" to="DEN"/>
  </map>
</maps>

Output (4 maps):

<?xml version="1.0" encoding="UTF-8"?>
<maps>
   <map id="Planes-Enroute-to-Airports">
      <singletonMap from="F16" to="DFW"/>
      <singletonMap from="B707" to="ORD"/>
      <singletonMap from="F35" to="MIA"/>
      <singletonMap from="S340" to="LAX"/>
      <singletonMap from="A320" to="SFO"/>
      <singletonMap from="MD90" to="DEN"/>
  </map>
   <map id="Planes-Enroute-to-Airports">
      <singletonMap from="F16" to="DFW"/>
      <singletonMap from="B707" to="BER"/>
      <singletonMap from="F35" to="MIA"/>
      <singletonMap from="S340" to="LAX"/>
      <singletonMap from="A320" to="SFO"/>
      <singletonMap from="MD90" to="DEN"/>
  </map>
   <map id="Planes-Enroute-to-Airports">
      <singletonMap from="F16" to="DFW"/>
      <singletonMap from="F35" to="MIA"/>
      <singletonMap from="S340" to="LAX"/>
      <singletonMap from="A320" to="SFO"/>
      <singletonMap from="MD90" to="DEN"/>
  </map>
   <map id="Planes-Enroute-to-Airports">
      <singletonMap from="F16" to="DFW"/>
      <singletonMap from="B767" to="ORD"/>
      <singletonMap from="F35" to="MIA"/>
      <singletonMap from="S340" to="LAX"/>
      <singletonMap from="A320" to="SFO"/>
      <singletonMap from="MD90" to="DEN"/>
  </map>
</maps>

On 2012-08-20 01:16, Michael Kay wrote:

On 20/08/2012 00:11, Michael Kay wrote:
Try the following:

count(distinct-values(map/f:signature(.))
Sorry, I misread the question as just wanting the number of distinct
maps. If you want the first map in each set of duplicates, use instead

<xsl:for-each-group select="map" group-by="f:signature(.)">
    <xsl:sequence select="current-group()[1]"/>
</xsl:for-each-group>

Michael Kay
Saxonica

where f:signature is


<xsl:function name="f:signature">
  <xsl:param name="map" as="element(map)">
  <xsl:variable name="sorted-map" as="element(singletonMap)*">
    <xsl:perform-sort select="$map/*">
        <xsl:sort select="@from"/>
        <xsl:sort select="@to"/>
    </xsl:perform-sort>
  </xsl:variable>
  <xsl:sequence select="string-join($sorted-map/concat(@from, '~',
@to), '|')"/>
</xsl:function>

Michael Kay
Saxonica


On 19/08/2012 23:43, Costello, Roger L. wrote:
Hi Folks,

I have a sequence of 46,656 elements that I call "maps."

Here is one map:

     <map id="Planes-Enroute-to-Airports">
         <singletonMap from="F16" to="DFW"/>
         <singletonMap from="B707" to="ORD"/>
         <singletonMap from="F35" to="MIA"/>
         <singletonMap from="S340" to="LAX"/>
         <singletonMap from="A320" to="SFO"/>
         <singletonMap from="MD90" to="DEN"/>
     </map>

I wrote a function to return all of the distinct maps.

Unfortunately it takes about 5 hours of XSLT processing.

Perhaps my XSLT program is inefficient.

I am hoping that you can show me a more efficient program or identify
where my program is inefficient.

I am using XSLT 2.0.

Here is my function to return all distinct maps:
          <xsl:function name="ct:distinct" as="element(map)*">
         <xsl:param name="maps" />
                  <xsl:variable name="new-maps">
             <maps>
                 <xsl:sequence select="$maps" />
             </maps>
         </xsl:variable>
         <xsl:for-each select="$new-maps/maps/map">
                 <xsl:if test="not(ct:contained-within(.,
./following-sibling::map))"><xsl:sequence select="." /></xsl:if>
         </xsl:for-each>
              </xsl:function>

The following function determines if a map is contained within a
sequence of maps; it uses a binary divide-and-conquer approach:
          <xsl:function name="ct:contained-within" as="xs:boolean">
         <xsl:param name="map" as="element(map)"/>
         <xsl:param name="maps" as="element(map)*"/>
                  <xsl:variable name="cnt" select="count($maps)" />
                  <xsl:choose>
             <xsl:when test="$cnt eq 0"><xsl:value-of
select="false()" /></xsl:when>
             <xsl:when test="ct:equal($map, $maps[1])"><xsl:value-of
select="true()" /></xsl:when>
             <xsl:when test="$cnt eq 1"><xsl:value-of
select="false()" /></xsl:when>
             <xsl:otherwise>
                 <xsl:variable name="half" select="$cnt idiv 2" />
                 <xsl:choose>
                     <xsl:when test="ct:contained-within($map,
$maps[position() = (2 to $half)])"><xsl:value-of select="true()"
/></xsl:when>
                     <xsl:otherwise><xsl:value-of
select="ct:contained-within($map, $maps[position() = (($half + 1) to
last())])" /></xsl:otherwise>
                 </xsl:choose>
                              </xsl:otherwise>
         </xsl:choose>
              </xsl:function>

Two maps are equal iff for each singletonMap in map1 there is a
singletonMap in map2 with the same value for @to and @from:
          <xsl:function name="ct:equal" as="xs:boolean">
         <xsl:param name="map1" as="element(map)"/>
         <xsl:param name="map2" as="element(map)"/>
                  <xsl:choose>
             <xsl:when test="count($map1/*) ne
count($map2/*)"><xsl:value-of select="false()" /></xsl:when>
             <xsl:otherwise>
                 <xsl:variable name="result">
                     <xsl:for-each select="$map1/singletonMap">
                         <xsl:variable name="here" select="." />
                         <xsl:if test="$map2/singletonMap[@from eq
$here/@from and @to ne $here/@to]">false</xsl:if>
                     </xsl:for-each>
                 </xsl:variable>
                                  <xsl:choose>
                     <xsl:when test="contains($result,
'false')"><xsl:value-of select="false()" /></xsl:when>
                     <xsl:otherwise><xsl:value-of select="true()"
/></xsl:otherwise>
                 </xsl:choose>
             </xsl:otherwise>
         </xsl:choose>
              </xsl:function>

/Roger


-- Gerrit Imsieke Geschdftsf|hrer / Managing Director le-tex publishing services GmbH Weissenfelser Str. 84, 04229 Leipzig, Germany Phone +49 341 355356 110, Fax +49 341 355356 510 gerrit.imsieke@xxxxxxxxx, http://www.le-tex.de

Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930

Geschdftsf|hrer: Gerrit Imsieke, Svea Jelonek,
Thomas Schmidt, Dr. Reinhard Vvckler

Current Thread