Re: [xsl] Calculating groups of repeating elements

Subject: Re: [xsl] Calculating groups of repeating elements
From: Michael Ludwig <milu71@xxxxxx>
Date: Thu, 11 Dec 2008 04:11:22 +0100
Michael Ludwig wrote on 11.12.2008 at 03:56:25 (+0100):
> 
> A set of 75 words has 2 ^ 75 (3.77789318629572e+22) possible subsets.
> The good news for you is that you can eliminate 76 out of these for
> having less than two members. :-)

Depending on how many of those you can exclude beforehand, it might
actually be feasible. Tackling this problem, my first thought was to
eliminate all singleton words. So if only 10 of the words in your
75-word set occur twice or more, this translates into only 2 ^ 10 = 1024
subsets to generate for the <place> in question.

I'm including what I patched together before giving up on the subset
generation algorithm. Note that this is either a dead end or a work in
progress; it clumsily outputs intermediate results, and there are
probably a couple of things that could be done to take better advantage
of XSLT 2.0's facilities.

Michael Ludwig

C:\dev\XSLT2 :: more /t1 dombrowski.xsl
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 xmlns:quinn="Quinn:Dombrowski"
 exclude-result-prefixes="xs quinn">

 <xsl:output indent="yes"/>

 <!-- Determine the <word/> elements that occur more than once. For each
 <place/>, build all subsets of those repeated <word/> elements and keep
 them as sorted sequences of strings. Eliminate every sequence that
 occurs only once. Sort the remaining ones first by length and then by
 number of occurrences. Done. The output should report (a) the length
 and (b) the frequency of each group. -->

 <xsl:variable name="word-frequency-map" as="element()*">
  <xsl:for-each-group select="//word" group-by=".">
   <word count="{ count( current-group()) }">
    <xsl:value-of select="current-grouping-key()"/>
   </word>
  </xsl:for-each-group>
 </xsl:variable>

 <xsl:variable name="seq-candidate-words" as="xs:string*"
  select="$word-frequency-map[ @count > 1 ]"/>

 <xsl:variable name="candidate-words" as="element()*">
  <xsl:for-each select="//place">
   <xsl:copy>
    <xsl:attribute name="number" select="place_number"/>
    <xsl:for-each select="words/word[ . = $seq-candidate-words ]">
     <xsl:sort/>
     <xsl:copy-of select="."/>
    </xsl:for-each>
   </xsl:copy>
  </xsl:for-each>
 </xsl:variable>

 <!-- Now build all permutations of length 2 or greater. -->

 <xsl:variable name="permutations" as="element()*">
  <xsl:for-each select="$candidate-words">
   <xsl:copy>
    <xsl:copy-of select="@*"/>
    <xsl:copy-of select="quinn:permutations( word)"/>
   </xsl:copy>
  </xsl:for-each>
 </xsl:variable>

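 <!-- Emits the whole group, then recurses on every group that has one
 word removed. The same subset is reached along several paths, so the
 output contains duplicates; the final call also repeats the first
 iteration of the for-each. -->
 <xsl:function name="quinn:permutations" as="element()*">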
  <xsl:param name="words" as="element()*"/>
  <xsl:if test="count( $words) > 1">
   <word-group>
    <xsl:value-of select="$words"/>
   </word-group>
   <xsl:for-each select="$words">
    <xsl:copy-of select="quinn:permutations( $words except .)"/>
   </xsl:for-each>
   <xsl:copy-of select="quinn:permutations( $words[ position() > 1 ])"/>
  </xsl:if>
 </xsl:function>

 <xsl:template match="atlas">
  <xsl:copy-of select="$word-frequency-map"/>
  <candidate-words>
   <xsl:copy-of select="$seq-candidate-words"/>
  </candidate-words>
  <xsl:copy-of select="$candidate-words"/>
  <xsl:copy-of select="$permutations"/>
 </xsl:template>

</xsl:stylesheet>
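
For comparison, here is a minimal sketch of how the subset generation
might be written as a plain recursion in XSLT 2.0: each word is either
left out of or added to every subset of the remaining words, which
yields each of the 2 ^ n subsets exactly once. The function name
quinn:subsets, the word-group/word result shape and the hard-coded
three-word list are assumptions made for illustration; none of them is
part of the stylesheet above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 xmlns:quinn="Quinn:Dombrowski"
 exclude-result-prefixes="xs quinn">

 <xsl:output indent="yes"/>

 <!-- Hypothetical helper: returns every subset of $words, including the
 empty one, as a <word-group/> holding one <word/> per member. -->
 <xsl:function name="quinn:subsets" as="element(word-group)*">
  <xsl:param name="words" as="xs:string*"/>
  <xsl:choose>
   <xsl:when test="empty( $words)">
    <!-- The empty sequence has exactly one subset: the empty one. -->
    <word-group/>
   </xsl:when>
   <xsl:otherwise>
    <xsl:variable name="rest" as="element(word-group)*"
     select="quinn:subsets( subsequence( $words, 2))"/>
    <!-- All subsets that leave out the first word ... -->
    <xsl:sequence select="$rest"/>
    <!-- ... and the same subsets again with the first word prepended. -->
    <xsl:for-each select="$rest">
     <word-group>
      <word><xsl:value-of select="$words[1]"/></word>
      <xsl:copy-of select="word"/>
     </word-group>
    </xsl:for-each>
   </xsl:otherwise>
  </xsl:choose>
 </xsl:function>

 <!-- Demo on an assumed three-word list; any input document will do. -->
 <xsl:template match="/">
  <word-groups>
   <xsl:copy-of
    select="quinn:subsets( ('a', 'b', 'c'))[ count( word) >= 2 ]"/>
  </word-groups>
 </xsl:template>

</xsl:stylesheet>
```

Because the list is consumed front to back, a sorted list of words gives
sorted groups. With 10 candidate words this still means 1024 groups per
<place> before filtering, so eliminating the singleton words first
remains essential.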
