Re: [xsl] Calculating groups of repeating elements

Subject: Re: [xsl] Calculating groups of repeating elements
From: Michael Ludwig <mlu@xxxxxxxxxxxxx>
Date: Thu, 11 Dec 2008 18:58:30 +0100
Wendell Piez wrote:

It seems to me that if you want to collect groups of 2+ words that appear in 2+ places, a useful first step would be to collect the set of intersections of words occurring in every pairing of places. This would be a large number, n(n-1)/2 for n places, but not the huge exponent of 2 cited by Michael, and hence possibly a more direct route to your goal.

Great! This looks like a much more useful approach to the problem!


yields this result:

<?xml version="1.0" encoding="UTF-8"?>
<collection>
   <common_words>
      <place_number>2</place_number>
      <place_number>1</place_number>
      <words>
         <word>Aa</word>
         <word>C</word>
      </words>
   </common_words>
   <common_words>
      <place_number>3</place_number>
      <place_number>1</place_number>
      <words>
         <word>Aa</word>
         <word>C</word>
         <word>Qqq</word>
      </words>
   </common_words>
   [...]

Now, generating all interesting subsets of "words/word/string()" can be done far more efficiently, as the input sets are probably *much* smaller on average.
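
To make the two steps concrete outside of XSLT, here is a minimal Python sketch: first the pairwise intersections, then the subsets of two or more words drawn from one intersection. The function names and the input data are hypothetical; the data is only chosen so that places 1, 2 and 3 share the words shown in the result above.

```python
from itertools import combinations

def pairwise_common_words(places):
    """Map each pair of place numbers to the sorted list of words they share."""
    common = {}
    # n places give n(n-1)/2 pairings.
    for (a, words_a), (b, words_b) in combinations(sorted(places.items()), 2):
        shared = sorted(set(words_a) & set(words_b))
        if shared:
            common[(a, b)] = shared
    return common

def subsets_of_two_or_more(words):
    """Yield every subset of `words` that has at least two members."""
    for size in range(2, len(words) + 1):
        yield from combinations(words, size)

# Hypothetical input: place number -> words found at that place.
places = {
    1: ["Aa", "C", "Qqq", "Zz"],
    2: ["Aa", "C", "Bb"],
    3: ["Aa", "C", "Qqq"],
}

common = pairwise_common_words(places)
for pair, words in common.items():
    print(pair, words, list(subsets_of_two_or_more(words)))
```

The subset step is still exponential in the size of its input, but it now runs on each (small) intersection rather than on the full word list of a place.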

While this isn't quite what you want, the results you want could be
derived by grouping these lists further, skipping pairings that
contain fewer than two 'word' elements, and collecting together those
that have the same sets (and thus represent sets of words that occur
in more than two places).

Yes. But I think you must still generate the subsets, because if you have, say, three occurrences of (a,b,c) and two of (a,b,d), you have five occurrences of (a,b), which is interesting, if my understanding of the requirement is correct.
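
The arithmetic can be checked with a small brute-force count, sketched here in Python. This enumerates the subsets of each place's word set directly, which is only reasonable for tiny sets like these; the function name and the data layout are my own assumptions, not part of any stylesheet in this thread.

```python
from collections import Counter
from itertools import combinations

def count_subset_occurrences(places, min_size=2):
    """Count, for each subset of at least `min_size` words, how many places contain it."""
    counts = Counter()
    for words in places:
        unique = sorted(set(words))  # ignore duplicates within a place
        for size in range(min_size, len(unique) + 1):
            counts.update(combinations(unique, size))
    return counts

# Three places with (a,b,c) and two places with (a,b,d).
places = [["a", "b", "c"]] * 3 + [["a", "b", "d"]] * 2

counts = count_subset_occurrences(places)
print(counts[("a", "b")])       # occurrences of (a,b) across all five places
print(counts[("a", "b", "c")])  # occurrences of the full set (a,b,c)
```

Grouping only identical sets would report (a,b,c) and (a,b,d) separately and never report (a,b) at all, which is why the subsets have to be generated.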

This continues to be interesting.

Michael Ludwig
