Re: [xsl] Calculating groups of repeating elements

Subject: Re: [xsl] Calculating groups of repeating elements
From: Quinn Dombrowski <qdombrow@xxxxxxxx>
Date: Thu, 11 Dec 2008 15:06:57 -0600
Thanks a ton, Wendell, Michael L and Michael K! You've given me quite a lot to chew on. I'm going to give it a shot on my real data set (a pile of Cyrillic with extra diacritics and linguistic symbols) and let you know how it goes.

Wendell Piez wrote:

At 12:58 PM 12/11/2008, Michael wrote:
It seems to me that if you are wanting to collect groups of 2+ words
that appear in 2+ places, a useful first step would be to collect the
set of intersections of words occurring in every pairing of places.
This would be a large number, n(n-1)/2 for n places, but not the huge
power of 2 cited by Michael, and hence possibly a more direct route
to your goal.

Great! This looks like a much more useful approach to the problem!

Thanks ... I hope so.

BTW, since writing that it has also occurred to me that by declaring a key that would return places based on descendant word elements, one could speed up the generation of this set and avoid empty intersections. So:

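<xsl:key name="place-by-word" match="place" use=".//word"/>

<xsl:template match="atlas">
  <xsl:for-each select="place">
    <xsl:variable name="first" select="."/>
    <!-- places that share at least one word with $first and precede it
         in document order, so each pairing is visited only once -->
    <xsl:for-each select="key('place-by-word', .//word)[. &lt;&lt; $first]">
      <xsl:variable name="second" select="."/>
      <xsl:copy-of select="$first/place_number, $second/place_number"/>
      <!-- the words the two places have in common -->
      <xsl:copy-of select="$first/words/word[. = $second/words/word]"/>
    </xsl:for-each>
  </xsl:for-each>
</xsl:template>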

(This requires testing, of course.)

While this isn't quite what you want, the results you want could be
derived by grouping these lists further, skipping pairings that
contain fewer than two 'word' elements, and collecting together those
that have the same sets (and thus represent sets of words that occur
in more than two places).
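
A minimal, untested sketch of that second grouping pass, in XSLT 2.0. It assumes the pairwise results have been wrapped as one <pair> element per pairing inside a <pairs> element, each <pair> holding the two place_number elements followed by the shared word elements. That wrapper, the my: namespace and the my:set-key function are illustrative names, not part of the original suggestion. It also assumes no word contains the '|' character used to build the grouping key.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.org/my"
    exclude-result-prefixes="xs my">

  <!-- Hypothetical helper: a canonical key for a set of words
       (distinct values, sorted, joined with '|'). -->
  <xsl:function name="my:set-key" as="xs:string">
    <xsl:param name="words" as="element()*"/>
    <xsl:variable name="sorted" as="xs:string*">
      <xsl:perform-sort select="distinct-values($words/string())">
        <xsl:sort select="."/>
      </xsl:perform-sort>
    </xsl:variable>
    <xsl:sequence select="string-join($sorted, '|')"/>
  </xsl:function>

  <!-- Assumed input: <pairs> holding one <pair> per pairing, each with two
       <place_number> children followed by the shared <word> elements. -->
  <xsl:template match="pairs">
    <groups>
      <!-- skip pairings sharing fewer than two distinct words, then group
           the remaining pairings by their set of shared words -->
      <xsl:for-each-group
          select="pair[count(distinct-values(word)) ge 2]"
          group-by="my:set-key(word)">
        <group words="{current-grouping-key()}">
          <!-- every place that took part in a pairing with this word set -->
          <xsl:for-each select="distinct-values(current-group()/place_number)">
            <place_number><xsl:value-of select="."/></place_number>
          </xsl:for-each>
        </group>
      </xsl:for-each-group>
    </groups>
  </xsl:template>
</xsl:stylesheet>
```

A group that lists more than two place numbers corresponds to a word set found in more than two places. Sorting the words before joining them is what lets two pairings with the same words in a different order land in the same group.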

Yes. But I think you must still generate the subsets, because if you have, say, three occurrences of (a,b,c) and two of (a,b,d), you have five occurrences of (a,b), which is interesting, if my understanding of the requirement is correct.
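
To make the (a,b,c) / (a,b,d) case concrete, here is a brute-force sketch in XSLT 2.0 that enumerates every subset of two or more words for each place and counts the places in which each subset occurs. It is the exponential approach rather than the pairwise one, so it is only practical when places contain few words. It assumes the atlas/place/words/word structure used in the key example above; the my: namespace, the my:subsets function and the output element names are made up for illustration, and words are assumed not to contain '|'. Untested.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.org/my"
    exclude-result-prefixes="xs my">

  <!-- Hypothetical helper: every subset of $items, each returned as a
       '|'-joined string (the empty subset comes back as ''). -->
  <xsl:function name="my:subsets" as="xs:string*">
    <xsl:param name="items" as="xs:string*"/>
    <xsl:sequence select="
      if (empty($items)) then ''
      else for $rest in my:subsets(subsequence($items, 2))
           return ($rest, string-join(($items[1], $rest[. ne '']), '|'))"/>
  </xsl:function>

  <xsl:template match="atlas">
    <!-- one <entry> per (place, subset of two or more words) -->
    <xsl:variable name="entries" as="element(entry)*">
      <xsl:for-each select="place">
        <!-- sort the words so equal subsets always produce equal strings -->
        <xsl:variable name="sorted" as="xs:string*">
          <xsl:perform-sort select="distinct-values(words/word/string())">
            <xsl:sort select="."/>
          </xsl:perform-sort>
        </xsl:variable>
        <!-- a '|' in the string means the subset has at least two words -->
        <xsl:for-each select="my:subsets($sorted)[contains(., '|')]">
          <entry set="{.}"/>
        </xsl:for-each>
      </xsl:for-each>
    </xsl:variable>
    <sets>
      <!-- report each subset that occurs in two or more places -->
      <xsl:for-each-group select="$entries" group-by="@set">
        <xsl:if test="count(current-group()) ge 2">
          <set words="{current-grouping-key()}"
               places="{count(current-group())}"/>
        </xsl:if>
      </xsl:for-each-group>
    </sets>
  </xsl:template>
</xsl:stylesheet>
```

With three places containing (a,b,c) and two containing (a,b,d), the subset a|b is produced once per place and so should be reported for five places, while a|b|c is reported for three and a|b|d for two.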

This is a good point; only the OP can say if it's in scope.

(Hm: could this be done by recursing to intersect among the intersections, dropping singleton cases along the way? The overload warning lamp in my brain is now starting to flash.)

This continues to be interesting.

Yes, it does.


Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.      
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
