Re: [xsl] Calculating groups of repeating elements

Subject: Re: [xsl] Calculating groups of repeating elements
From: Michael Ludwig <mlu@xxxxxxxxxxxxx>
Date: Thu, 11 Dec 2008 16:36:11 +0100
Quinn Dombrowski wrote:

> This is great, Michael! If I can sort the <word-group>s in some standard way (alphabetically?) across all the places, I should be able then to group the data processed from your xsl based on how many places have a given <word-group>.

I'm glad you find this helpful, Quinn.

I made the assumption that within a given <place>, each word is unique,
so that for a given <place>, the expression "words/word/string()" yields
no duplicates.

So you (1) identify all <word>s occurring more than once.

You then (2) identify the intersection of (1) and "words/word/string()"
for each <place>. (Note I didn't write "words/word", because it's not
the element nodes you're interested in, but their values. See Ken's
remark on the intersect operator.)

You then (3) generate all subsets of (2) for each <place>, excluding
those you're not interested in, because they have fewer than two members,
and taking care to normalize them (sort them), which allows for easy
string comparisons later.

Note that the function I supplied is insufficient in that it doesn't
eliminate duplicates. Maybe take a look at the subset generation
algorithms you find on the web.
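
To make the duplicate-free subset generation concrete, here is a minimal
sketch in Python rather than XSLT, just to pin down the logic. It is not the
function from my earlier mail; the name word_groups and the min_size
parameter are made up for this illustration. It removes repeated words and
sorts once up front, so every subset comes out exactly once and already
normalized.

```python
from itertools import combinations


def word_groups(words, min_size=2):
    """Return every subset of `words` with at least `min_size` members.

    Hypothetical helper for illustration. Each subset is a sorted tuple,
    so two equal groups compare equal without further normalization.
    """
    unique = sorted(set(words))  # drop repeated words, fix the order once
    return [
        combo
        for size in range(min_size, len(unique) + 1)
        for combo in combinations(unique, size)
    ]
```

Keep in mind that the number of subsets doubles with every additional word,
so this is only practical when the per-<place> intersection from (2) stays
small.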

You now (4) eliminate all singleton subsets, because they're not
interesting to you.

You then (5) sort the remaining subsets by (a) length and (b) number of
occurrences, probably keeping track of which <place>s these occur in,
and how often, because that's interesting.
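
Steps (1) to (5) taken together might look roughly like the following
sketch, again in Python rather than XSLT and purely as an illustration. The
input shape (a dict mapping a place name to its list of words) and the name
shared_word_groups are assumptions of mine, and so is the sort direction
(longest groups first, then most places), since I haven't said which way
round the ordering should go. It also relies on the assumption that each word
is unique within a <place>, so "number of occurrences" equals the number of
places.

```python
from collections import Counter, defaultdict
from itertools import combinations


def shared_word_groups(places):
    """Map each word group of size >= 2 to the places it occurs in.

    `places` is a hypothetical input shape: {place_name: [word, ...]}.
    Returns a list of (group, [place, ...]) pairs, longest groups first,
    then groups found in the most places (an assumed ordering).
    """
    # (1) words that occur in more than one place
    counts = Counter(w for words in places.values() for w in set(words))
    repeated = {w for w, n in counts.items() if n > 1}

    found = defaultdict(list)
    for place, words in places.items():
        # (2) intersect by value, not by node identity; sort to normalize
        shared = sorted(repeated & set(words))
        # (3) + (4) all subsets with at least two members
        for size in range(2, len(shared) + 1):
            for group in combinations(shared, size):
                found[group].append(place)

    # (5) sort by (a) length and (b) number of places; the group itself
    # is the final tie-breaker so the result is deterministic
    return sorted(
        found.items(),
        key=lambda item: (-len(item[0]), -len(item[1]), item[0]),
    )
```

For example, with the places p1 = a b c, p2 = a b d and p3 = b c e, the
group (a, b, c) is reported once for p1, followed by (a, b) for p1 and p2,
(b, c) for p1 and p3, and (a, c) for p1 only.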

This is an interesting problem. Maybe there is a better algorithm
than the one I'm suggesting here. Or at least a less imperative, more
declarative formulation.

Michael Ludwig
