Subject: RE: [xsl] How To Calculate Set of Unique Values Across a Tree of Input Documents
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Fri, 21 Mar 2008 19:11:30 -0000
There was a recent thread on processing graphs in XSLT 2.0, see

http://markmail.org/message/tlletsiznepd5no6

I provided a (sketch of a) solution that involved listing all the paths
starting at a given node (while avoiding looping in the event of a cycle); a
simple adaptation of that will give you all the nodes reachable from a given
node. In your case the node identifiers can be obtained using
document-uri(); you then simply need to apply distinct-values() to the
returned set of URIs.
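
A minimal sketch of that adaptation, for illustration only (the function names local:refs and local:reachable, the urn:local namespace, and the @href heuristic are all assumptions, not code from the earlier thread). local:reachable carries a $visited set, which is what prevents looping on a cycle; distinct-values() over document-uri() then yields the unique set:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:local="urn:local">

  <!-- Illustrative stand-in: extract the documents referenced from $doc.
       For DITA this might resolve @href on topicref/xref elements. -->
  <xsl:function name="local:refs" as="document-node()*">
    <xsl:param name="doc" as="document-node()"/>
    <xsl:sequence select="for $h in $doc//*[@href]/@href
                          return document($h, $h)"/>
  </xsl:function>

  <!-- All documents reachable from $queue. A document already in
       $visited is skipped, so cycles cannot cause infinite recursion. -->
  <xsl:function name="local:reachable" as="document-node()*">
    <xsl:param name="queue" as="document-node()*"/>
    <xsl:param name="visited" as="document-node()*"/>
    <xsl:choose>
      <xsl:when test="empty($queue)">
        <xsl:sequence select="$visited"/>
      </xsl:when>
      <xsl:when test="$queue[1] intersect $visited">
        <xsl:sequence select="local:reachable(remove($queue, 1), $visited)"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:sequence select="local:reachable(
            (remove($queue, 1), local:refs($queue[1])),
            ($visited, $queue[1]))"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>

  <xsl:template match="/">
    <!-- Node identity already de-duplicates; distinct-values() over the
         URIs guards against the same document being loaded twice. -->
    <xsl:for-each select="distinct-values(
        for $d in local:reachable(., ()) return document-uri($d))">
      <doc uri="{.}"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```
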

Michael Kay
http://www.saxonica.com/ 

> -----Original Message-----
> From: Eliot Kimber [mailto:ekimber@xxxxxxxxxxxx] 
> Sent: 21 March 2008 18:52
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] How To Calculate Set of Unique Values Across a 
> Tree of Input Documents
> 
> I have a tree of DITA map documents where each map references 
> zero or more other map or topic documents. The same map or 
> topic could be referenced multiple times.
> 
> I need to calculate the "bounded object set" of unique 
> documents referenced from within the compound map so that I 
> can then use an XSLT process to create new copies of each 
> document. Since I can't write to a given result more than 
> once, I have to first remove any duplicates.
> 
> Each target document is referenced by a relative URI that can 
> be different for different references to the same file (and 
> in fact will almost always be different in my particular data set).
> 
> I am using XSLT 2.
> 
> Because key() tables are bound to individual input 
> documents, I don't think I can build a table of references 
> indexed by target document URI (that is, the absolute URI 
> of the target of the reference). If I could, I would simply 
> build that table and then process the first member of each 
> entry.
> 
> I can't think of any other efficient way to approach this. 
> The best idea I can come up with is to build an intermediate 
> document that reflects each document reference and then use 
> something like for-each-group on that to treat it as a set 
> for the purpose of processing each referenced file exactly 
> once. If I build a flat list of elements containing the 
> document URI of each reference I can easily sort the values 
> and then remove duplicates. So maybe that's as efficient as 
> anything else would be.
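
The intermediate-document approach described above could look roughly like this in XSLT 2.0 (the <refs>/<ref> element names and the first-pass variable are illustrative assumptions; the first pass that populates it is elided):

```xml
<!-- $refs is assumed to hold a tree built in a first pass,
     one <ref uri="..."/> per document reference, with @uri
     the absolute URI of the referenced document. -->
<xsl:variable name="refs" as="element(refs)">
  <refs>
    <!-- first pass would emit one <ref uri="..."/> per reference -->
  </refs>
</xsl:variable>

<xsl:template name="process-unique">
  <!-- Grouping by @uri treats the list as a set: each target
       document is processed exactly once, however many times
       it was referenced. -->
  <xsl:for-each-group select="$refs/ref" group-by="@uri">
    <xsl:apply-templates select="document(current-grouping-key())"/>
  </xsl:for-each-group>
</xsl:template>
```

With for-each-group there is no need to sort first; grouping alone removes the duplicates.
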
> 
> My other challenge is that my input data set is very large, 
> so I have the potential to run into memory issues. It may 
> be that writing out an intermediate file as part of a 
> multi-stage, multi-transform pipeline is the best process, 
> but my current processor will handle the entire data set in 
> one pass for the purpose of applying the (mostly) identity 
> transform to the map set.
> 
> Can anyone suggest other solution approaches to this problem?
> 
> Once again I feel like I might be missing a clever solution 
> hidden in the haze of my XSLT 1 brain damage.
> 
> Thanks,
> 
> Eliot
> 
> --
> Eliot Kimber
> Senior Solutions Architect
> "Bringing Strategy, Content, and Technology Together"
> Main: 610.631.6770
> www.reallysi.com
> www.rsuitecms.com
