Subject: RE: [xsl] How To Calculate Set of Unique Values Across a Tree of Input Documents
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Fri, 21 Mar 2008 19:11:30 -0000

There was a recent thread on processing graphs in XSLT 2.0; see
http://markmail.org/message/tlletsiznepd5no6

I provided a (sketch of a) solution that involved listing all the paths
starting at a given node (while avoiding looping in the event of a cycle);
a simple adaptation of that will give you all the nodes reachable from a
given node. In your case the node identifiers can be obtained using
document-uri(); you then simply need to apply distinct-values() to the
returned set of URIs.

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Eliot Kimber [mailto:ekimber@xxxxxxxxxxxx]
> Sent: 21 March 2008 18:52
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] How To Calculate Set of Unique Values Across a
> Tree of Input Documents
>
> I have a tree of DITA map documents where each map references
> zero or more other map or topic documents. The same map or
> topic could be referenced multiple times.
>
> I need to calculate the "bounded object set" of unique
> documents referenced from within the compound map so that I
> can then use an XSLT process to create new copies of each
> document. Since I can't write to a given result more than
> once, I have to remove any duplicates first.
>
> Each target document is referenced by a relative URI that can
> be different for different references to the same file (and
> in fact will almost always be different in my particular data set).
>
> I am using XSLT 2.
>
> Because key() tables are bound to input documents, I don't
> think I can build a table of references indexed by target
> document URI (that is, the absolute URI of the target of the
> reference). If I could, I would simply build that table and
> then process the first member of each entry.
>
> I can't think of any other efficient way to approach this.
> The best idea I can come up with is to build an intermediate
> document that reflects each document reference and then use
> something like for-each-group on it to process each
> referenced file exactly once. If I build a flat list of
> elements containing the document URI of each reference, I can
> easily sort the values and then remove duplicates. So maybe
> that's as efficient as anything else would be.
>
> My other challenge is that my input data set is very large,
> so I have the potential to run into memory issues; it may be
> that writing out an intermediate file as part of a
> multi-stage, multi-transform pipeline is the best approach,
> but my current processor will handle the entire data set in
> one pass for the purpose of applying the (mostly) identity
> transform to the map set.
>
> Can anyone suggest other solution approaches to this problem?
>
> Once again I feel like I might be missing a clever solution
> hidden in the haze of my XSLT 1 brain damage.
>
> Thanks,
>
> Eliot
>
> --
> Eliot Kimber
> Senior Solutions Architect
> "Bringing Strategy, Content, and Technology Together"
> Main: 610.631.6770
> www.reallysi.com
> www.rsuitecms.com
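
[Editor's note: for concreteness, here is a minimal XSLT 2.0 sketch of
the traversal Michael Kay describes, not his original code. It assumes
every reference appears as an @href attribute resolving to an XML
document (roughly what DITA topicref elements provide); the function
name local:reachable and its namespace are invented for the example.]

  <xsl:stylesheet version="2.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      xmlns:local="urn:example:local">

    <!-- All documents reachable from $doc by following @href
         references. $visited carries the URIs already seen on the
         current path, so a reference cycle terminates rather than
         looping forever. Treating every @href as a document
         reference is a simplification; a real DITA transform would
         filter by element type and @format. -->
    <xsl:function name="local:reachable" as="document-node()*">
      <xsl:param name="doc" as="document-node()"/>
      <xsl:param name="visited" as="xs:string*"/>
      <xsl:variable name="uri" select="string(document-uri($doc))"/>
      <xsl:if test="not($uri = $visited)">
        <xsl:sequence select="$doc"/>
        <xsl:for-each select="$doc//*[@href]
            [doc-available(resolve-uri(@href, base-uri(.)))]">
          <xsl:sequence select="local:reachable(
              doc(resolve-uri(@href, base-uri(.))),
              ($visited, $uri))"/>
        </xsl:for-each>
      </xsl:if>
    </xsl:function>

    <xsl:template match="/">
      <!-- The same document reached along two different paths is
           returned twice; distinct-values() over document-uri()
           reduces the result to one absolute URI per document. -->
      <xsl:variable name="uris" select="
          distinct-values(for $d in local:reachable(., ())
                          return document-uri($d))"/>
      <!-- $uris is the bounded object set; process each member here. -->
    </xsl:template>

  </xsl:stylesheet>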
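
[Editor's note: Eliot's intermediate-document idea also fits in the
same stylesheet without a separate pipeline stage, since the flat list
of references can be built as a temporary tree in a variable and then
grouped with for-each-group. A sketch under the same assumptions,
replacing the main template above; the ref element and the out/
output-naming scheme are invented for the example.]

  <xsl:template match="/">
    <!-- Every document in the bounded object set, computed with
         local:reachable() from the sketch above. -->
    <xsl:variable name="docs" select="local:reachable(., ())"/>

    <!-- The intermediate "document": a flat sequence of ref
         elements, one per reference, each carrying the resolved
         absolute URI of its target. -->
    <xsl:variable name="refs" as="element(ref)*">
      <xsl:for-each select="$docs//*[@href]">
        <ref uri="{resolve-uri(@href, base-uri(.))}"/>
      </xsl:for-each>
    </xsl:variable>

    <!-- Each group's key is one unique target URI, so each
         document is written exactly once. The out/ filename
         mapping is a naive placeholder; a real transform needs a
         collision-safe output scheme. -->
    <xsl:for-each-group select="$refs" group-by="@uri">
      <xsl:result-document href="{concat('out/',
          tokenize(current-grouping-key(), '/')[last()])}">
        <xsl:apply-templates mode="copy"
            select="doc(current-grouping-key())"/>
      </xsl:result-document>
    </xsl:for-each-group>
  </xsl:template>

  <!-- Identity copy, standing in for Eliot's (mostly) identity
       transform. -->
  <xsl:template match="@* | node()" mode="copy">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" mode="copy"/>
    </xsl:copy>
  </xsl:template>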
There was a recent thread on processing graphs in XSLT 2.0, see http://markmail.org/message/tlletsiznepd5no6 I provided a (sketch of a) solution that involved listing all the paths starting at a given node (while avoiding looping in the event of a cycle); a simple adaptation of that will give you all the nodes reachable from a given node. In your case the node identifiers can be obtained using document-uri(); you then simply need to apply distinct-values() to the returned set of URIs. Michael Kay http://www.saxonica.com/ > -----Original Message----- > From: Eliot Kimber [mailto:ekimber@xxxxxxxxxxxx] > Sent: 21 March 2008 18:52 > To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx > Subject: [xsl] How To Calculate Set of Unique Values Across a > Tree of Input Documents > > I have a tree of DITA map documents where each map references > zero or more other map or topic documents. The same map or > topic could be referenced multiple times. > > I need to calculate the "bounded object set" of unique > documents referenced from within the compound map so that I > can then use an XSLT process to create new copies of each > document. Since I can't write to a given result more than > once I have to first remove any duplicates. > > Each target document is referenced by a relative URI that can > be different for different references to the same file (and > in fact will almost always be different in my particular data set). > > I am using XSLT 2. > > Because key() tables are bound to input documents I don't > think I can build a table of references indexed by target > document URI (that is, the absolute URI of the target of the > reference). If I could I would simply build that table and > then just process the first member of each entry. > > I can't think of any other efficient way to approach this. > The best idea I can come up with is to build an intermediate > document that reflects each document reference and then use > something like for-each-group on that to treat it as a set > for the purpose of processing each referenced file exactly > once. If I build a flat list of elements containing the > document URI of each reference I can easily sort the values > and then remove duplicates. So maybe that's as efficient as > anything else would be. > > My other challenge is that my input data set is very large so > I have the potential to run into memory issues, so it may be > that writing out an intermediate file as part of a > multi-stage, multi-transform pipeline is > the best process, but my current processor will handle the > entire data set in one process for the purpose of applying > the (mostly) identity transform to the map set. > > Can anyone suggest other solution approaches to this problem? > > Once again I feel like I might be missing a clever solution > hidden in the haze of my XSLT 1 brain damage. > > Thanks, > > Eliot > > -- > Eliot Kimber > Senior Solutions Architect > "Bringing Strategy, Content, and Technology Together" > Main: 610.631.6770 > www.reallysi.com > www.rsuitecms.com