Subject: Re: [xsl] [Summary] Three ways to express in XPath that there are no duplicates in a list of items From: Dimitre Novatchev <dnovatchev@xxxxxxxxx> Date: Fri, 2 Nov 2012 06:01:06 -0700 |
> count(Websites/*) = count(distinct-values(Websites/*)) A more efficient version (for an XPath processor with weak optimizer) of this is: not(Websites/*[count(distinct-values(Websites/*))+1]) So, we only count up to: count(distinct-values(Websites/*))+1 and don't need to count all children of Websites and then compare them to the count of distinct values. This is the same as Dirichlet's principle (aka "pigeonhole principle"): If n items are put into m pigeonholes with n > m, then at least one pigeonhole must contain more than one item. Cheers, Dimitre On Fri, Nov 2, 2012 at 2:33 AM, Liam R E Quin <liam@xxxxxx> wrote: > On Fri, 2012-11-02 at 09:15 +0000, Costello, Roger L. wrote: > [...] >> Here's how to implement it in XPath 1.0 and in XPath 2.0. >> >> XPath 1.0: >> >> not(Websites/*[. = preceding-sibling::*]) >> >> XPath 2.0: >> >> empty(Websites/*[index-of(../*,.)[2]]) >> count(Websites/*) = count(distinct-values(Websites/*)) >> >> The preferred XPath is the last one because it has the best >> performance. The first two take on the order of n-squared time (where >> n is the number of websites in the list) whereas the last XPath >> expression takes on the order of n log n time. > > Some brief comments on this... > (1) the XPath 1 solution also works in later versions of XPath; > (2) this only works to test simple text content > (3) an XPath 2 (or later) processor can rewrite the first or second > expression if it likes (and is smart enough) into the third > (4) the last expression can be implemented in linear time, not n log n, > because it's not not necessary to sort values to see if they are > distinct (e.g. the implementation can use a hash table), although this > does use more memory. But for simple content the hash table approach > doesn't even need to use more memory, can just point into the document > tree. > > So statements about performance need to be with respect to a particular > version of a specific product, in a specific environment. > > And, > (5) I'm sure these aren't the only ways to see if all items are > distinct :-) > > You can start getting pretty fancy, > some $w in Websites/Website satisfies count(Websites/Website[. eq $w]) > gt 1, > or flwor expressions in XQuery, > or use a key or (XPath in XSLT 3) a map, or.... :) > > Liam > > > -- > Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/ > Pictures from old books: http://fromoldbooks.org/ > Ankh: irc.sorcery.net irc.gnome.org freenode/#xml > -- Cheers, Dimitre Novatchev --------------------------------------- Truly great madness cannot be achieved without significant intelligence. --------------------------------------- To invent, you need a good imagination and a pile of junk ------------------------------------- Never fight an inanimate object ------------------------------------- To avoid situations in which you might make mistakes may be the biggest mistake of all ------------------------------------ Quality means doing it right when no one is looking. ------------------------------------- You've achieved success in your field when you don't know whether what you're doing is work or play ------------------------------------- Facts do not cease to exist because they are ignored. ------------------------------------- Typing monkeys will write all Shakespeare's works in 200yrs.Will they write all patents, too? :) ------------------------------------- I finally figured out the only reason to be alive is to enjoy it.
Current Thread |
---|
|
<- Previous | Index | Next -> |
---|---|---|
Re: [xsl] [Summary] Three ways to e, Liam R E Quin | Thread | Re: [xsl] [Summary] Three ways to e, Michael Kay |
Re: [xsl] hierarchic counting in fl, G. Ken Holman | Date | Re: [xsl] hierarchic counting in fl, Michael Kay |
Month |