Subject: Re: [xsl] Comparing documents: what of P is a subset of D?
From: Michael Kay <mike@xxxxxxxxxxxx>
Date: Thu, 27 Feb 2014 14:32:31 +0000

I'm not sure I've completely understood your "equality" relation that
underpins the intersection. Perhaps it's based on equality of the function

  string-join(ancestor-or-self::*/@_ix, '|')

Let's call this function $f, and we can use this as a parameter to the
rest of the solution. We then need to do

  doc('d.xml')//fc[some $e in doc('p.xml')//* satisfies $f($e) eq $f(.)] ! path(.)

where path(.) is a function you can write to display the path to the
selected fc element.

The only remaining problem is that this is O(n*m), where n and m are the
sizes of D and P. For a more efficient solution, define a key on p.xml
that indexes each element on the value of the function $f, and replace
the predicate by a call on key().

The above uses XPath 3.0, but it can probably be expressed in XPath 2.0
easily enough at the cost of hard-coding the equality function.

Michael Kay
Saxonica

On 27 Feb 2014, at 10:25, Wolfgang Laun <wolfgang.laun@xxxxxxxxx> wrote:

> <cca><!-- a D XML -->
>   <rela _ix='0' fa='0' fb='1'>
>     <fc _ix='1' fc_fa='X1' fc_fb='1'/>
>     <fc _ix='2' fc_fa='X2' fc_fb='2'/>
>   </rela>
>   <rela _ix='1' fa='10' fb='11'>
>     <fc _ix='1' fc_fa='Y1' fc_fb='11'/>
>     <fc _ix='2' fc_fa='Y2' fc_fb='12'/>
>   </rela>
>   <rela _ix='5' fa='50' fb='51'>
>     <fc _ix='1' fc_fa='A1' fc_fb='51'/>
>     <fc _ix='2' fc_fa='A2' fc_fb='52'/>
>   </rela>
>   <relb>...</relb>
>   <relc>...</relc>
> </cca>
>
> <cca><!-- a P XML -->
>   <rela _ix='1' fa='10'>
>     <fc _ix='1' fc_fa='Y1' fc_fb='99'/>
>   </rela>
>   <rela _ix='5' fa='50' fb='51'>
>     <fc _ix='1' fc_fb='51' fc_fc='123'/>
>     <fc _ix='2' fc_fa='A2' fc_fb='52' fc_fc='456'/>
>   </rela>
> </cca>
>
> Expected output:
>
> /cca/rela(1)/fa 10
> /cca/rela(1)/fc(1)/fc_fa Y1
> /cca/rela(5)/fa 50
> /cca/rela(5)/fb 51
> /cca/rela(5)/fc(1)/fc_fb 51
> /cca/rela(5)/fc(2)/fc_fa A2
> /cca/rela(5)/fc(2)/fc_fb 52
>
> Note that parentheses enclose values of @_ix.
>
> -W
>
> On 27/02/2014, Michael Kay <mike@xxxxxxxxxxxx> wrote:
>> It would be easier to understand the problem with some example data.
>>
>> Michael Kay
>> Saxonica
>>
>> On 27 Feb 2014, at 08:05, Wolfgang Laun <wolfgang.laun@xxxxxxxxx> wrote:
>>
>>> The data model for a set of similarly (but not identically) built XML
>>> documents is: a collection of arrays of records, which may contain
>>> (recursively) arrays, records and scalars. (The terms "array" and
>>> "record" are used in their "classic" meaning as, e.g., in Pascal.)
>>> Document structures are fairly stable, but they do change over time.
>>> Array elements are identified (indexed) by @_ix, not by position.
>>> Record fields can be elements or attributes (when they are scalar).
>>> Order is undefined, since XPaths plus @_ix values pinpoint each node.
>>>
>>> One XML document D contains a full population for such a data set
>>> (O(1MB)). A second XML document P contains "patches", i.e., each node
>>> appearing in P is expected to be in D as well.
>>>
>>> If S(P) is the sequence of nodes (annotated with their XPaths) in P
>>> and S(D) the one with nodes from D, how can I determine S(P) intersect
>>> S(D) (except all @_ix, whose values are bound to be identical)? Of
>>> course, I don't want the common set of *data items* - I want the XML
>>> paths of those common data items.
>>>
>>> A solution (in XSLT 2.0) should not need individual adaptation for each
>>> kind of data set.
>>>
>>> I'm confident that I can create text files for D and P containing one
>>> line <path> <value> for each node and run diff (after sort).
>>>
>>> Any better ideas?
>>>
>>> Cheers
>>> Wolfgang
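
The keyed variant suggested at the top of this message might look roughly
like the following in XSLT 2.0. This is only a sketch under assumptions:
the function names my:ix-chain and my:display-path, the key name
by-ix-chain and the namespace urn:example:compare are invented for
illustration, D is taken to be the source document, and P is assumed to
be a file p.xml next to the stylesheet. The equality function is
hard-coded as the tentative $f, so it compares only chains of @_ix
values and ignores element names.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:compare"
    exclude-result-prefixes="xs my">

  <xsl:output method="text"/>

  <!-- Assumption: P is p.xml beside this stylesheet; D is the source document. -->
  <xsl:variable name="p" select="doc('p.xml')"/>

  <!-- Hard-coded stand-in for $f: the chain of @_ix values down to the element. -->
  <xsl:function name="my:ix-chain" as="xs:string">
    <xsl:param name="e" as="element()"/>
    <xsl:sequence select="string-join($e/ancestor-or-self::*/@_ix, '|')"/>
  </xsl:function>

  <!-- Hypothetical path(.) replacement: name(_ix) steps, as in the expected output. -->
  <xsl:function name="my:display-path" as="xs:string">
    <xsl:param name="e" as="element()"/>
    <xsl:sequence select="string-join(
        for $a in $e/ancestor-or-self::*
        return concat('/', name($a), $a/@_ix/concat('(', ., ')')), '')"/>
  </xsl:function>

  <!-- Index every element that carries @_ix by its chain of @_ix values. -->
  <xsl:key name="by-ix-chain" match="*[@_ix]" use="my:ix-chain(.)"/>

  <xsl:template match="/">
    <!-- Keep the fc elements of D whose chain also occurs somewhere in P. -->
    <xsl:for-each select="//fc[exists(key('by-ix-chain', my:ix-chain(.), $p))]">
      <xsl:value-of select="my:display-path(.)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The third argument of key() restricts the lookup to the tree rooted at
$p, so the index is built once over P and each fc in D costs a single
lookup instead of a scan. Like the XPath 3.0 expression, this only
selects fc elements by their index chain; comparing individual attribute
values, as the expected output requires, would need a further step.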