Subject: Re: [xsl] Comparing documents: what of P is a subset of D?
From: Wolfgang Laun <wolfgang.laun@xxxxxxxxx>
Date: Thu, 27 Feb 2014 11:25:53 +0100

<cca><!-- a D XML -->
  <rela _ix='0' fa='0' fb='1'>
     <fc _ix='1' fc_fa='X1' fc_fb='1'/>
     <fc _ix='2' fc_fa='X2' fc_fb='2'/>
  </rela>
  <rela _ix='1' fa='10' fb='11'>
     <fc _ix='1' fc_fa='Y1' fc_fb='11'/>
     <fc _ix='2' fc_fa='Y2' fc_fb='12'/>
  </rela>
  <rela _ix='5' fa='50' fb='51'>
     <fc _ix='1' fc_fa='A1' fc_fb='51'/>
     <fc _ix='2' fc_fa='A2' fc_fb='52'/>
  </rela>
  <relb>...</relb>
  <relc>...</relc>
</cca>

<cca><!-- a P XML -->
  <rela _ix='1' fa='10'>
     <fc _ix='1' fc_fa='Y1' fc_fb='99'/>
  </rela>
  <rela _ix='5' fa='50' fb='51'>
     <fc _ix='1'                 fc_fb='51' fc_fc='123'/>
     <fc _ix='2' fc_fa='A2' fc_fb='52' fc_fc='456'/>
  </rela>
</cca>

Expected output:

/cca/rela(1)/fa   10
/cca/rela(1)/fc(1)/fc_fa   Y1
/cca/rela(5)/fa   50
/cca/rela(5)/fb   51
/cca/rela(5)/fc(1)/fc_fb   51
/cca/rela(5)/fc(2)/fc_fa   A2
/cca/rela(5)/fc(2)/fc_fb   52

Note that parentheses enclose values of @_ix.

-W

On 27/02/2014, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> It would be easier to understand the problem with some example data.
>
> Michael Kay
> Saxonica
>
> On 27 Feb 2014, at 08:05, Wolfgang Laun <wolfgang.laun@xxxxxxxxx> wrote:
>
>> The data model for a set of similarly (but not identically) built XML
>> documents is: a collection of arrays of records, which may contain
>> (recursively) arrays, records and scalars. (The terms "array" and
>> "record" are used in their "classic" meaning as, e.g., in Pascal.)
>> Document structures are fairly stable, but they do change over time.
>> Array elements are identified (indexed) by @_ix, not by position.
>> Record fields can be elements or attributes (when they are scalar).
>> Order is undefined, since XPaths plus @_ix values pinpoint each node.
>>
>> One XML document D contains a full population for such a data set
>> (O(1MB)). A second XML document P contains "patches", i.e., each node
>> appearing in P is expected to be in D as well.
>>
>> If S(P) is the sequence of nodes (annotated with their XPaths) in P
>> and S(D) the one with nodes from D, how can I determine S(P) intersect
>> S(D) (except all @_ix, whose values are bound to be identical)? Of
>> course, I don't want the common set of *data items* - I want the XML
>> paths of those common data items.
>>
>> A solution (in XSLT 2.0) should not need individual adaptation for each
>> kind of data set.
>>
>> I'm confident that I can create text files for D and P containing one
>> line <path> <value> for each node and run diff (after sort).
>>
>> Any better ideas?
>>
>> Cheers
>> Wolfgang
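
For illustration only, the "one line <path> <value> per node, then compare" idea from the quoted message can be sketched outside XSLT. The following Python is a minimal sketch, not the XSLT 2.0 solution asked for: the function names flatten and common_paths are hypothetical, and it assumes that scalars are either attributes or text-only leaf elements and that a set intersection stands in for sort plus diff. The sample data is copied from the D and P documents above.

```python
import xml.etree.ElementTree as ET

# Sample data copied from the thread (comments omitted).
D_XML = """<cca>
  <rela _ix='0' fa='0' fb='1'>
     <fc _ix='1' fc_fa='X1' fc_fb='1'/>
     <fc _ix='2' fc_fa='X2' fc_fb='2'/>
  </rela>
  <rela _ix='1' fa='10' fb='11'>
     <fc _ix='1' fc_fa='Y1' fc_fb='11'/>
     <fc _ix='2' fc_fa='Y2' fc_fb='12'/>
  </rela>
  <rela _ix='5' fa='50' fb='51'>
     <fc _ix='1' fc_fa='A1' fc_fb='51'/>
     <fc _ix='2' fc_fa='A2' fc_fb='52'/>
  </rela>
  <relb>...</relb>
  <relc>...</relc>
</cca>"""

P_XML = """<cca>
  <rela _ix='1' fa='10'>
     <fc _ix='1' fc_fa='Y1' fc_fb='99'/>
  </rela>
  <rela _ix='5' fa='50' fb='51'>
     <fc _ix='1' fc_fb='51' fc_fc='123'/>
     <fc _ix='2' fc_fa='A2' fc_fb='52' fc_fc='456'/>
  </rela>
</cca>"""


def flatten(elem, prefix=""):
    """Yield (path, value) pairs for every scalar below elem.

    Array elements are addressed by @_ix, written as name(ix);
    @_ix itself is never reported as a data item.
    """
    ix = elem.get("_ix")
    step = elem.tag if ix is None else f"{elem.tag}({ix})"
    path = f"{prefix}/{step}"
    # Scalar fields stored as attributes.
    for name, value in elem.attrib.items():
        if name != "_ix":
            yield f"{path}/{name}", value
    # Assumption: a scalar field stored as an element is a text-only leaf.
    text = (elem.text or "").strip()
    if text and len(elem) == 0:
        yield path, text
    for child in elem:
        yield from flatten(child, path)


def common_paths(d_xml, p_xml):
    """Return the sorted (path, value) pairs present in both documents."""
    d_items = set(flatten(ET.fromstring(d_xml)))
    p_items = set(flatten(ET.fromstring(p_xml)))
    return sorted(d_items & p_items)


for path, value in common_paths(D_XML, P_XML):
    print(f"{path}   {value}")
```

Because each pair carries both the path and the value, a field that exists in both documents with different values (such as fc_fb='99' in P) drops out of the intersection, as does a field that exists only in P (such as fc_fc).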
