Subject: Re: [xsl] Comparing documents: what of P is a subset of D?
From: Wolfgang Laun <wolfgang.laun@xxxxxxxxx>
Date: Fri, 28 Feb 2014 11:57:03 +0100

@Michael: your answer triggered a thought process that outlined the way to
a solution I'm able to implement. I don't know whether this is of any
interest to others, but it's a nice little exercise for a training course,
illustrating modes, keys, and a second input document.

Problem:
Given two XML files conforming to the same XML schema, find all leaf
nodes (text() and @*) in one document ("Patch") that have an identical
value at the same iXPath in the other document ("Data"), where an iXPath
is an XPath using element names, attribute names and predicates
[@_ix eq n] wherever they occur (in repeating elements).

Solution outline:
Process the Patch document, creating a set of nodes <p2v @path @value>
mapping iXPaths to values, with a key based on @path. Then, process
the Data document analogously, looking up iXPaths in the key and
comparing values, where found.
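
To make the intermediate structure concrete: for the first <rela> of the sample Patch document quoted further down in this thread, the map built in the first pass should contain entries like the ones below. This is a hand-derived excerpt, not captured output, and the order of the entries for the attributes of one element is implementation-dependent.

```xml
<map>
  <p2v path="/cca/rela[1]/fa" value="10"/>
  <p2v path="/cca/rela[1]/fc[1]/fc_fa" value="Y1"/>
  <p2v path="/cca/rela[1]/fc[1]/fc_fb" value="99"/>
  <!-- entries for rela[5] omitted -->
</map>
```

In the second pass, the Data attribute at /cca/rela[1]/fc[1]/fc_fb has the value 11, so its path is found in the key but not reported, while the other two paths are reported.
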
Below is the code, very likely not perfect ;-)
(Note that the output would be much more readable if an iXPath could
be truncated at a point where the subtree is identical in the defined
way.)
Thanks
W
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:wl="http://members.inode.at/w.laun"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />
  <xsl:strip-space elements = '*' />
  <xsl:param name="patchfile" as="xs:string"/>
  <xsl:variable name="patch" select="document($patchfile)" />
  <xsl:key name = "path2value" match = "p2v" use = "@path"/>
  <!-- pass over patch file -->
  <xsl:variable name="map" as="document-node()">
    <xsl:document>
      <map>
        <xsl:for-each select = "$patch">
          <xsl:apply-templates select = "*" mode="indexing">
            <xsl:with-param name = "path" select = "''" />
          </xsl:apply-templates>
        </xsl:for-each>
      </map>
    </xsl:document>
  </xsl:variable>
  <xsl:template match="*" mode="indexing">
    <xsl:param name = "path" as = "xs:string" />
    <xsl:apply-templates select = "*|@*|text()" mode="indexing">
      <xsl:with-param name = "path"
                      select = "concat( $path, '/', local-name() )"/>
    </xsl:apply-templates>
  </xsl:template>
  <xsl:template match="*[@_ix]" mode="indexing">
    <xsl:param name = "path" as = "xs:string" />
    <xsl:apply-templates select = "*|@*|text()" mode="indexing">
      <xsl:with-param name = "path"
                      select = "concat( $path, '/', local-name(), '[', @_ix, ']' )"/>
    </xsl:apply-templates>
  </xsl:template>
  <xsl:template match="@*" mode="indexing">
    <xsl:param name = "path" as = "xs:string" />
    <xsl:variable name = "fp" select = "concat( $path, '/', local-name() )"/>
    <p2v path = "{$fp}" value = "{.}"/>
  </xsl:template>
  <xsl:template match="@_ix" mode="indexing"/>
  <xsl:template match="text()" mode="indexing">
    <xsl:param name = "path" as = "xs:string" />
    <p2v path = "{$path}" value = "{.}"/>
  </xsl:template>
  <!-- Pass over DB data file -->
  <xsl:template match = "/">
    <xsl:apply-templates mode="comparing">
      <xsl:with-param name = "path" select = "''" />
    </xsl:apply-templates>
  </xsl:template>
  <xsl:template match="*" mode="comparing">
    <xsl:param name = "path" as = "xs:string" />
    <xsl:apply-templates select = "*|@*|text()" mode="comparing">
      <xsl:with-param name = "path"
                      select = "concat( $path, '/', local-name() )"/>
    </xsl:apply-templates>
  </xsl:template>
  <xsl:template match="*[@_ix]" mode="comparing">
    <xsl:param name = "path" as = "xs:string" />
    <xsl:apply-templates select = "*|@*|text()" mode="comparing">
      <xsl:with-param name = "path"
                      select = "concat( $path, '/', local-name(), '[', @_ix, ']' )"/>
    </xsl:apply-templates>
  </xsl:template>
  <xsl:template match="@*" mode="comparing">
    <xsl:param name = "path" as = "xs:string" />
    <xsl:variable name = "fp" select = "concat( $path, '/', local-name() )"/>
    <xsl:variable name = "pval"
                  select = "key( 'path2value', $fp, $map/map )/@value"/>
    <xsl:if test = "$pval eq .">
      <xsl:value-of select = "concat( $fp, ' ... ', $pval)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:if>
  </xsl:template>
  <xsl:template match="@_ix" mode="comparing"/>
  <xsl:template match="text()" mode="comparing">
    <xsl:param name = "path" as = "xs:string" />
    <xsl:variable name = "pval"
                  select = "key( 'path2value', $path, $map/map )/@value"/>
    <xsl:if test = "$pval eq .">
      <xsl:value-of select = "concat( $path, ' ... ', $pval)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
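
A possible variation, closer to the suggestion in the quoted reply of keying the Patch document on a function: instead of threading a path parameter through two sets of templates and building an intermediate map, the iXPath can be computed from a node's ancestors and used directly as the key value. The function name wl:ipath and the key name leaf are assumptions made for this sketch and do not appear in the stylesheet above. It has not been run against real data, so treat it as an outline rather than a drop-in replacement.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:wl="http://members.inode.at/w.laun"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:param name="patchfile" as="xs:string"/>
  <xsl:variable name="patch" select="document($patchfile)"/>

  <!-- Hypothetical helper: the iXPath of a leaf node, built from its
       element ancestors (and the attribute itself, if it is one).
       A text node gets the path of its parent element. -->
  <xsl:function name="wl:ipath" as="xs:string">
    <xsl:param name="n" as="node()"/>
    <xsl:sequence select="string-join(
        for $a in $n/ancestor-or-self::node()[self::* or self::attribute()]
        return concat('/', local-name($a),
                      if ($a/@_ix) then concat('[', $a/@_ix, ']') else ''),
        '')"/>
  </xsl:function>

  <!-- Index every leaf except @_ix by its iXPath; the index is built
       for whichever document key() is asked to search. -->
  <xsl:key name="leaf" match="text() | @*[local-name() ne '_ix']"
           use="wl:ipath(.)"/>

  <!-- Walk the leaves of the Data document and look each one up
       in the Patch document. -->
  <xsl:template match="/">
    <xsl:for-each select="//text() | //@*[local-name() ne '_ix']">
      <xsl:variable name="p" select="wl:ipath(.)"/>
      <xsl:if test="key('leaf', $p, $patch) = .">
        <xsl:value-of select="concat($p, ' ... ', ., '&#10;')"/>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The trade-off is that wl:ipath walks the ancestor chain again for every leaf, whereas the parameter-passing version extends the path one step at a time. The general comparison "=" is used instead of "eq" so that an element with several text nodes at the same path does not raise a type error. Either stylesheet takes the Data document as its source and the Patch document as the patchfile parameter; with Saxon 9, for example, an invocation along the lines of "java -jar saxon9he.jar -s:d.xml -xsl:compare.xsl patchfile=p.xml" should work, where the jar and file names are placeholders. Note that document() with a string argument resolves a relative URI against the base URI of the stylesheet, not the current directory.
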
On 27/02/2014, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> I'm not sure I've completely understood your "equality" relation that
> underpins the intersection. Perhaps it's based on equality of the function
>
> string-join(ancestor-or-self::*/@_ix, '|')
>
> let's call this function $f, and we can use this as a parameter to the rest
> of the solution.
>
> we then need to do
>
> doc('d.xml')//fc[some $e in doc('p.xml')//fc satisfies $f($e) eq $f(.)] !
> path(.)
>
> where path(.) is a function you can write to display the path to the
> selected fc element.
>
> The only remaining problem is that this is O(n*m) where n and m are the
> sizes of D and P. For a more efficient solution, define a key on P.XML that
> indexes each element on the value of the function $f, and replace the
> predicate by a call on key().
>
> The above uses XPath 3.0, but it can probably be expressed in XPath 2.0
> easily enough at the cost of hard-coding the equality function.
>
> Michael Kay
> Saxonica
>
>
> On 27 Feb 2014, at 10:25, Wolfgang Laun <wolfgang.laun@xxxxxxxxx> wrote:
>
>> <cca><!-- a D XML -->
>> <rela _ix='0' fa='0' fb='1'>
>> <fc _ix='1' fc_fa='X1' fc_fb='1'/>
>> <fc _ix='2' fc_fa='X2' fc_fb='2'/>
>> </rela>
>> <rela _ix='1' fa='10' fb='11'>
>> <fc _ix='1' fc_fa='Y1' fc_fb='11'/>
>> <fc _ix='2' fc_fa='Y2' fc_fb='12'/>
>> </rela>
>> <rela _ix='5' fa='50' fb='51'>
>> <fc _ix='1' fc_fa='A1' fc_fb='51'/>
>> <fc _ix='2' fc_fa='A2' fc_fb='52'/>
>> </rela>
>> <relb>...</relb>
>> <relc>...</relc>
>> </cca>
>>
>> <cca><!-- a P XML -->
>> <rela _ix='1' fa='10'>
>> <fc _ix='1' fc_fa='Y1' fc_fb='99'/>
>> </rela>
>> <rela _ix='5' fa='50' fb='51'>
>> <fc _ix='1' fc_fb='51' fc_fc='123'/>
>> <fc _ix='2' fc_fa='A2' fc_fb='52' fc_fc='456'/>
>> </rela>
>> </cca>
>>
>> Expected output:
>>
>> /cca/rela(1)/fa 10
>> /cca/rela(1)/fc(1)/fc_fa Y1
>> /cca/rela(5)/fa 50
>> /cca/rela(5)/fb 51
>> /cca/rela(5)/fc(1)/fc_fb 51
>> /cca/rela(5)/fc(2)/fc_fa A2
>> /cca/rela(5)/fc(2)/fc_fb 52
>>
>> Note that parentheses enclose values of @_ix.
>>
>> -W
>>
>> On 27/02/2014, Michael Kay <mike@xxxxxxxxxxxx> wrote:
>>> It would be easier to understand the problem with some example data.
>>>
>>> Michael Kay
>>> Saxonica
>>>
>>> On 27 Feb 2014, at 08:05, Wolfgang Laun <wolfgang.laun@xxxxxxxxx> wrote:
>>>
>>>> The data model for a set of similarly (but not identically) built XML
>>>> documents is: a collection of arrays of records, which may contain
>>>> (recursively) arrays, records and scalars. (The terms "array" and
>>>> "record" are used in their "classic" meaning as, e.g., in Pascal.)
>>>> Document structures are fairly stable, but they do change over time.
>>>> Array elements are identified (indexed) by @_ix, not by position.
>>>> Record fields can be elements or attributes (when they are scalar).
>>>> Order is undefined, since XPaths plus @_ix values pinpoint each node.
>>>>
>>>> One XML document D contains a full population for such a data set
>>>> (O(1MB)). A second XML document P contains "patches", i.e., each node
>>>> appearing in P is expected to be in D as well.
>>>>
>>>> If S(P) is the sequence of nodes (annotated with their XPaths) in P
>>>> and S(D) the one with nodes from D, how can I determine S(P) intersect
>>>> S(D) (except all @_ix, whose values are bound to be identical)? Of
>>>> course, I don't want the common set of *data items* - I want the XML
>>>> paths of those common data items.
>>>>
>>>> A solution (in XSLT 2.0) should not need individual adaptation for each
>>>> kind of data set.
>>>>
>>>> I'm confident that I can create text files for D and P containing one
>>>> line <path> <value> for each node and run diff (after sort).
>>>>
>>>> Any better ideas?
>>>>
>>>> Cheers
>>>> Wolfgang