Re: [xsl] Find inconsistencies: Perl or XSLT?

Subject: Re: [xsl] Find inconsistencies: Perl or XSLT?
From: Hermann Stamm-Wilbrandt <STAMMW@xxxxxxxxxx>
Date: Wed, 1 Dec 2010 19:01:22 +0100
Perhaps I am missing something here, but for this simple problem XSLT 1.0,
and even XPath 1.0, seems to be good enough.


Problem:
identify duplicate source entries of unit elements


The input tags in the original post did not match (each <target> was closed
by </source>); a corrected input.xml is listed below.


If the input file is of moderate size, this simple XPath expression will do it:

$ xpath++ "/data/unit[source=following-sibling::unit/source]" input.xml

===============================================================================
<unit id="1">
    <source>blabla</source>
    <target>plapla</target>
</unit>
===============================================================================
<unit id="2">
    <source>bleble</source>
    <target>pleple</target>
</unit>
$
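To make the semantics of that XPath expression concrete, here is a rough Python equivalent using only the standard library's xml.etree.ElementTree (Python, the function name units_with_later_duplicate and the embedded copy of the corrected input.xml are assumptions for illustration). It assumes one source child per unit. Like the XPath, it selects only units that have a later sibling with the same source, which is why units 4 and 5 do not appear in the output above, and it compares each unit with all of its following siblings, so the work grows quadratically with the number of units.

```python
import xml.etree.ElementTree as ET

# Corrected input.xml from this message, embedded so the sketch is self-contained.
INPUT = """<data>
<unit id="1"><source>blabla</source><target>plapla</target></unit>
<unit id="2"><source>bleble</source><target>pleple</target></unit>
<unit id="3"><source>bloblo</source><target>ploplo</target></unit>
<unit id="4"><source>blabla</source><target>plapla</target></unit>
<unit id="5"><source>bleble</source><target>lolailo</target></unit>
</data>"""


def units_with_later_duplicate(xml_text):
    """Rough analogue of /data/unit[source=following-sibling::unit/source].

    Returns the ids of units whose source text also occurs in some later
    sibling unit (assumes exactly one <source> per <unit>).
    """
    units = ET.fromstring(xml_text).findall("unit")
    result = []
    for i, unit in enumerate(units):
        src = unit.findtext("source")
        # Compare against every following sibling: O(n^2) overall.
        if any(later.findtext("source") == src for later in units[i + 1:]):
            result.append(unit.get("id"))
    return result


if __name__ == "__main__":
    print(units_with_later_duplicate(INPUT))
```

Run on the corrected input, it selects units 1 and 2, matching the two units printed by the XPath query.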


For bigger files, making use of the key() function helps:

$ cat dupsrc.xsl
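<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
  <!-- index every unit by the string value of its source child -->
  <xsl:key name="source" match="unit" use="source"/>

  <!-- suppress the built-in copying of text nodes -->
  <xsl:template match="text()"/>

  <!-- a unit whose source occurs in more than one unit: print "id-source" -->
  <xsl:template match="/data/unit[count(key('source',source))>1]">
    <xsl:value-of select="concat(@id,'-',source,'&#10;')"/>
  </xsl:template>

</xsl:stylesheet>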
$
$ xsltproc dupsrc.xsl input.xml
<?xml version="1.0"?>
1-blabla
2-bleble
4-blabla
5-bleble

$ cat input.xml
<data>
<unit id="1">
    <source>blabla</source>
    <target>plapla</target>
</unit>
<unit id="2">
    <source>bleble</source>
    <target>pleple</target>
</unit>
<unit id="3">
    <source>bloblo</source>
    <target>ploplo</target>
</unit>
<unit id="4">
    <source>blabla</source>
    <target>plapla</target>
</unit>
<unit id="5">
    <source>bleble</source>
    <target>lolailo</target>
</unit>
</data>
$
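The key() stylesheet avoids rescanning the siblings by building an index once and then looking up each unit's count. A minimal Python sketch of the same idea, assuming a dict stands in for xsl:key (the function name duplicate_sources is hypothetical, and the input is the corrected input.xml above), should print the same four lines as the xsltproc run, in document order:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Corrected input.xml from this message, embedded so the sketch is self-contained.
INPUT = """<data>
<unit id="1"><source>blabla</source><target>plapla</target></unit>
<unit id="2"><source>bleble</source><target>pleple</target></unit>
<unit id="3"><source>bloblo</source><target>ploplo</target></unit>
<unit id="4"><source>blabla</source><target>plapla</target></unit>
<unit id="5"><source>bleble</source><target>lolailo</target></unit>
</data>"""


def duplicate_sources(xml_text):
    """Return "id-source" for every unit whose source occurs more than once.

    Pass 1 builds an index from source text to units (the role of xsl:key);
    pass 2 keeps units whose index entry has more than one member (the role
    of count(key('source', source)) > 1). Both passes are linear.
    """
    units = ET.fromstring(xml_text).findall("unit")
    index = defaultdict(list)
    for unit in units:
        index[unit.findtext("source")].append(unit)
    return [
        f"{unit.get('id')}-{unit.findtext('source')}"
        for unit in units
        if len(index[unit.findtext("source")]) > 1
    ]


if __name__ == "__main__":
    for line in duplicate_sources(INPUT):
        print(line)
```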


Mit besten Gruessen / Best wishes,

Hermann Stamm-Wilbrandt
Developer, XML Compiler, L3
Fixpack team lead
WebSphere DataPower SOA Appliances
----------------------------------------------------------------------
IBM Deutschland Research & Development GmbH



From:       Michael Kay <mike@xxxxxxxxxxxx>
To:         xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Date:       12/01/2010 04:06 PM
Subject:    Re: [xsl] Find inconsistencies: Perl or XSLT?



On 01/12/2010 14:46, Manuel Souto Pico wrote:
> Dear all,
>
> I need to process some files and I know how to do it in Perl, but as
> has happened in the past with other things, perhaps
> there's an (objectively) simpler or more efficient way to do it with
> XSLT.
>
> I have a file like this:
>
> <unit id="1">
>     <source>blabla</source>
>     <target>plapla</source>
> </unit>
> <unit id="2">
>     <source>bleble</source>
>     <target>pleple</source>
> </unit>
> <unit id="3">
>     <source>bloblo</source>
>     <target>ploplo</source>
> </unit>
> <unit id="4">
>     <source>blabla</source>
>     <target>plapla</source>
> </unit>
> <unit id="5">
>     <source>bleble</source>
>     <target>lolailo</source>
> </unit>
>
> I think the example is illustrative enough.
>
> The target element contains the translation of the source element, and
> the same source text must always be translated in the same way, but
> sometimes it isn't. So what I'd like to do is find two or more units with
> the same source but with different targets (like 2 and 5 in the
> example, but unlike 1 and 4).
>
> In Perl I would use an XML module (or not), putting the source elements
> in the keys of a hash and the target elements in the corresponding
> values. When assigning a new key-value pair, if the key already
> exists, I compare the values. If they are equal, they pass; otherwise they
> are flagged and included in the report.
>
> The report in this case would be something like:
>
> The following inconsistencies have been found:
> 2: bleble -> pleple
> 5: bleble -> lolailo
>
> Is it possible to do this in XSLT? Is it more efficient than doing it
> in Perl as I was planning to? My knowledge of XSLT is very limited and
> I can't see beyond transforming an XML file into another XML file.
>
> Thanks a lot for your opinion.
> Manuel
>
>
Something like this:

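<!-- XSLT 2.0; evaluated with the parent of the unit elements as context node -->
<xsl:for-each-group select="unit" group-by="source">
  <!-- report only sources that have more than one distinct translation -->
  <xsl:if test="count(distinct-values(current-group()/target)) gt 1">
    <conflicts-for source="{current-grouping-key()}">
      <!-- value-of joins the distinct targets with a single space -->
      <xsl:value-of select="distinct-values(current-group()/target)"/>
    </conflicts-for>
  </xsl:if>
</xsl:for-each-group>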

Michael Kay
Saxonica
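For comparison with the Perl hash approach described in the question, here is a minimal Python sketch of the grouping that Michael Kay's XSLT 2.0 snippet above performs. It assumes Python's standard xml.etree.ElementTree, the corrected input.xml from Hermann's reply (the quoted sample has mismatched tags and would not parse), and a hypothetical function name inconsistent_translations. Grouping all units before comparing, rather than checking only when a hash key is reassigned, lets the report list every unit of a conflicting source, including the first one (unit 2 here).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Corrected input.xml from Hermann's reply, embedded so the sketch is self-contained.
INPUT = """<data>
<unit id="1"><source>blabla</source><target>plapla</target></unit>
<unit id="2"><source>bleble</source><target>pleple</target></unit>
<unit id="3"><source>bloblo</source><target>ploplo</target></unit>
<unit id="4"><source>blabla</source><target>plapla</target></unit>
<unit id="5"><source>bleble</source><target>lolailo</target></unit>
</data>"""


def inconsistent_translations(xml_text):
    """Return "id: source -> target" lines for sources with >1 distinct target.

    Comparable to xsl:for-each-group group-by="source" followed by
    count(distinct-values(current-group()/target)) gt 1.
    """
    # source text -> [(id, target), ...], each list in document order
    groups = defaultdict(list)
    for unit in ET.fromstring(xml_text).findall("unit"):
        groups[unit.findtext("source")].append(
            (unit.get("id"), unit.findtext("target"))
        )
    report = []
    for source, members in groups.items():
        if len({target for _, target in members}) > 1:
            report.extend(f"{uid}: {source} -> {target}" for uid, target in members)
    return report


if __name__ == "__main__":
    print("The following inconsistencies have been found:")
    for line in inconsistent_translations(INPUT):
        print(line)
```

Groups come out in order of each source's first occurrence, and units within a group keep document order; units 1 and 4 share both source and target, so they are not reported.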
