Re: [xsl] Joining two XML-files, can be a O(n)?

Subject: Re: [xsl] Joining two XML-files, can be a O(n)?
From: "Jiang Xin" <worldhello.net@xxxxxxxxx>
Date: Mon, 12 Mar 2007 11:49:17 +0800
Thanks, Michael.
After a lot of trial and error, I finally got it working with xsl:key and key().

========== new.xslt ==========
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Usage:
   xsltproc -stringparam mmx_file mindmap.mmx <this_xslt> mindmap.mm
-->
   <xsl:output method="xml" version="1.0" encoding="utf-8"
       indent="yes" />

   <xsl:param name="mmx_file" />
   <xsl:variable name="indexfile" select="document($mmx_file)" />

   <xsl:key name="node-by-id" match="node" use="@ID"/>

   <xsl:template match="map">
       <map>
           <xsl:copy-of select="@*" />
           <xsl:apply-templates />
       </map>
   </xsl:template>

   <xsl:template match="node">
       <xsl:variable name="id" select="@ID" />
       <xsl:copy>
           <xsl:copy-of select="@*" />
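           <!-- key() searches the document of the context node, so switch
                context to the .mmx document before the lookup -->
           <xsl:for-each select="$indexfile">
               <xsl:copy-of select="key('node-by-id', $id)/@*" />
           </xsl:for-each>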
           <xsl:apply-templates />
       </xsl:copy>
   </xsl:template>

   <xsl:template match="*">
     <xsl:copy-of select="."/>
   </xsl:template>

</xsl:stylesheet>
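
A note on the xsl:for-each above: in XSLT 1.0, key() only searches the document that contains the context node, which is why the context has to be switched to $indexfile before the lookup. In XSLT 2.0, key() accepts a third argument naming the tree to search, so the context switch should not be needed. The following is only a sketch under that assumption: it needs an XSLT 2.0 processor such as Saxon, it is untested here, and it will not run under xsltproc, which implements XSLT 1.0 only.

```xml
<xsl:stylesheet version="2.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml" version="1.0" encoding="utf-8"
       indent="yes" />

   <xsl:param name="mmx_file" />
   <xsl:variable name="indexfile" select="document($mmx_file)" />

   <!-- Index every node element by its ID attribute -->
   <xsl:key name="node-by-id" match="node" use="@ID"/>

   <xsl:template match="map">
       <map>
           <xsl:copy-of select="@*" />
           <xsl:apply-templates />
       </map>
   </xsl:template>

   <xsl:template match="node">
       <xsl:copy>
           <xsl:copy-of select="@*" />
           <!-- XSLT 2.0 only: the third argument tells key() to search
                the .mmx document instead of the current one -->
           <xsl:copy-of select="key('node-by-id', @ID, $indexfile)/@*" />
           <xsl:apply-templates />
       </xsl:copy>
   </xsl:template>

   <xsl:template match="*">
       <xsl:copy-of select="."/>
   </xsl:template>

</xsl:stylesheet>
```

Either way the idea is the same: the processor builds the index for the .mmx document once, and each node in the .mm file then costs one key lookup instead of a scan over all nodes.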

========== xsltproc test ==========
# time  xsltproc  --stringparam mmx_file x.mmx  new.xslt  x.mm > /dev/null

real    0m1.043s
user    0m0.996s
sys     0m0.036s

That is about 1 second, versus 8 minutes with the old stylesheet.

Great.

Jiang Xin
http://www.worldhello.net


2007/3/11, Michael Kay <mike@xxxxxxxxxxxx>:
Saxon-SA will optimize this kind of join for you automatically.
Alternatively, you can do it by hand using xsl:key and the key() function.

Michael Kay
http://www.saxonica.com/


> -----Original Message-----
> From: Jiang Xin [mailto:worldhello.net@xxxxxxxxx]
> Sent: 10 March 2007 17:55
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] Joining two XML-files, can be a O(n)?
>
> I wrote an XSLT stylesheet to join two XML files a year ago, but I
> cannot stand its poor performance, so I am asking for help here.
>
> It was a hack for FreeMind.
> If you would like to know what mmx_file is, see the following URLs:
> * http://freemind.sourceforge.net/wiki/index.php/User:Jiangxin/Patch_save_extra_attributes_outof_mmfile
> * http://freemind.sourceforge.net/wiki/index.php/User:Jiangxin/Patch_load_mm_file_with_mmx_file
>
> ========== XSLT file: join.xslt ==========
> <xsl:stylesheet version="1.0"
>    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
>    <xsl:output method="xml" version="1.0" encoding="utf-8"
>        indent="yes" />
>
>    <xsl:param name="mmx_file" />
>
>    <xsl:template match="map">
>        <map>
>            <xsl:copy-of select="@*" />
>            <xsl:apply-templates />
>        </map>
>    </xsl:template>
>
>    <xsl:template match="node">
>        <xsl:param name="mmx_node" select="document($mmx_file)" />
>        <xsl:copy>
>            <xsl:copy-of select="@*" />
>            <xsl:copy-of
>                select="$mmx_node//node[@ID=current()/@ID]/@*" />
>            <xsl:apply-templates />
>        </xsl:copy>
>    </xsl:template>
>
>    <xsl:template match="*">
>        <xsl:copy-of select="."/>
>    </xsl:template>
>
> </xsl:stylesheet>
>
> ========== XML file 1: x.mm ==========
> <?xml version="1.0" encoding="UTF-8"?>
> <map version="0.9.0_Beta_8">
> <node ID="Freemind_Link_1439916855" TEXT="something">
> <node FOLDED="true" ID="_" POSITION="right" TEXT="...">
> <node ID="Freemind_Link_1446446787" TEXT="..."/>
> <node ID="Freemind_Link_1864715670" TEXT="..."/>
> </node>
> </node>
> ... ...
> another 8000 nodes!
> ... ...
> </map>
>
> ========== XML file 2: x.mmx ==========
> <?xml version="1.0" encoding="UTF-8"?>
> <map version="0.9.0_Beta_8">
> <node CREATED="1173523728454" ID="Freemind_Link_1439916855"
>     MODIFIED="1173523728454">
> <node FOLDED="FALSE" CREATED="1173523728455" ID="_"
>     MODIFIED="1173523881485">
> <node CREATED="1173523728456" ID="Freemind_Link_1446446787"
>     MODIFIED="1173523888376"/>
> <node CREATED="1173523728457" ID="Freemind_Link_1864715670"
>     MODIFIED="1173523894471"/>
> </node>
> <node CREATED="1173523728458" ID="Freemind_Link_1476641610"
>     MODIFIED="1173523728458"/>
> </node>
> ... ...
> another 8000 nodes!
> ... ...
> </map>
>
> ========== xsltproc test ==========
> When operating on a large XML file (containing more than 8000 nodes),
> it takes 8 minutes!
> # time xsltproc --stringparam mmx_file x.mmx join.xslt x.mm > /dev/null
>
> real 8m7.242s
> user 7m48.237s
> sys 0m0.084s
>
> ========== O(n^2) ==========
> I know the problem is that the process is O(n^2).
> <xsl:param name="mmx_node" select="document($mmx_file)" />
>
> I want to know whether there is a solution within XSLT itself?
>
> Thanks.
