RE: [xsl] Joining two XML-files, can be a O(n)?

Subject: RE: [xsl] Joining two XML-files, can be a O(n)?
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Sun, 11 Mar 2007 09:09:27 -0000
Saxon-SA will optimize this kind of join for you automatically.
Alternatively, you can do it by hand using xsl:key and the key() function.
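
A minimal sketch of the by-hand approach, adapted from the join.xslt
quoted below. The key name node-by-id and the variable id are
illustrative names, not part of the original stylesheet. In XSLT 1.0,
key() only searches the document that contains the context node, so
the sketch uses xsl:for-each to switch the context to the document
loaded from $mmx_file before calling key().

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes" />

    <xsl:param name="mmx_file" />

    <!-- Index every node element by its ID attribute.
         The name "node-by-id" is illustrative. -->
    <xsl:key name="node-by-id" match="node" use="@ID" />

    <xsl:template match="map">
        <map>
            <xsl:copy-of select="@*" />
            <xsl:apply-templates />
        </map>
    </xsl:template>

    <xsl:template match="node">
        <!-- Capture the ID now: inside xsl:for-each below, the
             current node is the root of the .mmx document. -->
        <xsl:variable name="id" select="@ID" />
        <xsl:copy>
            <xsl:copy-of select="@*" />
            <!-- Switch context to the second document so that key()
                 looks up nodes there, then copy their attributes. -->
            <xsl:for-each select="document($mmx_file)">
                <xsl:copy-of select="key('node-by-id', $id)/@*" />
            </xsl:for-each>
            <xsl:apply-templates />
        </xsl:copy>
    </xsl:template>

    <xsl:template match="*">
        <xsl:copy-of select="." />
    </xsl:template>

</xsl:stylesheet>
```

This should run with the same command line as the original
(xsltproc --stringparam mmx_file x.mmx join.xslt x.mm). Processors
typically build the key index once per document, so each lookup avoids
rescanning the whole .mmx file and the join should scale roughly
linearly with the number of nodes.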

Michael Kay
http://www.saxonica.com/ 


> -----Original Message-----
> From: Jiang Xin [mailto:worldhello.net@xxxxxxxxx] 
> Sent: 10 March 2007 17:55
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] Joining two XML-files, can be a O(n)?
> 
> I wrote an XSLT stylesheet to join two XML files a year ago, but 
> I cannot stand its poor performance.
> So I am asking for help here.
> 
> It was a hack for FreeMind.
> If you would like to know what mmx_file is, see the 
> following URLs:
>     * http://freemind.sourceforge.net/wiki/index.php/User:Jiangxin/Patch_save_extra_attributes_outof_mmfile
>     * http://freemind.sourceforge.net/wiki/index.php/User:Jiangxin/Patch_load_mm_file_with_mmx_file
> 
> ========== XSLT file: join.xslt ==========
> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
>     <xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes" />
> 
>     <xsl:param name="mmx_file" />
> 
>     <xsl:template match="map">
>         <map>
>             <xsl:copy-of select="@*" />
>             <xsl:apply-templates />
>         </map>
>     </xsl:template>
> 
>     <xsl:template match="node">
>         <xsl:param name="mmx_node" select="document($mmx_file)" />
>         <xsl:copy>
>             <xsl:copy-of select="@*" />
>             <xsl:copy-of select="$mmx_node//node[@ID=current()/@ID]/@*" />
>             <xsl:apply-templates />
>         </xsl:copy>
>     </xsl:template>
> 
>     <xsl:template match="*">
>       <xsl:copy-of select="."/>
>     </xsl:template>
> 
> </xsl:stylesheet>
> 
> ========== XML file 1: x.mm ==========
> <?xml version="1.0" encoding="UTF-8"?>
> <map version="0.9.0_Beta_8">
> <node ID="Freemind_Link_1439916855" TEXT="something">
> <node FOLDED="true" ID="_" POSITION="right" TEXT="...">
> <node ID="Freemind_Link_1446446787" TEXT="..."/>
> <node ID="Freemind_Link_1864715670" TEXT="..."/>
> </node>
> </node>
> ... ...
> another 8000 nodes!
> ... ...
> </map>
> 
> ========== XML file 2: x.mmx  ==========
> <?xml version="1.0" encoding="UTF-8"?>
> <map version="0.9.0_Beta_8">
> <node CREATED="1173523728454" ID="Freemind_Link_1439916855" MODIFIED="1173523728454">
> <node FOLDED="FALSE" CREATED="1173523728455" ID="_" MODIFIED="1173523881485">
> <node CREATED="1173523728456" ID="Freemind_Link_1446446787" MODIFIED="1173523888376"/>
> <node CREATED="1173523728457" ID="Freemind_Link_1864715670" MODIFIED="1173523894471"/>
> </node>
> <node CREATED="1173523728458" ID="Freemind_Link_1476641610" MODIFIED="1173523728458"/>
> </node>
> ... ...
> another 8000 nodes!
> ... ...
> </map>
> 
> ========== xsltproc test  ==========
> When run on a large XML file (more than 8000 nodes), it takes 
> 8 minutes:
> # time xsltproc --stringparam mmx_file x.mmx join.xslt x.mm > /dev/null
> 
> real    8m7.242s
> user    7m48.237s
> sys     0m0.084s
> 
> ========== O(n^2) ==========
> I know the problem is that the process is O(n^2).
>     <xsl:param name="mmx_node" select="document($mmx_file)" />
> 
> Is there a solution within XSLT itself?
> 
> Thanks.
