Subject: Re: [xsl] combine xml files
From: "Thomas B. Passin" <tpassin@xxxxxxxxxxxx>
Date: Thu, 11 Apr 2002 15:28:32 -0400

[Ming]
> Hi, Tom,
>
> You are really good! All your assumptions are correct.
>

OK, then. This will be a bit long, but it's not complicated. I'm going to
leave the task of iterating over the individual files to you (you have
already received some suggestions) and just give you one solution for
getting titles and names from a single xml file according to their
respective db priorities.

I won't claim that this is the most efficient stylesheet. I'm sure others
on the list, like Jeni or Mike Kay, can come up with a more efficient
approach. I'm after simplicity, to give you a starting point that you can
easily modify to suit your needs.

First, I made a few modifications to your xml file: I quoted the attributes
to make it well-formed, and I added "db2" to the titles for db2 so we can
tell if our preferences are being followed when we look at the output. I
also changed the name of the root element from "xml" to "record". Here is
the resulting source file:

========= Source XML file =========
<record>
  <db1>
    <jauthor>
      <author db="db1"> Smith, J</author>
      <author db="db1"> Mou, S </author>
    </jauthor>
    <jtitle>
      <title db="db1"> Preliminary study on network (II)(db1) </title>
    </jtitle>
  </db1>
  <db2>
    <jauthor>
      <author db="db2"> Smith, JR </author>
      <author db="db2"> Mou, ST </author>
    </jauthor>
    <jtitle>
      <title db="db2"> Preliminary Study on Network (II)(db2) </title>
    </jtitle>
  </db2>
</record>
====================================

Next, I created an xml file for the db preferences. The priorities are to
be applied in their document order:

========= File db_prefs.xml ===========
<dbprefs>
  <titles>
    <pref name='db2'/>
    <pref name='db1'/>
    <pref name='db3'/>
  </titles>
  <authors>
    <pref name='db1'/>
    <pref name='db2'/>
    <pref name='db3'/>
  </authors>
</dbprefs>
==================================

Notice that I used different priorities for titles and authors, and I
included a third db to demonstrate that we don't return results for a db
for which we have no data.
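
Before the stylesheet itself, here is a small diagnostic sketch (not part of the solution) that may help to check how the preference order is applied. It assumes the source file and db_prefs.xml shown above, with db_prefs.xml sitting next to the stylesheet. It walks the title preferences in document order and reports, for each db, whether the record actually holds a title for it. The output wording ("has a title", "no data") is made up for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Root element of the source document. Kept in a global variable
       because inside the for-each the context node belongs to
       db_prefs.xml, not to the source document. -->
  <xsl:variable name="record" select="/record"/>

  <xsl:template match="/">
    <!-- With no xsl:sort, for-each visits the pref elements in
         document order, which is the priority order. -->
    <xsl:for-each select='document("db_prefs.xml")/dbprefs/titles/pref'>
      <xsl:variable name="db" select="@name"/>
      <xsl:value-of select="$db"/>
      <xsl:text>: </xsl:text>
      <xsl:choose>
        <xsl:when test="$record/*/jtitle/title[@db = $db]">has a title</xsl:when>
        <xsl:otherwise>no data</xsl:otherwise>
      </xsl:choose>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Run against the sample record, this should list db2 and db1 as having a title and db3 as having no data, in that order. The first db in the list that has data is the one whose title we want.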

Here is the stylesheet, part by part with comments:

======== Stylesheet ===============
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Variable so we can refer to the source document -->
<xsl:variable name='record' select='/record'/>

<!-- Variables for our priorities, gotten from the prefs file -->
<xsl:variable name='title-prefs'
    select='document("db_prefs.xml")/dbprefs/titles/pref'/>
<xsl:variable name='author-prefs'
    select='document("db_prefs.xml")/dbprefs/authors/pref'/>

<!--================================
  It's more readable to define the -pref variables here
  than simply to use them inline where we need them.
===================================-->

<!--=================================
  Get the title first, then the authors. We do them separately
  so we can apply the appropriate priorities to each. This also
  makes it easy to deal with the fact that there can be several
  authors for a work (since we don't intend to pull one author
  from db1 and another from db2, for example).
=================================== -->
<xsl:template match="/record">
  <results>
    <title>
      <xsl:call-template name='get-title'/>
    </title>
    <authors>
      <xsl:call-template name='get-authors'/>
    </authors>
  </results>
</xsl:template>

<!--==================================
  The key point here is to get the titles in order of their db
  priority. Then the first one will automatically have the highest
  priority. How can we separate it out from the other possible
  titles from other dbs?

  My approach is a bit of a hack, but simple. I return a single
  string with all the titles concatenated with \\\ between them.
  I get the first one by using substring-before(). Of course,
  using \\\ is arbitrary, any separator would do that isn't going
  to show up in the titles (as I said, a bit of a hack but it makes
  things nice and simple).

  This is easier than creating machinery to continue iterating
  through all the titles only if we have not already found one.
  I doubt that the extra time to iterate through them all is enough
  to be harmful, considering that we can avoid testing on each
  iteration, but this point could be tested.
========================================-->
<xsl:template name='get-title'>
  <xsl:variable name='title-results'>
    <!-- ============================
      Here is where we apply the priorities. We use
      xsl:for-each to go through them in order
    ==============================-->
    <xsl:for-each select='$title-prefs/@name'>
      <xsl:variable name='db' select='.'/>
      <!--=================================
        Here we get all the titles, regardless of which db they
        are grouped with. If you didn't use the "db" attribute
        we'd have to change the approach a bit to carry the db
        information along. The way you have done it makes this
        easier to do.
      ==================================== -->
      <xsl:variable name='title'
          select='$record/*/jtitle/title[@db=$db]'/>
      <xsl:if test='$title'><xsl:value-of select='concat($title,"\\\")'/></xsl:if>
    </xsl:for-each>
  </xsl:variable>

  <!--====================================
    The hack exposed!
  ======================================-->
  <xsl:value-of select='substring-before($title-results,"\\\")'/>
</xsl:template>

<!--===============================
  I treat the authors the same way, but it's a bit harder because
  there may be more than one author and you may want to apply some
  formatting between their names. Here, I just insert two
  non-breaking spaces between the names. Otherwise, they are handled
  just like the titles.

  In particular, the authors, all of them, are returned as a single
  string. If you need to break them out into separate elements, you
  may have to convert them to a node-set so you can return just the
  first one (e.g., authors[1]). If so, you have to make sure to use
  an xslt processor that has a convert-to-node-set extension.
======================================-->
<xsl:template name='get-authors'>
  <xsl:variable name='author-results'>
    <xsl:for-each select='$author-prefs/@name'>
      <xsl:variable name='db' select='.'/>
      <xsl:variable name='authors'
          select='$record/*/jauthor/author[@db=$db]'/>
      <xsl:if test='$authors'>
        <xsl:for-each select='$authors'>
          <xsl:value-of select='.'/>
          <!-- two non-breaking spaces after each name -->
          <xsl:text>&#160;&#160;</xsl:text>
        </xsl:for-each>
        <xsl:text>\\\</xsl:text>
      </xsl:if>
    </xsl:for-each>
  </xsl:variable>

  <xsl:value-of select='substring-before($author-results,"\\\")'/>
</xsl:template>
<!--==================================-->
</xsl:stylesheet>
===========================================

And here are the results, with some whitespace changed for visual
formatting:

<results>
  <title> Preliminary Study on Network (II)(db2) </title>
  <authors> Smith, J Mou, S </authors>
</results>

You see that we got the title from db2 and the authors from db1, as
required by the priorities in the db_prefs.xml file.

In practice, you will either want to list all the files in a driver file
and run them through the stylesheet in one invocation, or you will want to
compile the stylesheet and keep it in memory so that it does not have to be
rebuilt for each xml file. You don't want to invoke the stylesheet
separately for each file, since that would take a long time, considering
that you may have a lot of files to process.

This method will not work if you concatenate all the separate xml files,
since it relies on getting a single result from a single file. But it may
give you ideas for handling a concatenated file if you end up wanting to
try that out. I don't think you will need to.

I suggest that you use a very simple output format, like this one, while
you are developing the method for processing all the files. Once
everything works, and is fast enough, you can tune up the stylesheet to
produce HTML or whatever you want. Keep it as simple as possible for as
long as possible.

Cheers,

Tom P

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
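
The driver-file idea mentioned above could look roughly like the following sketch. Everything about the driver file is an assumption made for illustration: its name (say, driver.xml), its `files` root element, its `file` elements with an `href` attribute, and the `all-results` wrapper in the output. To keep it short, the sketch handles titles only, and instead of the \\\ separator it uses a different technique: a predicate that picks the first preference, in document order, for which the record has a title.

```xml
<!-- Hypothetical driver file, used as the source document:
     <files>
       <file href="rec1.xml"/>
       <file href="rec2.xml"/>
     </files>
-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Title priorities, read once from the prefs file -->
  <xsl:variable name="title-prefs"
      select='document("db_prefs.xml")/dbprefs/titles/pref'/>

  <!-- Visit every file named in the driver, in one invocation -->
  <xsl:template match="/files">
    <all-results>
      <xsl:for-each select="file">
        <!-- A relative href is resolved against the driver file,
             because the argument to document() is a node -->
        <xsl:apply-templates select="document(@href)/record"/>
      </xsl:for-each>
    </all-results>
  </xsl:template>

  <xsl:template match="record">
    <xsl:variable name="titles" select="*/jtitle/title"/>
    <!-- First pref, in document order, whose db has a title here -->
    <xsl:variable name="best"
        select="$title-prefs[@name = $titles/@db][1]"/>
    <results>
      <title>
        <!-- Empty if no preferred db has a title in this record -->
        <xsl:value-of select="$titles[@db = $best/@name]"/>
      </title>
    </results>
  </xsl:template>

</xsl:stylesheet>
```

The authors could be handled the same way with a second variable built from the author preferences. Because `$best` selects a single pref element, each author of the chosen db can then be written out as a separate element without converting a result string back into a node-set.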