Subject: RE: [xsl] Combining lists without duplication
Date: Fri, 28 Sep 2007 16:41:50 -0400
> I guess there is a node-set that consists of all the subdiv
> elements that have nt="V" and a ufi attribute whose value is equal to
> the bgn-standard name's ufi. But I don't know how to compare the
> iso-name against the whole group of them (as opposed to individually
> using for-each).

It's very late in my workday, and I don't have the energy to work out a solution for you in detail, but here is an example of how you can match against the values in a list without using for-each. This requires XSLT 2.0.

    <?xml version="1.0"?>
    <fruit>
      <item>apple</item>
      <item>grape</item>
      <item>peach</item>
      <item>pear</item>
      <item>plum</item>
      <item>raspberry</item>
    </fruit>

    <?xml version="1.0"?>
    <xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:strip-space elements="*" />
      <xsl:output method="text" encoding="UTF-8" />

      <!-- A sequence of strings; "=" succeeds if an item matches any one of them. -->
      <xsl:variable name="fruit" select="'plum','peach','banana'"/>

      <xsl:template match="/">
        <xsl:apply-templates />
      </xsl:template>

      <xsl:template match="fruit">
        <xsl:apply-templates select="item[.=$fruit]" />
      </xsl:template>

      <!-- &#10; is a newline; a literal line break inside the attribute
           would be normalized to a space by the XML parser. -->
      <xsl:template match="item">
        <xsl:value-of select="concat(.,'&#10;')" />
      </xsl:template>
    </xsl:stylesheet>

Your output will be:

    peach
    plum

--
Charles Knell
cknell@xxxxxxxxxx - email

-----Original Message-----
From: Roger Sperberg <rsperberg@xxxxxxxxx>
Sent: Fri, 28 Sep 2007 13:10:57 -0700 (PDT)
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: [xsl] Combining lists without duplication

I've assembled a list of country subdivisions and I'm wanting to combine two separate sources of names with this list without duplicating the names. I'm confused as to how best to go about it.

The list I've got is an amalgamation from several sources and does contain some subdivisions not included in the listings from ISO or BGN (the U.S. Board of Geographic Names). I've concluded, however, that names from these sources should be utilized whenever possible.

I've combined the main list and the ISO list so that each entry contains a section along the following lines. There may or may not be a second basename element, with one or more iso-names:

    <subdiv fips="AF13">
      <basename>
        <name1>Kabul</name1>
        <name2>Kaboul</name2>
        <name3>Kabul</name3>
        <name4>Kabol</name4>
      </basename>
      <basename>
        <iso-name>Kabul</iso-name>
        <iso-name2>Kabol</iso-name2>
      </basename>
    </subdiv>

An entry in the separate BGN-names file includes information indicating whether it is the preferred name (nt="N") or a variant (nt="V"). Each entry has a unique id for the name (uni) and a unique id for the subdivision (ufi) that's shared among the variant names for that subdivision. Preferred names often include a short form. A form of the name is also included that removes all accents and diacritics (bgn-name-nd).
Here are the four entries in that file for the subdivision cited above:

    <subdiv ufi="-3378436" uni="-4801481" fips="AF13" nt="N"
            short-name="Kabol" bgn-name="Velayat-e Kabol"
            bgn-name-nd="Velayat-e Kabol" />
    <subdiv ufi="-3378436" uni="-4801502" fips="AF13" nt="V"
            bgn-name="Velayat-e Kabul" bgn-name-nd="Velayat-e Kabul" />
    <subdiv ufi="-3378436" uni="-4801510" fips="AF13" nt="V"
            bgn-name="Kabul Province" bgn-name-nd="Kabul Province" />
    <subdiv ufi="-3378436" uni="523049" fips="AF13" nt="V"
            bgn-name="Kabol" bgn-name-nd="Kabol" />

The result I'd like would

- use the BGN preferred name's short form, if there is one, as the subdivision name
- if not, use the bgn-name
- include the bgn-name and the accent-and-diacritic-free form

All the other names -- BGN variants, ISO names and/or variants, and names collected from general sources -- should be collected in an other-names element, with duplicates excluded.

In many instances, BGN includes a variant that matches the short form of the BGN standard name. I'd like to exclude that. I'd like to exclude any ISO or generally collected name that matches the accent-and-diacritic-free form of the preferred name. And, obviously, I'd like to exclude any ISO name that duplicates the BGN preferred name or any BGN variant, and exclude any generally collected name that duplicates a BGN or ISO name.

The result for Kabol would be:

    <subdiv fips="AF13">
      <basename>
        <name>Kabol</name>
        <long-form>Velayat-e Kabol</long-form>
        <long-form-nd>Velayat-e Kabol</long-form-nd>
      </basename>
      <other-names>
        <bgn-variant>Velayat-e Kabul</bgn-variant>
        <bgn-variant>Kabul Province</bgn-variant>
        <iso-name>Kabul</iso-name>
        <alt-name>Kabul</alt-name>
        <alt-name>Kaboul</alt-name>
        <alt-name>Kabol</alt-name>
      </other-names>
    </subdiv>

Whenever no BGN entry exists, I want to use the first ISO entry for the name, with all other unique names put into the other-names wrapper.

* * *

When I started working out the XSLT, I began by testing to see if a BGN name existed.
If so, I would use the short form if available, and then add the variants, testing to see if any of them were the same as @bgn-name-nd. This would handle 75 to 90 percent of the subdivisions.

Shortly after that point, my understanding of the correct approach began to crumble.

If an ISO name exists also, I can easily check it against the BGN standard name and bgn-name-nd, but I'm not sure what the test looks like against the BGN variants, if there are any. I don't see any way to use for-each to test against each variant. Nor can I figure out how to rely on choose/when/otherwise without knowing how many variants there are.

I guess there is a node-set that consists of all the subdiv elements that have nt="V" and a ufi attribute whose value is equal to the bgn-standard name's ufi. But I don't know how to compare the iso-name against the whole group of them (as opposed to individually using for-each).

And then when I have added iso-names, how do I compare each generally collected name against the BGN and ISO names? It must be the same process, but now I'm getting a pretty complicated set.

Guidance, please? I tried searching the list archives, but (a) I'm not sure how to term what I'm looking for and (b) I wasn't sure that what I found actually applied. Just pointing me to the right section in a reference would be very welcome.

I'm transforming the file using Saxon B 8.9 and XSLT 2.0, so I can use the third parameter with key().

Thanks.

Roger Sperberg
A not-too-frequent XSLT-er
Montclair, NJ

--
Cambodian Language Exercises -- cambodian.tiddlyspot.com
Beginning Cambodian Reader -- cambodian-reader.tiddlyspot.com
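
The fruit example's general comparison can be carried over to the subdivision data in the question. What follows is only a rough sketch under several assumptions, not a worked-out solution: the file name bgn-names.xml and the key name bgn-by-fips are hypothetical, the fips attribute is assumed to identify the same subdivision in both files, and only the other-names part of the result is built (the basename rules are left out). The point it illustrates is that not(. = $sequence) tests one name against a whole group at once, so no for-each or choose is needed for the comparison itself.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:strip-space elements="*"/>
  <xsl:output method="xml" indent="yes"/>

  <!-- Assumption: bgn-names.xml is a hypothetical name for the BGN file. -->
  <xsl:variable name="bgn" select="document('bgn-names.xml')"/>

  <!-- Hypothetical key: BGN entries indexed by their fips code. -->
  <xsl:key name="bgn-by-fips" match="subdiv" use="@fips"/>

  <xsl:template match="subdiv">
    <!-- Third argument of key() searches the BGN document, not this one. -->
    <xsl:variable name="entries" select="key('bgn-by-fips', @fips, $bgn)"/>
    <xsl:variable name="preferred" select="$entries[@nt = 'N'][1]"/>

    <!-- Every form of the preferred name, as one sequence. -->
    <xsl:variable name="used"
        select="$preferred/(@short-name, @bgn-name, @bgn-name-nd)"/>

    <!-- "=" is true if ANY item on the left equals ANY item on the right,
         so not(. = $used) keeps only names that match none of them. -->
    <xsl:variable name="variants"
        select="$entries[@nt = 'V']/@bgn-name[not(. = $used)]"/>
    <xsl:variable name="iso"
        select="basename/*[starts-with(local-name(), 'iso-name')]
                          [not(. = ($used, $variants))]"/>
    <xsl:variable name="alt"
        select="basename/*[starts-with(local-name(), 'name')]
                          [not(. = ($used, $variants, $iso))]"/>

    <subdiv fips="{@fips}">
      <other-names>
        <!-- distinct-values() drops repeats inside each group. -->
        <xsl:for-each select="distinct-values($variants)">
          <bgn-variant><xsl:value-of select="."/></bgn-variant>
        </xsl:for-each>
        <xsl:for-each select="distinct-values($iso)">
          <iso-name><xsl:value-of select="."/></iso-name>
        </xsl:for-each>
        <xsl:for-each select="distinct-values($alt)">
          <alt-name><xsl:value-of select="."/></alt-name>
        </xsl:for-each>
      </other-names>
    </subdiv>
  </xsl:template>

</xsl:stylesheet>
```

When a subdivision has no BGN entry, $used and $variants are empty sequences, the comparisons against them are false, and every ISO and collected name passes through to distinct-values(). Choosing the first ISO name as the basename in that case would still need to be added.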