Subject: [xsl] XML to XML change, handling mixed content
From: Karlmarx R <karlmarxr@xxxxxxxxx>
Date: Wed, 19 Oct 2011 04:41:17 +0800 (SGT)

Hello,

I have 2 questions:

1) I have a specific requirement and I am a bit stuck on the best way to handle it. In a nutshell, I need to modify the source

    <p>
      text text ‘ LINK-1 TEXT ’ TEXT TEXT <URL weburl="XXX">XXX</URL> TEXT
      <SOmething>TEXT</SOmething> AND again
      <INSIDE>SOME TEXT text ‘ LINK-2 TEXT ’ TEXT <URL weburl="YYY">YYY</URL></INSIDE>
      And can be more text with or without URL and TEXT like ‘ LINK-3 TEXT’
    </p>

to (THE REQUIREMENT)

    <p>
      text text <a href="XXX"> LINK-1 TEXT </a> TEXT TEXT TEXT
      <SOmething>TEXT</SOmething> AND again
      <INSIDE>SOME TEXT text <a href="YYY"> LINK-2 TEXT </a> TEXT </INSIDE>
      And can be more text with or without URL and TEXT like ‘ LINK-3 TEXT’
    </p>

What is required: for each <URL>, if the text PRECEDING it contains a phrase enclosed in ‘ and ’, that phrase must be converted to an <a href> link pointing at the URL. I have narrowed the match down to p[URL], but I am not sure what the best pattern is to achieve the desired result. Could you please suggest something?

NOTE that in the sample above, the last ‘ LINK-3 TEXT’ is left as it is because it has no matching URL.

I am using XSLT 1.0, but if XSLT 2.0 can solve this more easily, please suggest that as well.

[SAMPLE Skeleton XML and XSL]

XML:

    <?xml version="1.0" encoding="UTF-8"?>
    <root>
      <something>
        <blah-blah>Can have many child</blah-blah>
        <nodeGroup>
          <note id="does-not-matter-1">
            <p>
              <something><sup>1</sup></something> some text here.
              <bidItem id="95522-1" vol="1"> Title Name, Other details,
                ‘The arms trade and corruption’, <i>Prospect</i> Aug.2005</bidItem>.
              <!-- NOTE: NO URL IN THIS CASE, WHICH IS FINE -->
            </p>
          </note>
          <note id="does-not-matter-2">
            <p>
              some text ‘Ex-Pentagon procurement executive gets jail time’, text text
              &lt; <url webUrl="http://www.aaa.xx/bbb/ddd.htm">http://www.aaa.xx/bbb/ddd.htm</url>&gt; ;;
              ‘Former Air Force acquisition official released from jail’, Government in 2005,
              &lt; <url webUrl="http://www.aaa.xx/bbb/uuu.htm">SAME AS @webUrl</url>&gt;; and
              <bidItem id="95522-2">Author name., ‘Cashing in for profit? Who cost taxpayers
                billions in biggest Pentagon scandal in years?’, <i>60 Minutes</i>, CBS,
                5 Jan. 2005 </bidItem>,
              &lt; <url webUrl="http://www.cbsnews.com/stories/2005/01/04/60II/main664652.shtml">SAME AS @webUrl</url>&gt;.
              <!-- HERE EACH URL HAS MATCHING ‘contents’ WHICH IS FINE -->
            </p>
          </note>
          <note id="does-not-matter-3">
            <p><something><sup>68</sup></something> This figure is comprised of a fine of
              £500 000 ($900 000) for ‘irregular accounting practices’ in a Tanzanian deal
              for an inappropriate and overpriced air radar system that was tainted by
              allegations of high-level corruption, with ...($405 000) costs..
              £29.275 million ($52.695 million) going to Tanzania in reparations.
              <bidItem id="996522-31" title="BAE deal with Tanzania...">Evans, R. and Lewis, P.,
                ‘BAE deal with Tanzania: military air traffic control—for country with no
                airforce’, <i>The Guardian</i>, 6 Feb. 2010</bidItem>;
              ‘Military radar probe: the key suspects … and the case against them’,
              <i>This Day</i> (Dar es Salaam), 15 Feb. 2010;
              &lt; <url webUrl="http://www.judiciary.gov.uk/Resources/JCO/Documents/Judgments/r-v-bae-sentencing-remarks.pdf">SAME AS @webUrl</url>&gt;.
              <!-- ONLY ONE URL, BUT MANY ‘ in-between texts ’
                   So, the URL belongs only to its preceding "‘ in-between texts ’" -->
            </p>
          </note>
        </nodeGroup>
      </something>
    </root>

XSL:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

      <xsl:template match="/">
        <xsl:apply-templates select="*"/>
      </xsl:template>

      <xsl:template match="*">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:template>

      <!-- Markup fragments written out with disable-output-escaping="yes" -->
      <xsl:variable name="href-start">&lt;a href="</xsl:variable>
      <xsl:variable name="href-mid">"&gt;</xsl:variable>
      <xsl:variable name="href-finish">&lt;/a&gt;</xsl:variable>

      <xsl:template match="note">
        <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <xsl:apply-templates mode="url"/>
        </xsl:copy>
      </xsl:template>

      <xsl:template match="p[url]" mode="url">
        <!-- HERE, FOR EACH URL, IT SHOULD FORM A HREF LINK, COVERING ANY
             PRECEDING TEXT THAT APPEARS IN-BETWEEN ‘ AND ’
             Ref: MAIL DESCRIPTION. -->
        <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <xsl:apply-templates/>
        </xsl:copy>
      </xsl:template>

      <xsl:template match="p[not(url)]" mode="url">
        <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <xsl:apply-templates/>
        </xsl:copy>
      </xsl:template>

      <xsl:template match="@*|text()|comment()|processing-instruction()">
        <xsl:copy-of select="."/>
      </xsl:template>

      <!-- COMMENTED... SOME TRY ALONG THIS LINE
      <xsl:template .... mode="url">
        <xsl:copy>
          <xsl:... test="contains(., '‘')">
            <!-\-<xsl:apply-templates>
              <xsl:sort select="substring-before(., '‘')"/>
            </xsl:apply-templates>-\->
            <xsl:value-of select="substring-before(., '‘')"/>
            <xsl:value-of select="$href-start" disable-output-escaping="yes"/>[@<xsl:value-of select="following-sibling::url"/>]<xsl:value-of select="$href-mid" disable-output-escaping="yes"/>
            <xsl:value-of select="substring-after(., '‘')"/>
          </xsl:...>
          <xsl:... test="contains(., '’')">
            <xsl:value-of select="substring-before(., '’')"/>
            <xsl:value-of select="$href-finish" disable-output-escaping="yes"/>
            <xsl:value-of select="substring-after(., '’')"/>
          </xsl:...>
          <xsl:apply-templates .... mode="url"/>
        </xsl:copy>
      </xsl:template>
      -->

    </xsl:stylesheet>

2) Additionally, when dealing with such mixed content (that is, content containing both text and child elements), what is the best way to split and handle the elements and the text separately?

Thanks, and I look forward to your suggestions,
Karl
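
For reference, the rule in question 1 could be sketched in XSLT 2.0 roughly as below. This is a minimal sketch under stated assumptions, not a tested solution: it assumes the element is named url and carries the address in @webUrl (as in the skeleton XML), it requires an XSLT 2.0 processor, and it only handles the case where the ‘...’ phrase sits in the text node immediately before the url. The variable name last-quoted is made up for illustration. Phrases that live inside a sibling element such as bidItem or before an intervening <i> (notes 2 and 3 above) are not covered, and the literal < and > around each url are left in place.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical helper: matches the LAST ‘...’ phrase in a string.
       Group 1 is the phrase, group 2 is the text trailing it. -->
  <xsl:variable name="last-quoted" select="'‘([^‘’]*)’([^‘’]*)$'"/>

  <!-- Identity template: copy everything that is not handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- A text node directly followed by a url element:
       wrap its last quoted phrase in a link to that url. -->
  <xsl:template match="p/text()[following-sibling::node()[1][self::url]]">
    <!-- Capture the address here; inside analyze-string the
         context item is a string, not the text node. -->
    <xsl:variable name="href" select="following-sibling::url[1]/@webUrl"/>
    <xsl:analyze-string select="." regex="{$last-quoted}">
      <xsl:matching-substring>
        <a href="{$href}"><xsl:value-of select="regex-group(1)"/></a>
        <xsl:value-of select="regex-group(2)"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

  <!-- Drop a url only when the text node just before it supplied a
       quoted phrase; any other url is copied by the identity template. -->
  <xsl:template match="p/url[preceding-sibling::node()[1][self::text()]
                             [matches(., $last-quoted)]]"/>

</xsl:stylesheet>
```

The design choice here is to let each text node decide for itself by looking at its next sibling, which avoids building markup with disable-output-escaping. A right quote used as an apostrophe after the phrase (for example in "don’t") would defeat the regular expression, so real data may need a stricter pattern.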