Subject: [xsl] XML to XML change, handling mixed content
From: Karlmarx R <karlmarxr@xxxxxxxxx>
Date: Wed, 19 Oct 2011 04:41:17 +0800 (SGT)
Hello,
I have 2 questions:
1) I have a specific requirement and am a bit stuck on what would be
the best way to handle it. In a nutshell, I need to modify the source
<p>
text text ‘ LINK-1 TEXT ’ TEXT TEXT
<URL weburl="XXX">XXX</URL> TEXT
<SOmething>TEXT</SOmething>
AND again
<INSIDE>SOME TEXT text ‘ LINK-2 TEXT ’ TEXT
<URL weburl="YYY">YYY</URL></INSIDE>
And can be more text with or without URL
and TEXT like ‘ LINK-3 TEXT’
</p>
to (THE REQUIREMENT)
<p>
text text <a href="XXX"> LINK-1 TEXT </a> TEXT TEXT TEXT
<SOmething>TEXT</SOmething>
AND again <another>SOME TEXT text <a href="YYY"> LINK-2 TEXT </a> TEXT </another>
And can be more text with or without URL and TEXT like ‘ LINK-3 TEXT’
</p>
What is required: for each <URL>, if the text PRECEDING it contains a
string enclosed in ‘ and ’, that string must be converted to an
<a href> link pointing at the URL.
After narrowing down to p[URL], I am not sure what the best pattern
would be to achieve the desired result. Could you please suggest
something? In the above sample, NOTE that the last ‘ LINK-3 TEXT’ was
left as it is because there is no matching URL. Although I am using
XSLT 1.0, if XSLT 2.0 can solve this easily, please suggest that as well.
[SAMPLE Skeleton XML and XSL]
XML:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<something>
<blah-blah>Can have many child</blah-blah>
<nodeGroup>
<note id="does-not-matter-1">
<p>
<something><sup>1</sup></something>
some text here. <bidItem id="95522-1" vol="1"> Title Name, Other details,
‘The
arms trade and corruption’,
<i>Prospect</i> Aug.2005</bidItem>.
<!-- NOTE: NO URL IN THIS CASE, WHICH IS FINE -->
</p>
</note>
<note id="does-not-matter-2">
<p> some text
‘Ex-Pentagon procurement executive gets jail
time’, text text &lt;
<url webUrl="http://www.aaa.xx/bbb/ddd.htm">http://www.aaa.xx/bbb/ddd.htm</url>&gt;
;;
‘Former Air Force acquisition official
released from
jail’, Government in 2005, &lt;
<url webUrl="http://www.aaa.xx/bbb/uuu.htm">SAME AS @webUrl</url>&gt;; and
<bidItem id="95522-2">Author name., ‘Cashing in for
profit? Who cost taxpayers
billions in biggest Pentagon scandal in
years?’, <i>60 Minutes</i>, CBS, 5 Jan. 2005
</bidItem>, &lt; <url
webUrl="http://www.cbsnews.com/stories/2005/01/04/60II/main664652.shtml">SAME AS @webUrl</url>&gt;.
<!-- HERE EACH
URL HAS MATCHING ‘contents’ WHICH IS FINE -->
</p>
</note>
<note id="does-not-matter-3">
<p><something><sup>68</sup></something> This figure is
comprised of a fine of
£500 000 ($900 000)
for ‘irregular accounting practices’
in a
Tanzanian deal for an inappropriate and overpriced air radar system that was
tainted by allegations of high-level corruption, with
...($405 000)
costs..
£29.275 million ($52.695
million) going to Tanzania in reparations. <bidItem
id="996522-31" title="BAE deal with Tanzania...">Evans, R. and
Lewis, P.,
‘BAE deal with Tanzania:
military air traffic
control—for country with no airforce’, <i>The
Guardian</i>, 6 Feb. 2010</bidItem>; ‘Military
radar probe: the key suspects … and
the case
against them’, <i>This Day</i> (Dar es Salaam), 15 Feb. 2010; &lt;
<url
webUrl="http://www.judiciary.gov.uk/Resources/JCO/Documents/Judgments/r-v-bae-sentencing-remarks.pdf">SAME AS @webUrl</url>&gt;.
<!--
ONLY ONE URL, BUT MANY ‘ in-between texts ’.
So the URL belongs only to its preceding "‘ in-between texts ’"
-->
</p>
</note>
</nodeGroup>
</something>
</root>
XSL:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/">
<xsl:apply-templates select="*"/>
</xsl:template>
<xsl:template match="*">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:variable name="href-start">&lt;href="</xsl:variable>
<xsl:variable name="href-mid">"/&gt;</xsl:variable>
<xsl:variable name="href-finish">&lt;a/&gt;</xsl:variable>
<xsl:template match="note">
<xsl:copy>
<xsl:apply-templates
select="@*"/>
<xsl:apply-templates mode="url"/>
</xsl:copy>
</xsl:template>
<xsl:template match="p[url]" mode="url">
<!-- HERE, FOR EACH URL, IT SHOULD FORM A HREF LINK, COVERING ANY
PRECEDING TEXT THAT APPEARS IN-BETWEEN ‘ AND ’.
Ref: MAIL DESCRIPTION.
-->
<xsl:copy>
<xsl:apply-templates select="@*"/>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="p[not(url)]" mode="url">
<xsl:copy>
<xsl:apply-templates select="@*"/>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template
match="@*|text()|comment()|processing-instruction()">
<xsl:copy-of
select="."/>
</xsl:template>
<!-- COMMENTED... SOME TRY ALONG THIS
LINE
<xsl:template .... mode="url">
<xsl:copy>
<xsl:...
test="contains(., '‘')">
<!-\-<xsl:apply-templates>
<xsl:sort
select="substring-before(., '‘')"/>
</xsl:apply-templates>-\->
<xsl:value-of
select="substring-before(., '‘')"/>
<xsl:value-of
select="$href-start"
disable-output-escaping="yes"/>[@<xsl:value-of
select="following-sibling::url"/>]<xsl:value-of select="$href-mid"
disable-output-escaping="yes"/>
<xsl:value-of
select="substring-after(., '‘')"/>
</xsl:...>
<xsl:... test="contains(., '’')">
<xsl:value-of select="substring-before(., '’')"/>
<xsl:value-of select="$href-finish"
disable-output-escaping="yes"/>
<xsl:value-of
select="substring-after(., '’')"/>
</xsl:..>
<xsl:apply-templates .... mode="url"/>
</xsl:copy>
</xsl:template>
-->
</xsl:stylesheet>
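For question 1, one direction that might work if XSLT 2.0 is available is
sketched below. It is only a sketch under assumptions, not a tested solution:
it assumes the element and attribute are spelled url and webUrl as in the
skeleton XML, that a quoted string never contains a ’ of its own (for example
an apostrophe), and that the link text always sits in a single text node. The
function name f:quote-node and its namespace URI are made up for illustration.
For each url it looks backwards, inside the same p, for the nearest text node
holding a complete ‘...’ pair, wraps the last such pair of that node in
<a href>, and drops the url element. A url with no quoted string before it is
copied unchanged, and the literal < and > around each url are left in place.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:f="urn:example:local"
    exclude-result-prefixes="f">

  <!-- Identity template: copies whatever no other template claims. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical helper: the nearest text node before $u, within the
       same p, that contains a complete ‘...’ pair (empty if none). -->
  <xsl:function name="f:quote-node" as="text()?">
    <xsl:param name="u" as="element(url)"/>
    <xsl:variable name="p" select="$u/ancestor::p[1]"/>
    <xsl:sequence select="($u/preceding::text()
                             [ancestor::p[1] is $p]
                             [matches(., '‘[^’]*’')])[last()]"/>
  </xsl:function>

  <!-- Text inside p: if some url claims this node, turn the LAST quoted
       string in it into a link; otherwise copy the text unchanged. -->
  <xsl:template match="p//text()">
    <xsl:variable name="t" select="."/>
    <xsl:variable name="u"
        select="(ancestor::p[1]//url[f:quote-node(.) is $t])[1]"/>
    <xsl:choose>
      <xsl:when test="$u">
        <!-- Greedy (.*) with the s flag makes group 2 the last ‘...’ pair. -->
        <xsl:analyze-string select="." regex="^(.*)‘([^’]*)’(.*)$" flags="s">
          <xsl:matching-substring>
            <xsl:value-of select="regex-group(1)"/>
            <a href="{$u/@webUrl}">
              <xsl:value-of select="regex-group(2)"/>
            </a>
            <xsl:value-of select="regex-group(3)"/>
          </xsl:matching-substring>
        </xsl:analyze-string>
      </xsl:when>
      <xsl:otherwise>
        <xsl:copy/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- A url whose quoted string was found is dropped from the output. -->
  <xsl:template match="p//url[exists(f:quote-node(.))]"/>

</xsl:stylesheet>
```

If two url elements end up pointing at the same text node, this sketch links
only the first and silently drops the second, so that case would need extra
handling. The repeated backwards search is also quadratic in the size of a p,
which is probably acceptable for notes of this size.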
2) Additionally, when dealing with such mixed content (I mean content
containing both text and child elements), what is the best way to split
and handle elements and text separately?
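For question 2, a common starting point is to let template rules do the
splitting: xsl:apply-templates visits the text nodes and child elements of
mixed content in document order, and each kind of node can be given its own
template. The following is a minimal XSLT 1.0 sketch of that idea. The rename
of i to em and the pass-through text template are placeholders chosen only for
illustration, not part of the requirement above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copies any node that no other template claims. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Text-node children of p arrive here one node at a time, so string
       handling (substring-before, contains, ...) can be done per node.
       This placeholder just writes the text out unchanged. -->
  <xsl:template match="p/text()">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Element children of p get their own templates. As an arbitrary
       example, i is renamed to em and its content is processed further. -->
  <xsl:template match="p/i">
    <em>
      <xsl:apply-templates/>
    </em>
  </xsl:template>

</xsl:stylesheet>
```

Because the two specific patterns have a higher default priority than the
identity template, they take over only for the nodes they match and
everything else is copied as before.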
Thanks and look forward to suggestions,
Karl