Subject: [xsl] Processing XML with multiple nested CDATA sections
From: dvint@xxxxxxxxx
Date: Thu, 28 Feb 2013 15:47:43 -0800
I have an XML file that is an export from a Wiki site. The management information for the wiki is in clear XML, but the information contained in the pages (the actual content) has been wrapped in CDATA sections. Some of these CDATA sections have CDATA sections inside them.

I need to extract the content and create an individual file for each of the pages, so my first hurdle is unwrapping all these CDATA sections. I was handling the first one with a simple

    <xsl:result-document method="xml" href="{element/id}.html">
      <html xmlns:ac="foo" xmlns:ri="bar">
        <xsl:value-of disable-output-escaping="yes"
            select="normalize-space(key('objects', id)/property[@name='body'])"/>
      </html>
    </xsl:result-document>

Is there some trick to deal with the nesting that I might try? So far it looks like I have about 3 levels to deal with.

The content I'm processing looks like this:

    <hibernate-generic datetime="2012-12-30 17:00:12">
      <object class="BodyContent" package="com.atlassian.confluence.core">
        <id name="id">37749131</id>
        <property name="body">
          <![CDATA[<p>Creating Inted.</p><p>You can also ptions.</p> <h1>Generating</h1><p><ac:link><ri:page ri:content-title="Types of Widgets" /><ac:plain-text-link-body><![CDATA[Infographic widgets]] > </ac:plain-text-link-body></ac:link> are ways.</p>]]>
        </property>
        <property name="content" class="Page" package="com.atlassian.confluence.pages">
          <id name="id">37716459</id>
        </property>
        <property name="bodyType">2</property>
      </object>
    </hibernate-generic>

This is my typical situation, where there are little CDATA sections for the filenames, but I have seen other situations where large sections of content have been wrapped this way as well.

I can brute-force this and process my output file several times to finally clean up all the CDATA sections, but I would like to be more elegant. I will also need to de-reference these <id> elements within the original context, so even my current simple approach is going to cause problems.
My current approach extracts the content so I can take a look at it easily, but ultimately I would really like to submit the content of that first CDATA section in the <property> element for additional processing. For instance, those <ac:link> elements need to be converted to a different linking structure, like the more typical <a href=""> form.

..dan
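One possible direction, offered only as a sketch: parse the body text into a real tree inside the same transformation, so the <ac:link> elements can be matched by templates instead of being written out with disable-output-escaping. The stylesheet below assumes an XSLT 3.0 processor (for parse-xml() and xsl:mode), assumes that the "]] >" in the sample is how the export escapes the end of a nested CDATA section, and reuses the placeholder namespace URIs "foo" and "bar" from the snippet above. The object selection, the use of the object's own <id> for the file name, and the "<title>.html" link target are hypothetical choices made for illustration, not something the export format dictates.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ac="foo"
    xmlns:ri="bar">

  <!-- Identity-style copy for everything in the parsed page content
       that has no more specific template (XSLT 3.0 feature). -->
  <xsl:mode name="page" on-no-match="shallow-copy"/>

  <xsl:template match="/">
    <!-- Assumption: every page body lives in an object of class BodyContent. -->
    <xsl:for-each select="//object[@class = 'BodyContent']">

      <!-- Assumption: "]] >" is the escaped form of a nested CDATA end.
           Restoring "]]>" turns the inner sections back into real CDATA. -->
      <xsl:variable name="restored"
          select="replace(property[@name = 'body'], '\]\] &gt;', ']]&gt;')"/>

      <!-- Wrap the fragment so it has a single root element and so the
           ac: and ri: prefixes are declared before parsing. -->
      <xsl:variable name="wrapped"
          select="concat('&lt;html xmlns:ac=&quot;foo&quot; xmlns:ri=&quot;bar&quot;&gt;',
                         $restored,
                         '&lt;/html&gt;')"/>

      <!-- parse-xml() returns a document node; the source document is
           still available here for de-referencing <id> values. -->
      <xsl:variable name="doc" select="parse-xml($wrapped)"/>

      <!-- Hypothetical naming: one file per object, named after its id. -->
      <xsl:result-document method="xml" href="{id}.html">
        <xsl:apply-templates select="$doc" mode="page"/>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

  <!-- Convert an ac:link that targets a page into a plain anchor.
       The "title.html" target is a placeholder for the real lookup. -->
  <xsl:template match="ac:link[ri:page]" mode="page">
    <a href="{encode-for-uri(ri:page/@ri:content-title)}.html">
      <xsl:value-of select="normalize-space(ac:plain-text-link-body)"/>
    </a>
  </xsl:template>

</xsl:stylesheet>
```

Because the inner CDATA sections become ordinary text nodes once the fragment is parsed, a single replace-and-parse pass covers one level of nesting. If deeper levels are escaped differently, the same replace() and parse-xml() step would presumably have to be repeated on the affected text, which this sketch does not attempt. parse-xml() also raises an error on content that is not well-formed, so bodies containing stray markup would need separate handling.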