Re: [xsl] Preserving CDATA sections?

Subject: Re: [xsl] Preserving CDATA sections?
From: Dan Vint <dvint@xxxxxxxxx>
Date: Sat, 15 Dec 2012 09:06:40 -0800
Yeah, I got to thinking last night that PIs might be something else to try. There is a node test for them, so they should work like comments without the problem of nested comments. "Working" here means that I can write a template that recognizes them and preserves or deletes them as I want, and that they don't show up as content.
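
As a minimal sketch of that template idea, assuming the debug information is written as processing instructions with the hypothetical target name before-state (nothing in this thread names one), a later step could keep or drop them like this:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copies elements, attributes, text, comments
       and processing instructions unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop the debug PIs. "before-state" is an assumed target name. -->
  <xsl:template match="processing-instruction('before-state')"/>

</xsl:stylesheet>
```

Leaving out the second template preserves the PIs instead, since the node() test in the identity template also matches processing instructions.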


At 11:43 PM 12/14/2012, you wrote:
If you want to inject content that is visible within your processing pipeline, but which doesn't disrupt other applications, then processing instructions might do the job. Just don't try to nest them.

But elements (in an alien namespace) are probably a better solution, as far as I can see from your description.
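
A rough sketch of the alien-namespace idea, under stated assumptions: the dbg prefix, its namespace URI and the dbg:before element name are all made up for illustration, and the xsl:apply-templates call merely stands in for whatever the real change-resolution logic does.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dbg="urn:example:conversion-debug">

  <!-- Identity template: copy everything not handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Capture the "before" state as real markup, then emit the result. -->
  <xsl:template match="change">
    <dbg:before>
      <xsl:copy-of select="."/>
    </dbg:before>
    <!-- Placeholder: the real change-resolution logic would go here. -->
    <xsl:apply-templates/>
  </xsl:template>

</xsl:stylesheet>
```

Because the captured content stays real markup, any comments inside it cause no nesting problem. Later steps would probably want a template matching dbg:before that does a plain xsl:copy-of, so their own rules don't rewrite the captured copy, and the last step can drop it with an empty template matching dbg:before.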

Michael Kay

On 15/12/2012 00:52, Dan Vint wrote:
I've come across an interesting problem that I'd like some ideas to solve.

So I'm writing a series of XSLT stylesheets to convert S1000D issue 2.2 to issue 4. This content also has change markup that we want to strip/resolve as we move forward. So I'm running a conversion sequence like this:

1) Renumber/rename the data modules and all links between them in the files
2) Resolve all change markup and strip it
3) Move from issue 2.2 to issue 3 and correct some tagging issues
4) Move from issue 3 to issue 4, changing the element and attribute names and some structures that were reworked

I recently discovered that the authors did some things with the change markup that I didn't expect (you say that never happens, right? ;-)). So, to aid debugging and to verify that the stylesheet is not introducing problems, I have been trying to capture the "before" state of the content in comments and then output the resolved results. Content is marked for change either with a specific <change> tag or with a common set of attributes, carried by many elements, that let you mark change on the structural tags. So I can get some involved markup where entire sections might be flagged to be deleted or modified, and I'm trying to copy that structure.

To copy the content I'm using <xsl:copy-of> which does the trick of getting the current condition. I tried wrapping this in a comment with:

<xsl:comment>
<xsl:copy-of select="."/>
</xsl:comment>

That only produced a comment containing the text content of the element I was copying, with no markup. So I then tried this, which worked:

<xsl:text disable-output-escaping="yes">&lt;!--</xsl:text>
<xsl:copy-of select="."/>
<xsl:text disable-output-escaping="yes">--&gt;</xsl:text>

This got the markup and all the contained content. But oh those writers ;-), they also included their own comments in some of those elements. So when I tried processing these results in the next step I got errors about nesting comments inside comments.

So my next thought was to wrap this information inside a CDATA section with similar constructs:

<xsl:text disable-output-escaping="yes">&lt;![CDATA[</xsl:text>
<xsl:copy-of select="."/>
<xsl:text disable-output-escaping="yes">]]&gt;</xsl:text>

This worked great for this stage, but I found after the next processing step that those CDATA sections had been unwrapped, leaving their text and markup in place in escaped form.
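
The unwrapping happens because an XML parser hands CDATA content to the stylesheet as ordinary text, so the next step has no record that a CDATA section was ever there. One hedged way to get it back on output, assuming the captured text is first placed inside a dedicated wrapper element (the name before-state is hypothetical), is the cdata-section-elements attribute of xsl:output in each later step:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Text children of before-state (an assumed wrapper element)
       are serialized as CDATA sections instead of being escaped. -->
  <xsl:output method="xml" cdata-section-elements="before-state"/>

  <!-- Identity template: pass everything else through unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

This only affects text nodes that are direct children of before-state, and only when the XSLT processor itself serializes the result.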

I would like to be able to push these files through all the steps and have the intermediate results available for troubleshooting - that was my goal. It looks like I'm going to have to get the files reviewed and cleaned up by hand immediately after the change markup is removed, and only continue processing once the files are clean. That forces the team to touch the files twice.

As I'm writing this up, maybe what I need to do is to wrap CDATA sections inside CDATA sections, with enough wrappings that the last stage still has one level of wrapping left. Not sure if it is legal, but it is an idea to work around the problem. The writers/conversion team will not see the multiple wrappings unless they (more likely me) have to go back to trace down a problem.
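
On the legality question: CDATA sections do not nest in XML, because the first ]]> ends the section no matter how many <![CDATA[ openers came before it. The fragment below is only an illustration (the note element and its content are made up); it shows the failing form and the usual workaround of splitting each ]]> in the payload across two sections:

```xml
<!-- Not well-formed: the first "]]>" closes the section, which leaves
     a stray "]]>" in the element content, and XML forbids that. -->
<note><![CDATA[ outer <![CDATA[ inner ]]> tail ]]></note>

<!-- Well-formed: each "]]>" in the payload is split as "]]" + ">". -->
<note><![CDATA[ outer <![CDATA[ inner ]]]]><![CDATA[> tail ]]></note>
```

Even the well-formed version is still unwrapped by the next parse, so this gives an escaping trick and not a second level of wrapping.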

The only other way I can see to do this is not to use a fully XML-aware processing method - but I have all these stylesheets completed except for this last-minute problem. Another possibility would be to push the change markup removal to the last step. It would mean rewriting that stylesheet to deal with the new issue 4 elements, but that would also work.

Anyone have ideas for an alternate solution?
Danny Vint

Panoramic Photography

voice: 619-938-3610

