Re: [xsl] Preserving CDATA sections?

Subject: Re: [xsl] Preserving CDATA sections?
From: "G. Ken Holman" <gkholman@xxxxxxxxxxxxxxxxxxxx>
Date: Fri, 14 Dec 2012 20:31:29 -0500

At 2012-12-14 16:52 -0800, Dan Vint wrote:
I've come across an interesting problem that I'd like some ideas to solve.

So I'm writing a series of XSLT stylesheets to convert S1000D issue 2.2 to issue 4. This content also has change markup that we want to strip/resolve as we move forward.
So I then tried this, which worked:

<xsl:text disable-output-escaping="yes">
<xsl:copy-of select="."/>
<xsl:text disable-output-escaping="yes">

I worry that as soon as you start digging the "disable-output-escaping" hole, you will only dig yourself deeper and deeper creating problem after problem and missing nuance after nuance. You will end up spending a lot of time creating a complex encoding scheme that may or may not work with future data sets (you did say that your authors were very creative).

For example, you cannot embed one CDATA section into another CDATA section. Rather, you have to embed one CDATA section into two CDATA sections (or one plus some trailing markup). But then what you've done is you've taken all the old markup and made it text! It will end up showing up as output text after processing with stylesheets. And how will you find what you added when it comes time to remove it all?
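
To make the nesting problem concrete, here is a small sketch. The element name "note" and its content are made up for illustration only. The first "]]>" the parser meets always ends the section, so the inner terminator has to be split across two sections or finished with an escaped ">":

```xml
<!-- The original content, containing its own CDATA section -->
<note><![CDATA[a < b]]></note>

<!-- Naive wrapping: NOT well-formed. The first "]]>" closes the outer
     section, and the leftover "]]>" is not allowed in character data. -->
<!--  <note><![CDATA[<![CDATA[a < b]]>]]></note>  -->

<!-- One CDATA section embedded into two: the inner "]]>" is split so that
     "]]" ends the first section and ">" starts the second. -->
<note><![CDATA[<![CDATA[a < b]]]]><![CDATA[>]]></note>

<!-- Or one CDATA section plus some trailing markup -->
<note><![CDATA[<![CDATA[a < b]]]]>&gt;</note>
```

In both working forms the parser reports a single text node whose value is the string "<![CDATA[a < b]]>". The old markup has become plain characters, and nothing in the tree records where the wrapping was added.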

And if you did use a comment, how would you know which comments to remove from the end result while preserving the original comments?

And, anyway, the use-case for disable-output-escaping= is to mark the text in your output tree not to be escaped during serialization ... it isn't meant to be used as a way to synthesize serialization markup in the body of your end result. I have used it to synthesize prologue information in the output, but that is self-contained and doesn't get impacted by imaginative authored input. And any time in the future when you optimize your pipeline by working on intermediate trees instead of serialized Unicode files of markup and content, the intermediate trees will not reflect the node structure of what you need. The disable-output-escaping= attribute takes effect only when serializing to an output entity, not when building a tree for subsequent processing.

I always try to remind my students that XSLT is a node processing language, not an angle-bracket processing language. As soon as you think you need to work with angle brackets, think again because you probably don't (or shouldn't!).

Anyone have ideas for an alternate solution?

Run your pipeline putting the old content you want to preserve into a custom element in a custom namespace. Your new content then has both the old content and the new content for you to visually do your comparison at the end of your pipeline to see what has changed.
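
As a rough sketch of what one pipeline step along these lines might look like (XSLT 1.0): the namespace URI, the "old:content" wrapper name and the "change" attribute test are all placeholders I have assumed for illustration, not names taken from S1000D or from your stylesheets.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:old="urn:x-example:migration:old">

  <!-- Assumption: an element carrying a "change" attribute is one we resolve -->
  <xsl:template match="*[@change]">
    <!-- Preserve the untouched original as real nodes in the custom namespace -->
    <old:content>
      <xsl:copy-of select="."/>
    </old:content>
    <!-- Emit the new content; here simply the same element without @change -->
    <xsl:copy>
      <xsl:apply-templates select="@*[name() != 'change']|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Identity template: everything else passes through unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The old content travels as elements, attributes and text rather than as escaped characters, so later steps can still address it with XPath. One caveat with this sketch: if the document element itself matches, the result has two top-level elements, so a real stylesheet would need to place the wrapper inside the new element in that case.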

At those points in your pipeline where you need to use an S1000D document model to validate an intermediate file, preprocess that file to strip out your custom element so that it doesn't trigger any problems.

And that stripping stylesheet will be handy when you are all done in order to remove the old content from the new file so that you only have the new file.

And the stripping stylesheet is small: only two template rules. One template rule catches all elements in your custom namespace and does nothing with them. The other template rule is the idiomatic identity template. Easy to write. Easy to use. Any time you need an intermediate file to produce a final output of some kind, just pre-process it and use your existing processes.
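
A minimal sketch of those two template rules in XSLT 1.0 follows. The namespace URI is a placeholder I have assumed; substitute whatever URI your pipeline uses for its custom elements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:old="urn:x-example:migration:old">

  <!-- Rule 1: catch every element in the custom namespace and do nothing,
       which drops the element together with all of its descendants -->
  <xsl:template match="old:*"/>

  <!-- Rule 2: the idiomatic identity template copies everything else -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

No explicit priority= is needed: the pattern "old:*" has a default priority of -0.25, which beats the -0.5 of "node()". Note that xsl:copy in XSLT 1.0 also copies namespace nodes, so if the custom namespace is declared on an ancestor such as the document element, the declaration itself may survive in the output even though no element uses it.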

This is a scheme that doesn't use disable-output-escaping= and will work whether you serialize your intermediate files to output entities or pass intermediate trees from process to process. You don't have to worry about writing your own XML serialization logic (in XSLT of all languages!) and it will work regardless of what imaginative markup comes from your authors.

I hope this helps.

. . . . . . . . . . . Ken
