Re: [xsl] best practices for preserving spaces in mixed content when making XML to XML

Subject: Re: [xsl] best practices for preserving spaces in mixed content when making XML to XML
From: "G. Ken Holman" <gkholman@xxxxxxxxxxxxxxxxxxxx>
Date: Fri, 06 Sep 2013 12:44:39 -0400
At 2013-09-06 12:32 -0400, Dorothy Hoskins wrote:
Hi, I would like advice on the following situation:
I am copying the majority of one XML to another XML but I want to
insert some label text into some elements.
Beginning XML:
<note type="warning"><para>This is <b>mixed content</b> text for a
specific purpose.</para></note>

Final XML could be this:
<note type="warning"><para><label>label text: </label>This is <b>mixed
content</b> text for a specific purpose.</para></note>

or it could be this:
<note type="warning"><label>label text: </label><para>This is <b>mixed
content</b> text for a specific purpose.</para></note>
The <label> will have distinctive output formatting.

My questions:
1) Which output is generally preferable, when <para> is a block
element - <label> before <para> or <label> inside <para>? Since <para>
is a block element, it seems that it would be better to put the
<label> inside the <para> rather than before it, within the <note>.

I prefer the latter. The label is a property of the note, it isn't a property of the paragraph (unless you have other labeled paragraphs in your environment).
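
As a minimal sketch of that arrangement, assuming XSLT 1.0 and the element names from your example, an identity template plus one override for warning notes would emit the label as the first child of the note, ahead of the paragraph. The literal label text is simply copied from your sample output; nothing here is specific to any one processor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy every node and attribute not handled elsewhere. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Warning notes: copy the attributes, add the label, then process the children. -->
  <xsl:template match="note[@type='warning']">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- Label text taken from the sample output in the question. -->
      <label>label text: </label>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The attributes are processed before the label is created because attributes must be added to an element before any of its children.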

2) Assuming that I might want to make changes within the <para> also,
what is the best method to preserve the spaces around inline elements
such as <b>? For example, if there is an inline <image> in the <para>,
I might want to change its @href, but probably I would simply copy
the <b>. If there aren't any inlines, I could just copy the <para>
without further processing, but I've had a problem retaining the
spaces around inlines when using apply-templates.

You'd have to figure out why you are losing spaces ... in your examples, none of the spaces should be lost. They are all part of a non-white-space-only text node and so all white-space within that node should be retained.

Where you will have a problem is when <bold>something</bold> <italic>else</italic> creates a white-space-only text node between the two elements. IE loses the space between the two marked-up words, though the DLL is configurable when not using IE. Every other processor I know of preserves that text node between the two marked-up elements.

But for what you have asked, I would never expect the spaces to be lost.
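
To illustrate, here is a hedged sketch in XSLT 1.0 of changing an inline attribute while leaving the surrounding mixed content untouched. The 'images/' prefix is a hypothetical change invented for the example; the element and attribute names come from your question. The stylesheet declares no xsl:strip-space and does not request indentation, since either can alter white space in mixed content.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- indent="no" so the serializer adds no white space inside mixed content. -->
  <xsl:output method="xml" indent="no"/>

  <!-- Identity template: text nodes, including the spaces around <b>,
       are copied exactly as they appear in the source tree. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical change: prefix the href of an inline image.
       Only this attribute is rewritten; everything else is copied. -->
  <xsl:template match="image/@href">
    <xsl:attribute name="href">
      <xsl:value-of select="concat('images/', .)"/>
    </xsl:attribute>
  </xsl:template>

</xsl:stylesheet>
```

Because apply-templates is applied to node() rather than to *, the text nodes between and around the inline elements are visited and copied along with the elements themselves.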

3) Is it better practice to insert the punctuation (colon and space)
after generated label text as part of the <label> content, or to
output them as text literals after the <label>? For example, I can
generate the text of the label inside the <label> element, followed by
<xsl:text>: </xsl:text> or use some other method such as a variable to
externalize the following punctuation and pull that variable value in
with logic, so that the punctuation could be different for different

The latter, for precisely the reason you cite. If the colon is part of the formatting, let the stylesheet do it and don't put it into the data such that you might have to remove it awkwardly later.
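
One way to read that advice, sketched under assumptions: the XML-to-XML step writes only the label text into <label>, and the later formatting stylesheet supplies the punctuation. The sketch below assumes XSLT 1.0 and HTML output purely for illustration; the parameter name label-separator and the span/class markup are hypothetical, not anything from your environment.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter: the punctuation that follows a label.
       It can be overridden at invocation time for other contexts. -->
  <xsl:param name="label-separator" select="': '"/>

  <!-- Formatting step: the data holds only the label text;
       the stylesheet appends the separator when rendering. -->
  <xsl:template match="label">
    <span class="label">
      <xsl:apply-templates/>
      <xsl:value-of select="$label-separator"/>
    </span>
  </xsl:template>

</xsl:stylesheet>
```

With the separator kept out of the data, a different output format can use different punctuation without having to strip anything from the label content first.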

I hope this helps.

. . . . . . Ken

Crane Softwrights Ltd.
G. Ken Holman                   mailto:gkholman@xxxxxxxxxxxxxxxxxxxx