Re: [xsl] invalid xpath?

Subject: Re: [xsl] invalid xpath?
From: Abel Braaksma <abel.online@xxxxxxxxx>
Date: Wed, 02 Jul 2008 16:42:28 +0200
Hi Trevor,

You do not show what output you want for your samples, i.e., with or without preserved whitespace (the leading/trailing spaces). Like Michael Ludwig, I think it would be easiest, both for you and for us, if you extracted the (included) stylesheets that together do your whitespace handling and posted those. They should work standalone, and if they do, it will be easy to put them back in.

You say "simple question". If it is so simple, why do you need to ask it? But as is often the case, a question can be simple while the answer is dauntingly complex... ;)

Whitespace handling is hard, especially with XSLT 1.0, and even more so with closed-source tools (like the MS XSLT parser and XMLSpy), because these seem to handle whitespace in their own way regardless of what you do by hand. So complete control over whitespace, let alone complete control that behaves the same across processors, is next to impossible.

If you must handle whitespace delicately, the usual trick is to transform it (with translate(...)) into some Unicode private-use characters and keep them that way during processing. Then, when you need to "format" your whitespace, you do not format the whitespace itself but your private-use characters, which are much easier to handle:

<xsl:template match="text()[starts-with(., '&#xE001;')]">
  ....

and during testing it is easier to see where things go wrong. In addition, this gives you more control over how to deal with processors that add, remove or normalize whitespace at will (MS XSLT). And last but not least, it will be much easier to port to XSLT 2.0 once you want to go that path (just add a character map, xsl:character-map, to your stylesheet and you're done).
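A minimal XSLT 1.0 sketch of the encode step, assuming the significant spaces live in text nodes under an element with a `preformatted` attribute (as Trevor describes) and picking U+E002 as the placeholder. The code point is an arbitrary choice; any private-use character that never occurs in your data will do:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy everything not matched more specifically. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Encode: inside preformatted content, hide every significant space
       behind the private-use character U+E002. It is not XML whitespace,
       so normalize-space() in later steps, or whitespace stripping when a
       later step re-parses this output, leaves it alone. -->
  <xsl:template match="*[@preformatted]//text()">
    <xsl:value-of select="translate(., ' ', '&#xE002;')"/>
  </xsl:template>
</xsl:stylesheet>
```

The final pipeline step would reverse it with `translate(., '&#xE002;', ' ')` on every text node. While debugging, a placeholder is also far easier to spot in intermediate output than a plain space.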

Now, back to your situation. From your last post below, I am under the impression that you do not wish to revisit your current implementation. I don't share your assessment that XSLT 2.0 handles it well with the code you provided, as that code could not possibly do what you want in all scenarios.

I think you need four things to make it clearer and to remove the ambiguities:

1. Use a matching template for <nl> nodes. If I understand you correctly, these are _always_ translated into a newline. Replace them with a Unicode private-use character and translate that back into a newline in the last pipeline step.

2. Remove all special cases from your named template, as I suggested earlier (namely the <nl> cases).

3. Keep the node a node as long as possible and use apply-templates or call-template with the node; do not use string functions until the node is only text and has no children anymore.

4. Avoid call-template where possible, leaving the logic to XSLT, not you, to find out.

And although you haven't answered my question on XSLT 2.0: this can be done without any call-template recursion in XSLT 2.0.
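For comparison, a possible XSLT 2.0 version of the same sketch, under the same assumptions (`code` elements, U+E001 as the placeholder). `replace()` with a regular expression does the string work, and, if this is the final step of the pipeline, a character map writes the placeholder out as a newline during serialization, so neither a recursive named template nor a separate decode pass is needed:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- At serialization time, write the placeholder U+E001 as a newline. -->
  <xsl:output method="xml" use-character-maps="ws"/>
  <xsl:character-map name="ws">
    <xsl:output-character character="&#xE001;" string="&#xA;"/>
  </xsl:character-map>

  <!-- Identity template. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Every <nl/> becomes the placeholder. -->
  <xsl:template match="nl">
    <xsl:text>&#xE001;</xsl:text>
  </xsl:template>

  <!-- Strip one leading newline from text that starts the code element or
       directly follows <nl/>; replace() does this without any recursion. -->
  <xsl:template match="code/text()[not(preceding-sibling::node())
                                   or preceding-sibling::node()[1][self::nl]]">
    <xsl:value-of select="replace(., '^\n', '')"/>
  </xsl:template>
</xsl:stylesheet>
```

Because the character map only acts on output, any newline-to-space conversion done earlier in the same transformation cannot touch the `<nl/>` newlines.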

HTH,
Cheers,
-- Abel --

Trevor Nicholls wrote:
Hi Abel

I don't want to post the whole stylesheet here because it is rather long and
complex, and my only problem is with this particular template which is
making a small adjustment to text() node children of elements which have a
preformatted attribute - or to the output of another template
(pre)processing such a text node. So we are only dealing with text.

The input XML may be coming from a variety of sources which pad out the
input with whitespace in different ways. The template I posted is part of an
included stylesheet which provides templates to try and normalise some of
this input. The text nodes are identified as "solitary", "initial", "final",
and "central", depending on the presence or absence of sibling elements to
one or both sides, and then whitespace may be handled differently. I have
WSfromL, WSfromR, KeepWS, and various other templates which are pipelined
together as appropriate.
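The four positions described here map directly onto match patterns. A diagnostic XSLT 1.0 sketch, assuming "sibling" means sibling element (as in the `preceding-sibling::*` test quoted further down); it marks each text node with a comment naming its class, which may help when checking a pipeline against the three samples below:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy everything not matched more specifically. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- No sibling element on either side. -->
  <xsl:template match="text()[not(preceding-sibling::*) and not(following-sibling::*)]">
    <xsl:comment>solitary</xsl:comment>
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Sibling elements only after the text. -->
  <xsl:template match="text()[not(preceding-sibling::*) and following-sibling::*]">
    <xsl:comment>initial</xsl:comment>
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Sibling elements only before the text. -->
  <xsl:template match="text()[preceding-sibling::* and not(following-sibling::*)]">
    <xsl:comment>final</xsl:comment>
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Sibling elements on both sides. -->
  <xsl:template match="text()[preceding-sibling::* and following-sibling::*]">
    <xsl:comment>central</xsl:comment>
    <xsl:value-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```

The four predicates are mutually exclusive, so no template priorities are needed between them; each one beats the identity template because a pattern with a predicate has a higher default priority than `node()`.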

Moving to the particular issue I have, we're looking at code samples where
spaces are significant, and line breaks are inserted into the text with
<nl/> elements. The following three XML fragments need to produce identical
output:

<code>
   abc<nl/>
   def<nl/>
</code>

<code> abc<nl/> def<nl/></code>

<code>   abc<nl/>
   def<nl/></code>


So the original piece of XSL I gave you is dropping an initial newline a) from a preformatted text node which has a <nl> element as its preceding sibling (lines 2 and 3 of the first example, line 1 of the third example); b) from a preformatted text node which has no preceding sibling and which commences with a newline (line 1 of the first example).

I do not want to translate *all* newlines to nothing, because newlines which
occur in the input because of line wrapping (and this happens) need to be
preserved as a real whitespace character (a later template in the chain
translates them to spaces).

A simple question is getting more and more complex! I still don't understand
why the original stylesheet (..)[..] thru Saxon drops the newlines as I
want, and the modified stylesheet (..) and (..) thru xsltproc/XMLSpy turns
them into space characters as I don't want.

Cheers
Trevor


-----Original Message-----
From: Abel Braaksma [mailto:abel.online@xxxxxxxxx]
Sent: Thursday, 3 July 2008 12:54 a.m.
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [xsl] invalid xpath?


Trevor Nicholls wrote:
Thank you Abel

[...snip...]

<xsl:when test="not(preceding-sibling::*)[starts-with($Arg,'&#x0a;')]">
<xsl:call-template name="WS">

[...snip...]
OK, the foregoing is invalid 1.0. So I tried modifying it to this:

<xsl:when test="not(preceding-sibling::*) and starts-with($Arg,'&#x0a;')">
<xsl:call-template name="WS">

Now there are no reported errors, but the test appears not to be working (at
least, there is an extra leading space in the output document wherever this
template has been called, compared with what Saxon was producing with the
original test).

In all honesty, I haven't delved into your stylesheet logic. What you are testing above is whether the current node has no preceding sibling element and $Arg starts with a newline character.
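For what it is worth, under XPath 2.0 the original expression applies a predicate to a one-item sequence holding the boolean returned by not(...): the predicate either keeps that item or drops it, and xsl:when then takes the effective boolean value of the result. That works out to the same value as the `and` form, so the difference between Saxon and xsltproc/XMLSpy is probably somewhere else in the pipeline. A small XSLT 2.0 check of that reading, with plain variables standing in for the real context:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Start with this named template (e.g. Saxon's -it option). -->
  <xsl:template name="main">
    <!-- Stand-ins for not(preceding-sibling::*) and for $Arg. -->
    <xsl:for-each select="(true(), false())">
      <xsl:variable name="noPreceding" select="."/>
      <xsl:for-each select="('&#xA;abc', 'abc')">
        <xsl:variable name="Arg" select="."/>
        <!-- Original form: a predicate on a one-item boolean sequence. -->
        <xsl:variable name="filtered"
            select="boolean($noPreceding[starts-with($Arg, '&#xA;')])"/>
        <!-- XSLT 1.0-compatible form. -->
        <xsl:variable name="anded"
            select="$noPreceding and starts-with($Arg, '&#xA;')"/>
        <xsl:value-of select="$filtered, $anded, $filtered eq $anded"/>
        <xsl:text>&#xA;</xsl:text>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

It should print `true true true` once and `false false true` three times: the two forms agree for every combination.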


You don't show how the original template is called. You select the current node into $Arg (which could contain any number of children) and then use string functions on that node, which essentially converts it into a string, leaving you no way whatsoever to extract any elements from it (they will all be stringized).

Is that what you want? Is that expected behavior?

If you want to remove the newlines, you could make it easier on yourself by using:

translate($Arg, '&#xA;', '')


You also seem to have special cases. Why not use template matching for those cases? Let XSLT decide for you:


<xsl:template match="text()[following-sibling::nl]">
    ....

<xsl:template match="text()">
   ....

Then your KeepWS and WS named templates will become easier to program.

HTH,
Cheers,
-- Abel --
