[xsl] XSLT 4: normalize-mixed()

Subject: [xsl] XSLT 4: normalize-mixed()
From: "Graydon graydon@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 25 May 2020 03:51:33 -0000
So fairly often I get to try to do things to documentation XML documents; lots of mixed content, and for various reasons someone wants the content regularized.  The pretty-print indentation should all be stripped out and eventually added back in via some consistent means.  (Or the pretty-print indentation is making finding indexed phrases more challenging, or...)

Going through and applying normalize-space() to all the text nodes is an obviously bad idea; it loses the spaces before and after mixed content elements.

<p>These are some <i>words</i></p>

turns into

<p>These are some<i>words</i></p>

and that's no help to anyone.
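
For concreteness, the naive approach amounts to something like the following. This is only a minimal sketch, assuming an XSLT 3.0 processor; it is here to show the failure, not to recommend anything.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Copy everything that has no more specific rule (identity transform). -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- The naive rule: each text node is trimmed and collapsed in isolation,
       so the space between "some" and <i> is treated as trailing space
       and thrown away. -->
  <xsl:template match="text()">
    <xsl:value-of select="normalize-space(.)"/>
  </xsl:template>

</xsl:stylesheet>
```

The trouble is that normalize-space() only ever sees one text node at a time, so it cannot tell a significant word space at the edge of a text node from pretty-print padding.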

In a related way, normalize-space() has a narrow definition of white space -- U+000A, U+000D, U+0009, and U+0020 (linefeed, carriage return, tab and space) -- and this is not always entirely helpful.  The content may have non-breaking spaces, ideographic spaces, or other fancy spaces in it.

Would it be possible to get a normalize-mixed() that takes a sequence of text and element nodes plus a sequence of characters, and returns a sequence of text and element nodes in which every run of those characters has been replaced with a single space, without deleting the leading or trailing spaces on the text nodes?

I realize that there's no reason not to write this as a user-defined function; it's how often I wind up wanting it that makes me think it might be something to consider as a language function.
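
As a rough illustration of the user-defined version, here is a minimal sketch, assuming XSLT 3.0. The local: namespace URI, the function name local:normalize-mixed, the mode name, and the "from" tunnel parameter are all made up for this example; none of it is a proposed signature. It maps every listed character to a plain space with translate() and then collapses runs of spaces with replace(), without trimming either end of any text node.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:local="urn:example:local"
    exclude-result-prefixes="xs local">

  <!-- Default mode: plain identity transform. -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- Hypothetical mode used only by the function below; elements,
       attributes, comments and PIs are copied through unchanged. -->
  <xsl:mode name="normalize-mixed" on-no-match="shallow-copy"/>

  <!-- $nodes: the mixed content to regularize.
       $chars: one-character strings to be treated as white space. -->
  <xsl:function name="local:normalize-mixed" as="node()*">
    <xsl:param name="nodes" as="node()*"/>
    <xsl:param name="chars" as="xs:string*"/>
    <xsl:apply-templates select="$nodes" mode="normalize-mixed">
      <xsl:with-param name="from" select="string-join($chars)" tunnel="yes"/>
    </xsl:apply-templates>
  </xsl:function>

  <xsl:template match="text()" mode="normalize-mixed">
    <xsl:param name="from" as="xs:string" tunnel="yes"/>
    <!-- One plain space for every character in $from, so translate()
         turns each listed character into U+0020. -->
    <xsl:variable name="spaces" as="xs:string"
        select="string-join((1 to string-length($from)) ! ' ')"/>
    <!-- Collapse each run of spaces to one space; nothing is trimmed,
         so a leading or trailing space on the text node survives. -->
    <xsl:value-of select="replace(translate(., $from, $spaces), ' +', ' ')"/>
  </xsl:template>

  <!-- Example use: regularize the content of every p, treating tab,
       linefeed, carriage return, space, no-break space and ideographic
       space as white space. -->
  <xsl:template match="p">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:sequence select="local:normalize-mixed(node(),
          (9, 10, 13, 32, 160, 12288) ! codepoints-to-string(.))"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

With this sketch the space before <i> in the example above is kept, because nothing is trimmed.  The flip side is that pretty-print indentation at the very start or end of a block comes out as a single space rather than disappearing, so whether to trim the first and last text node of a block is a separate decision.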

-- 
Graydon Saunders  | graydonish@xxxxxxxxx
Þæs oferiode, þisses swa mæg.
-- Deor  ("That passed, so may this.")
