Subject: Re: [xsl] Removing unwanted space
From: "Peter Flynn peter@xxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 4 Jun 2021 21:41:12 -0000

On 04/06/2021 00:54, Charles O'Connor coconnor@xxxxxxxxxxxx wrote:
> OK, I've tried this a bunch of ways and failed (using XSLT 2.0).
>
> The XML I'm working with has a bunch of unwanted whitespace in all sorts of places, but looking just at paragraphs, it can have
>
> <p>
> The rain in <bold>Spain</bold> <italic>is</italic> wet.
> </p>

This illustrates a recurrent and persistent problem: adjusting the logic for handling white-space to suit the circumstances.

There is no built-in ltrim() or rtrim() function for removing white-space from the start or end of character data in mixed content, and there is no "interior" version of normalize-space() which leaves the start and end untouched but collapses white-space internally. All can very simply be written, of course.

The xsl:strip-space setting can be used to strip white-space nodes between the start of mixed content and a child element, but I believe it does not remove white-space at the start of mixed content where the first non-white-space token is character data content.

In the absence of a schema or DTD to dictate where mixed content is used, the indent="yes" attribute on the xsl:output element may indent subelements in mixed content.

My own rules for dealing with this are something like:

Pass all text nodes in mixed content through a template which will strip space from the start (if it's the first text node in an element) or the end (if it's the last text node in an element) or both (if it's the only text node in the element). A text node anywhere else is passed through unchanged.

Test each subelement in mixed content for the immediate adjacency of another element node BEFORE it, and output a single space to put back the one omitted by the parser.

For example, given

<doc>
  <p>
    The rain in <bold>Spain</bold> <italic>is</italic> wet.
  </p>
  <p>
    <bold>The rain in Spain is wet.</bold>
  </p>
  <p>
    <anchor> </anchor>
    The rain in <bold> <underline> Spain </underline> </bold> <italic> is </italic> wet.
  </p>
</doc>

with

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">

  <xsl:output method="xml"/>
  <xsl:strip-space elements="*"/>

  <!-- Block-level containers: copy the element and process its children -->
  <xsl:template match="doc | p">
    <xsl:element name="{name()}">
      <xsl:apply-templates select="node()"/>
    </xsl:element>
  </xsl:template>

  <!-- Inline elements: restore a dropped space first, then copy -->
  <xsl:template match="bold | italic | underline | anchor">
    <xsl:call-template name="compensate-space"/>
    <xsl:element name="{name()}">
      <xsl:apply-templates select="node()"/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="text()">
    <xsl:choose>
      <!-- Only text node in its parent: trim both ends -->
      <xsl:when test="not(preceding-sibling::text()) and not(following-sibling::text())">
        <xsl:value-of select="replace(replace(.,'^\s+',''),'\s+$','')"/>
      </xsl:when>
      <!-- First text node: trim the start -->
      <xsl:when test="not(preceding-sibling::text())">
        <xsl:value-of select="replace(.,'^\s+','')"/>
      </xsl:when>
      <!-- Last text node: trim the end -->
      <xsl:when test="not(following-sibling::text())">
        <xsl:value-of select="replace(.,'\s+$','')"/>
      </xsl:when>
      <!-- Any other text node: pass through unchanged
           (without this branch such nodes would be dropped) -->
      <xsl:otherwise>
        <xsl:value-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Emit one space when the immediately preceding sibling node is an element -->
  <xsl:template name="compensate-space">
    <xsl:if test="preceding-sibling::node() and preceding-sibling::* and count(preceding-sibling::node()[1] | preceding-sibling::*[1])=1">
      <xsl:text> </xsl:text>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>

we get

<?xml version="1.0" encoding="UTF-8"?><doc><p>The rain in <bold>Spain</bold> <italic>is</italic> wet.</p><p><bold>The rain in Spain is wet.</bold></p><p><anchor/>The rain in <bold><underline>Spain</underline></bold> <italic>is</italic> wet.</p></doc>

This does not address the conversion of the anchor element to NET format, nor the indentation of p elements which would be conventional.

Peter
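
The message remarks that ltrim(), rtrim() and an "interior" normalize-space() can all be written very simply. One possible sketch for XSLT 2.0 or later follows. The ws prefix, its namespace URI urn:example:whitespace, and the function names ws:ltrim, ws:rtrim and ws:normalize-interior are all invented for this illustration; none of them comes from the original post or from any standard library.

```xml
<?xml version="1.0"?>
<!-- Sketch only: the ws prefix, its namespace URI and all three function
     names are hypothetical, chosen for this example. Needs XSLT 2.0 or later. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:ws="urn:example:whitespace"
                exclude-result-prefixes="xs ws"
                version="2.0">

  <!-- Strip white-space from the start of a string only -->
  <xsl:function name="ws:ltrim" as="xs:string">
    <xsl:param name="s" as="xs:string"/>
    <xsl:sequence select="replace($s, '^\s+', '')"/>
  </xsl:function>

  <!-- Strip white-space from the end of a string only -->
  <xsl:function name="ws:rtrim" as="xs:string">
    <xsl:param name="s" as="xs:string"/>
    <xsl:sequence select="replace($s, '\s+$', '')"/>
  </xsl:function>

  <!-- Collapse interior white-space runs to single spaces, keeping any
       leading and trailing white-space exactly as it was -->
  <xsl:function name="ws:normalize-interior" as="xs:string">
    <xsl:param name="s" as="xs:string"/>
    <!-- Leading run: whatever ws:ltrim would have removed -->
    <xsl:variable name="lead"
        select="substring($s, 1, string-length($s) - string-length(ws:ltrim($s)))"/>
    <!-- Trailing run: whatever ws:rtrim would have removed -->
    <xsl:variable name="trail"
        select="substring($s, string-length(ws:rtrim($s)) + 1)"/>
    <!-- A white-space-only string is returned as is, so that its content
         is not counted once as leading and again as trailing -->
    <xsl:sequence select="if (normalize-space($s) = '') then $s
                          else concat($lead, normalize-space($s), $trail)"/>
  </xsl:function>

</xsl:stylesheet>
```

Under these assumptions, ws:normalize-interior('  a   b  ') should return '  a b  ', and a text() template could call ws:ltrim(.) or ws:rtrim(.) in place of the inline replace() calls. The patterns use \s+ rather than \s* because replace() raises an error when the pattern can match a zero-length string.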