Subject: Re: [xsl] Correcting misplaced spaces in XML documents
From: "Bauman, Syd s.bauman@xxxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 26 Mar 2023 02:07:50 -0000

Would it be reasonable to stop worrying about element names, and just do this for all text? My first crack is as follows. Not thoroughly tested.

    <!-- By default, normalize space (thus trimming off leading & trailing spaces) -->
    <xsl:template match="text()" priority="1">
      <xsl:sequence select="normalize-space(.)"/>
    </xsl:template>

    <!-- But IF the node following the text is an element whose 1st text child
         starts with a space, append a space -->
    <xsl:template match="text()[ following-sibling::node()[1][self::*[ child::text()[1][ substring( ., 1, 1 ) eq ' '] ] ] ]" priority="2.1">
      <xsl:sequence select=".||' '"/>
    </xsl:template>

    <!-- And IF the node preceding the text is an element whose last text child
         ends with a space, add a leading space -->
    <xsl:template match="text()[ preceding-sibling::node()[1][self::*[ child::text()[ last() ][ substring( ., string-length(.), 1 ) eq ' ' ] ] ] ]" priority="2.0">
      <xsl:sequence select="' '||."/>
    </xsl:template>

For example, I have not tested what happens when there is a text node that meets both criteria. (I suspect the append-space clause just wins, because it has the higher priority. But I have not tested.)

BTW, I am not sure, but I think if you replace <xsl:sequence> with <xsl:value-of>, use concat() instead of ||, and use = instead of eq, this is an XSLT 1.0 solution. (If it works.)

________________________________

I suppose this falls into the category of data cleanup. In the very simple case I am importing documents which have content like this:

    <para>Press the<keyname> Escape </keyname>key.</para>

You'll notice that the adjacent spaces are wrapped in the keyname element when they should just be adjacent to it, not in it. This is a pathological case; usually the keyname is correct, but occasionally there is a leading or a trailing space, hardly ever both.

I've written a simple stylesheet which corrects this situation, identifying leading and trailing whitespace, and outputting the appropriate breakdown:

    <xsl:template match="keyname">
      <xsl:variable name="leading"></xsl:variable>
      <xsl:variable name="trailing"></xsl:variable>
      <xsl:variable name="content"></xsl:variable>
      <xsl:if test="$leading != ''"><xsl:value-of select="$leading"/></xsl:if>
      <xsl:element name="keyname">
        <xsl:apply-templates select="@*"/>
        <xsl:value-of select="$content"/>
      </xsl:element>
      <xsl:if test="$trailing != ''"><xsl:value-of select="$trailing"/></xsl:if>
    </xsl:template>

This is all fine, and it's adequate for the job when the "greedy" elements only contain text, which is the case for keynames. However, now I want to extend the stylesheet to correct some other cases where the content model of the element is not just simple text. For example:

    <para>Select the<filename> <var>username</var>.profile </filename>file.</para>

Although the cases I am looking at right now only have a content model of text or <var> elements, a more general solution would be welcome, because other cases are going to turn up where elements are nested two or three levels deep. I've got myself neck deep in conditionals trying to extend my simple template to cope with this, and I'm sure there's a straightforward way of doing it that doesn't need several hundred lines of code. Can anyone point me to a cleaner way of doing it?
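
For reference, the XSLT 1.0 variant floated in the reply at the top of this message might look roughly like the sketch below. It is an assumption-laden illustration, not a tested solution: the identity template is added here so that the stylesheet is complete on its own, `eq` is written as `=` because XPath 1.0 has no `eq` operator, and the priorities are kept exactly as in the reply.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template (added for this sketch): copy elements,
       attributes, comments and PIs through unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Default for text: trim leading and trailing spaces
       (this also collapses internal runs of whitespace) -->
  <xsl:template match="text()" priority="1">
    <xsl:value-of select="normalize-space(.)"/>
  </xsl:template>

  <!-- Text followed by an element whose first text child starts
       with a space: emit the text as-is plus one trailing space -->
  <xsl:template priority="2.1"
      match="text()[following-sibling::node()[1]
                    [self::*[text()[1][substring(., 1, 1) = ' ']]]]">
    <xsl:value-of select="concat(., ' ')"/>
  </xsl:template>

  <!-- Text preceded by an element whose last text child ends
       with a space: emit one leading space plus the text as-is -->
  <xsl:template priority="2.0"
      match="text()[preceding-sibling::node()[1]
                    [self::*[text()[last()][substring(., string-length(.), 1) = ' ']]]]">
    <xsl:value-of select="concat(' ', .)"/>
  </xsl:template>

</xsl:stylesheet>
```

Traced by hand against the keyname example, this should turn `<para>Press the<keyname> Escape </keyname>key.</para>` into `<para>Press the <keyname>Escape</keyname> key.</para>`, and the filename example should come out with both boundary spaces moved outside `<filename>`. Two caveats apply. A text node that sits between two such elements matches both of the last two templates, and only the priority 2.1 template fires, so it gains a trailing space but no leading one. And because the default template trims every text node, a correctly placed space next to an inline element (as in `Press <b>this</b> key`) would be removed as well, so the default rule would need narrowing before this is used on real content.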