Subject: Re: [xsl] Correcting misplaced spaces in XML documents
From: "Chris Papademetrious christopher.papademetrious@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 26 Mar 2023 11:32:26 -0000

Hi Trevor,

We have similar occurrences in our DITA source. And like you, I got lost in conditionals trying to implement a comprehensive solution. Sometimes the spaces are multiple element-levels deep:

    <p> This <A>is <B>some </B></A>text.</p>

I don't have a solution for this yet, but I would like to solve it some day. One idea is to implement a recursive or iterative approach that bubbles the spaces upward, one level at a time. This template could be called on "scope elements" (credit goes to Gerrit for this term), beyond which spaces should not bubble, so that they would naturally stop when they reach the surface. And fortunately, XSLT "heals" multiple text nodes together (I think?), so subsequent recursions or iterations don't need to concatenate bubbled-up text nodes with their neighbors to understand what's going on.

Hi Gerrit,

Thanks for sharing your emphasis-normalize-space <https://github.com/gimsieke/emphasis-normalize-space> solution! Your code was clean with lots of comments, and I like how you parameterized everything.

- Chris

From: Trevor Nicholls trevor@xxxxxxxxxxxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Saturday, March 25, 2023 9:34 PM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: [xsl] Correcting misplaced spaces in XML documents

I suppose this falls into the category of data cleanup.

In the very simple case I am importing documents which have content like this:

    <para>Press the<keyname> Escape </keyname>key.</para>

You'll notice that the adjacent spaces are wrapped in the keyname element when they should just be adjacent to it, not in it. This is a pathological case, usually the keyname is correct, but occasionally there is a leading or a trailing space, hardly ever both.

I've written a simple stylesheet which corrects this situation, identifying leading and trailing whitespace, and outputting the appropriate breakdown:

    <xsl:template match="keyname">
      <xsl:variable name="leading">...</xsl:variable>
      <xsl:variable name="trailing">...</xsl:variable>
      <xsl:variable name="content">...</xsl:variable>
      <xsl:if test="$leading != ''"><xsl:value-of select="$leading"/></xsl:if>
      <xsl:element name="keyname">
        <xsl:apply-templates select="@*"/>
        <xsl:value-of select="$content"/>
      </xsl:element>
      <xsl:if test="$trailing != ''"><xsl:value-of select="$trailing"/></xsl:if>
    </xsl:template>

This is all fine, and it's adequate for the job when the "greedy" elements only contain text, which is the case for keynames.

However, now I want to extend the stylesheet to correct some other cases where the content model of the element is not just simple text. For example:

    <para>Select the<filename> <var>username</var>.profile </filename>file.</para>

Although the cases I am looking at right now only have a content model of text or <var> elements, a more general solution would be welcome because other cases are going to turn up where elements are nested two or three levels deep.

I've got myself neck deep in conditionals trying to extend my simple template to cope with this, and I'm sure there's a straightforward way of doing it that doesn't need several hundred lines of code. Can anyone point me to a cleaner way of doing it?

cheers
T
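
For readers who want to experiment with the "bubble the spaces upward" idea from Chris's reply, here is a minimal sketch of it. It is written in Python with the standard-library ElementTree rather than XSLT, purely so the logic can be run and checked quickly; it is not the approach of any stylesheet mentioned in this thread. The names `SCOPE_TAGS`, `bubble_spaces` and `fix` are made up for illustration, and the choice of `p` and `para` as scope elements is an assumption. Instead of repeated one-level passes, the sketch uses a single post-order recursion: children are fixed first, so by the time an element is examined, any whitespace from deeper levels has already arrived at its edges.

```python
import xml.etree.ElementTree as ET

# Assumption: these are the "scope elements" that whitespace must not leave.
SCOPE_TAGS = {"p", "para"}
XML_WS = " \t\r\n"  # the four XML whitespace characters


def bubble_spaces(parent):
    """Move edge whitespace of every non-scope descendant up one level at a
    time, stopping at scope elements (and at the element passed in)."""
    prev = None  # previous sibling element, if any
    for child in list(parent):
        bubble_spaces(child)  # post-order: deeper spaces reach child's edges first
        if child.tag not in SCOPE_TAGS:
            # Leading whitespace: cut it from the start of the child's text
            # and append it to whatever text precedes the child.
            text = child.text or ""
            rest = text.lstrip(XML_WS)
            lead = text[:len(text) - len(rest)]
            child.text = rest
            if lead:
                if prev is None:
                    parent.text = (parent.text or "") + lead
                else:
                    prev.tail = (prev.tail or "") + lead

            # Trailing whitespace: it sits in the tail of the last grandchild,
            # or in the child's own text when the child has no element children.
            if len(child):
                last = child[-1]
                text = last.tail or ""
                kept = text.rstrip(XML_WS)
                last.tail = kept
            else:
                text = child.text
                kept = text.rstrip(XML_WS)
                child.text = kept
            trail = text[len(kept):]
            if trail:
                child.tail = trail + (child.tail or "")
        prev = child


def fix(xml_text):
    """Parse a fragment, bubble its misplaced spaces, and serialize it again."""
    root = ET.fromstring(xml_text)
    bubble_spaces(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(fix("<p> This <A>is <B>some </B></A>text.</p>"))
    # <p> This <A>is <B>some</B></A> text.</p>
    print(fix("<para>Select the<filename> <var>username</var>.profile </filename>file.</para>"))
    # <para>Select the <filename><var>username</var>.profile</filename> file.</para>
```

Because the element passed to `bubble_spaces` is never itself trimmed, spaces stop at the surface of whatever element the function is called on, which mirrors the role of the scope elements described above.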