Re: [xsl] problem with transforming mixed content

Subject: Re: [xsl] problem with transforming mixed content
From: "Mukul Gandhi gandhi.mukul@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sat, 15 Aug 2020 10:29:14 -0000
On Sat, Aug 15, 2020 at 7:46 AM Wolfhart Totschnig
wolfhart.totschnig@xxxxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
wrote:

> Dear list,
>
> I would like to ask for your help with the following mixed-content
> problem. I am receiving, from an external source, data in the following
> form:
>
> <title>THE TITLE OF THE BOOK WITH SOME <i>ITALICS</i> AND SOME MORE
> WORDS: THE SUBTITLE OF THE BOOK WITH SOME <i>ITALICS</i></title>
>
> What I would like to do is
> 1) separate the title from the subtitle (i.e., divide the data at the
> colon) and put each in a separate element node;
> 2) all the while maintaining the <i> markup;
> 3) and perform certain string manipulations on all of the text nodes;
> for the purposes of this post, I will use the example of converting
> upper-case to lower-case.
>
> So the desired output is the following:
>
> <title>the title of the book with some <i>italics</i> and some more
> words</title>
> <subtitle>the subtitle of the book with some <i>italics</i></subtitle>
>
> How can this be done?
>
> I know that I can perform string manipulations while maintaining the <i>
> markup with templates, i.e., <xsl:template match="text()"/> and
> <xsl:template match="i"/>. But in this case I do not know how to divide
> the data at the colon. And I know that I can divide the data at the
> colon with <xsl:value-of select="substring-before(.,': ')"/>, but then I
> lose the <i> markup. So I am at a loss.
>

I've come up with the following XSLT transform, which seems to work for this
use case:

<xsl:stylesheet version="3.0"
                         xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                         xmlns:xs="http://www.w3.org/2001/XMLSchema"
                         exclude-result-prefixes="xs">

   <xsl:output method="xml" indent="yes"/>

   <xsl:template match="title">
      <result>
         <xsl:variable name="result_pass1" as="xs:string*">
            <xsl:apply-templates select="node()" mode="pass1"/>
         </xsl:variable>
         <title>
            <xsl:for-each
               select="tokenize(normalize-space(substring-before(
                          string-join($result_pass1, ''), ':')), '##')">
               <xsl:call-template name="process_tokenize_result_item">
                  <xsl:with-param name="inpStr" select="."/>
               </xsl:call-template>
            </xsl:for-each>
         </title>
         <subtitle>
            <xsl:for-each
               select="tokenize(normalize-space(substring-after(
                          string-join($result_pass1, ''), ':')), '##')">
               <xsl:call-template name="process_tokenize_result_item">
                  <xsl:with-param name="inpStr" select="."/>
               </xsl:call-template>
            </xsl:for-each>
         </subtitle>
      </result>
   </xsl:template>

   <xsl:template name="process_tokenize_result_item">
      <xsl:param name="inpStr" as="xs:string"/>

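      <!-- Tokens alternate between plain text and italic text, so every
           even-numbered token came from an <i> element. position() still
           refers to the calling xsl:for-each, because xsl:call-template
           does not change the focus. -->
      <xsl:choose>
         <xsl:when test="position() mod 2 = 0">
            <i>
               <xsl:value-of select="$inpStr"/>
            </i>
         </xsl:when>
         <xsl:otherwise>
            <xsl:value-of select="$inpStr"/>
         </xsl:otherwise>
      </xsl:choose>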
   </xsl:template>

   <xsl:template match="node()" mode="pass1">
       <xsl:choose>
          <xsl:when test="self::i">
             <xsl:value-of select="concat('##', lower-case(.), '##')"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="lower-case(.)"/>
          </xsl:otherwise>
       </xsl:choose>
   </xsl:template>

</xsl:stylesheet>

The above XSLT transform, when given the following XML input document,

<title>THE TITLE OF THE BOOK WITH SOME <i>ITALICS</i> AND SOME MORE
WORDS: THE SUBTITLE OF THE BOOK WITH SOME <i>ITALICS</i></title>

produces the following result:

<result>
   <title>the title of the book with some <i>italics</i> and some more
words</title>
   <subtitle>the subtitle of the book with some <i>italics</i>
   </subtitle>
</result>

This solution follows a two-pass approach. In the first pass, each
<i>text</i> element is flattened to ##text## (assuming that the
delimiter ## does not occur in the input text). The second pass splits the
resulting string at the first colon, tokenizes each half on ##, and wraps
every even-numbered token in <i> again.
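
To make the intermediate step easier to see, here is a minimal debugging
sketch that emits only the flattened pass-one string. It reuses the pass1
template from the stylesheet above unchanged; the <pass1> wrapper element is
a hypothetical name chosen just for this illustration and is not part of the
solution itself.

```xml
<xsl:stylesheet version="3.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <xsl:output method="xml" omit-xml-declaration="yes"/>

   <!-- Debugging aid (assumption: wrapper name "pass1" is illustrative):
        show the string that the second pass will split and tokenize. -->
   <xsl:template match="title">
      <pass1>
         <xsl:apply-templates select="node()" mode="pass1"/>
      </pass1>
   </xsl:template>

   <!-- Same logic as the pass1 template in the stylesheet above:
        <i> elements become ##...##, everything is lower-cased. -->
   <xsl:template match="node()" mode="pass1">
      <xsl:choose>
         <xsl:when test="self::i">
            <xsl:value-of select="concat('##', lower-case(.), '##')"/>
         </xsl:when>
         <xsl:otherwise>
            <xsl:value-of select="lower-case(.)"/>
         </xsl:otherwise>
      </xsl:choose>
   </xsl:template>

</xsl:stylesheet>
```

For the sample input, this should give

<pass1>the title of the book with some ##italics## and some more
words: the subtitle of the book with some ##italics##</pass1>

which shows why the approach depends on ## never occurring in the source
text, and on the colon not falling inside an <i> element.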



-- 
Regards,
Mukul Gandhi
