Re: [xsl] problem with transforming mixed content

Subject: Re: [xsl] problem with transforming mixed content
From: "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sat, 15 Aug 2020 11:01:19 -0000
Like Graydon's solution, this solution falls into category (b): convert the
markup to text, then process as text. And like Graydon's solution, it makes
assumptions about the markup and text content that can be encountered in the
mixed content: in this case, the only markup it handles is what appears in the
supplied test case, that is, an <i> element with no attributes, and it assumes
that the '##' sequence won't appear naturally. The problem with this kind of
solution is that when you process 10,000 input documents it will do the right
thing for 9,999 of them, and you need very good testing to catch the failures.
In fact, you'll only catch the failure if you put a lot more effort into the
testing than you put into the actual code.
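
To make those assumptions concrete, here are a few inputs that I would expect the quoted stylesheet below to mishandle. They are invented for illustration and do not come from the thread. The comments describe the behaviour as traced by hand through the stylesheet, not the output of an actual run. The `tests` wrapper is only there to keep the block well-formed; each `title` is meant to be tried as a separate input document.

```xml
<!-- Hypothetical test inputs; none of these appear in the original thread. -->
<tests>
  <!-- '##' occurs naturally in the text, so the odd/even token positions shift:
       "42 of " ends up wrapped in <i> and "nature" does not. -->
  <title>ISSUE ##42 OF <i>NATURE</i>: A SUBTITLE</title>

  <!-- Markup other than <i>: pass1 reduces <b> to its string value,
       so the element is lost from the result. -->
  <title>A <b>BOLD</b> TITLE: WITH A SUBTITLE</title>

  <!-- Attributes on <i>: only the text survives pass1, so class="species" is dropped. -->
  <title>A <i class="species">HOMO SAPIENS</i> TITLE: A SUBTITLE</title>

  <!-- No colon: substring-before() and substring-after() both return the
       zero-length string, so title and subtitle both come out empty. -->
  <title>A TITLE WITH NO SUBTITLE</title>

  <!-- Colon inside <i>: the split lands between the two '##' delimiters,
       so the part after the colon loses its italics. -->
  <title>A REVIEW OF <i>DUNE: A NOVEL</i></title>
</tests>
```

None of these is exotic, which is the point: a single passing test case says very little about inputs like them.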

(I'm working this morning on a bug I've created in the course of Saxon
development that causes just 2 tests out of 30,000 in the QT3 test suite to
fail. Or there might be two bugs, of course. Indeed, more worryingly, there
might be three, and the tests are only catching two of them. As I'm sure
you've found in your work on Xerces, you can have a vast test suite and bugs
can still slip through. The general assumption with question-and-answer forums
seems to be that one test case is enough, and that's blatantly wrong.)

Mukul wrote:

>
> I've come up with the following XSLT transform, which seems to work for this
> use case,
>
> <xsl:stylesheet version="3.0"
>                 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>                 xmlns:xs="http://www.w3.org/2001/XMLSchema"
>                 exclude-result-prefixes="xs">
>
>    <xsl:output method="xml" indent="yes"/>
>
>    <xsl:template match="title">
>       <result>
>          <xsl:variable name="result_pass1" as="xs:string*">
>             <xsl:apply-templates select="node()" mode="pass1"/>
>          </xsl:variable>
>          <title>
>             <xsl:for-each
>                select="tokenize(normalize-space(substring-before(string-join($result_pass1, ''), ':')), '##')">
>                <xsl:call-template name="process_tokenize_result_item">
>                   <xsl:with-param name="inpStr" select="."/>
>                </xsl:call-template>
>             </xsl:for-each>
>          </title>
>          <subtitle>
>             <xsl:for-each
>                select="tokenize(normalize-space(substring-after(string-join($result_pass1, ''), ':')), '##')">
>                <xsl:call-template name="process_tokenize_result_item">
>                   <xsl:with-param name="inpStr" select="."/>
>                </xsl:call-template>
>             </xsl:for-each>
>          </subtitle>
>       </result>
>    </xsl:template>
>
>    <xsl:template name="process_tokenize_result_item">
>       <xsl:param name="inpStr" as="xs:string"/>
>
>       <xsl:choose>
>          <xsl:when test="position() mod 2 = 0">
>             <i>
>                <xsl:value-of select="."/>
>             </i>
>          </xsl:when>
>          <xsl:otherwise>
>             <xsl:value-of select="."/>
>          </xsl:otherwise>
>       </xsl:choose>
>    </xsl:template>
>
>    <xsl:template match="node()" mode="pass1">
>        <xsl:choose>
>           <xsl:when test="self::i">
>              <xsl:value-of select="concat('##', lower-case(.), '##')"/>
>           </xsl:when>
>           <xsl:otherwise>
>             <xsl:value-of select="lower-case(.)"/>
>           </xsl:otherwise>
>        </xsl:choose>
>    </xsl:template>
>
> </xsl:stylesheet>
>
> The above XSLT transform, when given the following XML input document,
>
> <title>THE TITLE OF THE BOOK WITH SOME <i>ITALICS</i> AND SOME MORE
> WORDS: THE SUBTITLE OF THE BOOK WITH SOME <i>ITALICS</i></title>
>
> produces the following result,
>
> <result>
>    <title>the title of the book with some <i>italics</i> and some more words</title>
>    <subtitle>the subtitle of the book with some <i>italics</i>
>    </subtitle>
> </result>
>
> This solution follows a two-pass approach. In the first pass, the element
> constructs <i>text</i> are transformed into ##text## (assuming that the
> delimiter ## doesn't occur in the input text). The second pass transforms
> the result of the first pass into the final result.
>
>
>
> --
> Regards,
> Mukul Gandhi

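For contrast with the text-based approach quoted above, here is a minimal sketch that works on the nodes directly, so the markup is never converted to delimiters. It is not taken from the thread and is untested. It assumes that the title/subtitle boundary is the first colon in a text node that is a direct child of `title`; a colon inside an inline element is ignored, and when no such colon exists everything goes into the title. It does not normalize whitespace, apart from trimming the start of the subtitle.

```xml
<xsl:stylesheet version="3.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <xsl:output method="xml" indent="no"/>

   <xsl:template match="title">
      <!-- First text-node child that contains a colon; empty if there is none.
           Assumption: the boundary is never inside an inline element. -->
      <xsl:variable name="split" select="text()[contains(., ':')][1]"/>
      <result>
         <title>
            <xsl:choose>
               <xsl:when test="$split">
                  <!-- Everything before the split node, markup intact -->
                  <xsl:apply-templates select="$split/preceding-sibling::node()"/>
                  <xsl:value-of select="lower-case(substring-before($split, ':'))"/>
               </xsl:when>
               <xsl:otherwise>
                  <!-- No colon found: the whole content is the title -->
                  <xsl:apply-templates/>
               </xsl:otherwise>
            </xsl:choose>
         </title>
         <xsl:if test="$split">
            <subtitle>
               <!-- Remainder of the split node, with leading whitespace trimmed -->
               <xsl:value-of
                  select="lower-case(replace(substring-after($split, ':'), '^\s+', ''))"/>
               <xsl:apply-templates select="$split/following-sibling::node()"/>
            </subtitle>
         </xsl:if>
      </result>
   </xsl:template>

   <!-- Inline elements of any name are copied together with their attributes -->
   <xsl:template match="title//*">
      <xsl:copy>
         <xsl:copy-of select="@*"/>
         <xsl:apply-templates/>
      </xsl:copy>
   </xsl:template>

   <!-- Text is lower-cased, as in the quoted stylesheet -->
   <xsl:template match="title//text()">
      <xsl:value-of select="lower-case(.)"/>
   </xsl:template>

</xsl:stylesheet>
```

Because elements are copied rather than flattened, inline markup other than `<i>`, attributes, and a naturally occurring `##` should all pass through unchanged. The sketch still makes assumptions of its own (see the comment on `$split`), so it needs the same breadth of testing as any other solution.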