Re: [xsl] Seeking a smarter tokenize for augmented text

Subject: Re: [xsl] Seeking a smarter tokenize for augmented text
From: "Martin Honnen martin.honnen@xxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 7 May 2021 10:11:53 -0000
On 06.05.2021 at 11:09, Trevor Nicholls trevor@xxxxxxxxxxxxxxxxxx wrote:
>
> I would like to extend this so that it can handle an element
> consisting of text lines with some embedded markup (there won't be
> much, but there will be some). For example, taking:
>
> <textlines>this is line <seq>1</seq>
>
> this is <var>line</var> 2
>
> this <emph>is</emph> line 3</textlines>
>
> and producing
>
> <textlines>
>
> <line>this is line <seq>1</seq></line>
>
> <line>this is <var>line</var> 2</line>
>
> <line>this <emph>is</emph> line 3</line>
>
> </textlines>
>

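That part could be done in two steps: first replace the line breaks in
the text nodes with <lb/> marker elements, then group the resulting
nodes with group-ending-with="lb" and wrap each group in a <line>: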


<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:xs="http://www.w3.org/2001/XMLSchema"
     exclude-result-prefixes="#all"
     version="3.0">

   <xsl:mode on-no-match="shallow-copy"/>

   <xsl:mode name="insert-markers" on-no-match="shallow-copy"/>

   <xsl:template match="textlines">
       <xsl:copy>
           <xsl:variable name="transformed-textlines">
               <xsl:apply-templates mode="insert-markers"/>
           </xsl:variable>
           <xsl:for-each-group select="$transformed-textlines/node()"
                               group-ending-with="lb">
               <line>
                   <xsl:apply-templates select="current-group()"/>
               </line>
           </xsl:for-each-group>
       </xsl:copy>
   </xsl:template>

   <xsl:template mode="insert-markers" match="text()">
       <xsl:analyze-string select="." regex="\n+">
           <xsl:matching-substring>
               <lb/>
           </xsl:matching-substring>
           <xsl:non-matching-substring>
               <xsl:value-of select="."/>
           </xsl:non-matching-substring>
       </xsl:analyze-string>
   </xsl:template>

   <xsl:template match="lb"/>

</xsl:stylesheet>


https://xsltfiddle.liberty-development.net/6qaHaQK
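
One thing to watch: if the content of textlines starts with a line break,
the first group consists only of the <lb/> marker, and the stylesheet
above would emit an empty <line/> for it. A minimal sketch of a variant
that skips such groups, assuming that marker-only groups should simply
be dropped (that is my assumption, it is not part of the stated
requirement):

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     exclude-result-prefixes="#all"
     version="3.0">

   <xsl:mode on-no-match="shallow-copy"/>

   <xsl:mode name="insert-markers" on-no-match="shallow-copy"/>

   <xsl:template match="textlines">
       <xsl:copy>
           <!-- Step 1: copy the content, turning line breaks into lb markers -->
           <xsl:variable name="transformed-textlines">
               <xsl:apply-templates mode="insert-markers"/>
           </xsl:variable>
           <!-- Step 2: each group ends at an lb marker (or at the last node) -->
           <xsl:for-each-group select="$transformed-textlines/node()"
                               group-ending-with="lb">
               <!-- Assumption: a group holding nothing but the marker
                    is an empty line and is not output -->
               <xsl:if test="exists(current-group()[not(self::lb)])">
                   <line>
                       <xsl:apply-templates select="current-group()"/>
                   </line>
               </xsl:if>
           </xsl:for-each-group>
       </xsl:copy>
   </xsl:template>

   <!-- One marker per run of line breaks -->
   <xsl:template mode="insert-markers" match="text()">
       <xsl:analyze-string select="." regex="\n+">
           <xsl:matching-substring>
               <lb/>
           </xsl:matching-substring>
           <xsl:non-matching-substring>
               <xsl:value-of select="."/>
           </xsl:non-matching-substring>
       </xsl:analyze-string>
   </xsl:template>

   <!-- Drop the markers when the groups are copied to the result -->
   <xsl:template match="lb"/>

</xsl:stylesheet>
```

For the sample input quoted above, this should give the same three
<line> elements as the first stylesheet; the only difference is the
xsl:if around <line>. Note that in both versions a line break inside a
nested element such as <emph> does not split the line, because only
markers that are direct children of the temporary tree take part in the
grouping.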
