[xsl] Processing mixed content. [Was: Parsing complex line (mixed text and markup)]

Subject: [xsl] Processing mixed content. [Was: Parsing complex line (mixed text and markup)]
From: "Manfred Staudinger" <manfred.staudinger@xxxxxxxxx>
Date: Fri, 15 Feb 2008 21:43:30 +0100
Hi All,

I would like to propose a third variant and to get your comments about it.

On 15/02/2008, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> On 14/02/2008, Ilya Lifshits <chehlo@xxxxxxxxx> wrote:
> > I'm using XSLT 2.0 processors, both Saxon and Altova.
> >
> > I'm trying to parse complex line like:
> > <tbentry>Some text, Some more text <xref linkend="somelink"/>
> > even more text , , ,</tbentry>
> >
> > and get following output :
> >
> > <row>
> > <entry>Some text</entry>
> > <entry>Some more text <xref
> > linkend="ut_man_related_docs"/> and even more text </entry> </row>
> >
> > Number of entries is not constant.
> >
> > I easily found a solution for the case without mixed
> > text and markup by using the tokenize function,
> > but failed to separate text and markup using this approach.
> > Example can be found here : http://pastebin.com/m40fd204f
> >
> > To formalize the goal: I want to simplify the life of our tech
> > writers by creating wrappers on top of DocBook that will
> > help transform my own syntax into standard DocBook code.
> > So if there is another, more appropriate way (which is not a WYSIWYG
> > editor) to achieve this, I can completely change the source line:
> > <tblrow>Some text, Some more text <xref linkend="somelink"/>
> > even more text </tblrow> as long as it's still easy to write
>
> This problem has come up in the past and it's not particularly easy. There
> seem to be two main approaches:
>
> (a) convert the string delimiters into element markup, and then use grouping
> facilities (xsl:for-each-group) to analyze the overall structure
>
> (b) convert the markup into string delimiters, and then use
> xsl:analyze-string.
>
> Both work, but I think (a) is probably a bit easier.
>
> Do all the delimiters (commas) occur in top-level text nodes, or can they
> occur nested within elements? I'll assume the former.
>
> Start by making a copy of the data in which the commas are replaced by
> <comma/> elements:
>
> <xsl:template match="tbentry">
> <xsl:variable name="temp">
> <xsl:apply-templates mode="replace-commas"/>
> </xsl:variable>
> <xsl:for-each-group select="$temp/child::node()"
> group-starting-with="comma">
> <entry><xsl:copy-of select="current-group()[not(self::comma)]"/></entry>
> </xsl:for-each-group>
> </xsl:template>
>
> <xsl:template match="*" mode="replace-commas">
> <xsl:copy-of select="."/>
> </xsl:template>
>
> <xsl:template match="text()" mode="replace-commas">
> <xsl:analyze-string select="." regex=",">
> <xsl:matching-substring><comma/></xsl:matching-substring>
> <xsl:non-matching-substring><xsl:value-of
> select="."/></xsl:non-matching-substring>
> </xsl:analyze-string>
> </xsl:template>
>

(c) convert the elements into placeholder strings which contain the
position() of each element among its siblings. After processing the
string, reinsert the elements in place of their placeholders.

Let's assume the text of the document does not contain the string 'xy'. Then:
<xsl:template match="tbentry">
<!-- keep the tbentry element: inside the loops the context item is a string -->
<xsl:variable name="entry" select="."/>
<xsl:variable name="temp">
   <xsl:apply-templates mode="text"/>
</xsl:variable>
<xsl:for-each select="tokenize($temp, ',')">
   <entry>
      <xsl:for-each select="tokenize(., '@xy')">
         <xsl:variable name="token" select="."/>
         <xsl:choose>
            <xsl:when test="starts-with($token, 'xy')">
<!-- A -->   <xsl:apply-templates
select="$entry/node()[position() = xs:integer(substring($token, 3))]"/>
            </xsl:when>
            <xsl:otherwise>
               <xsl:value-of select="$token"/>
            </xsl:otherwise>
         </xsl:choose>
      </xsl:for-each>
   </entry>
</xsl:for-each>
</xsl:template>

<xsl:template match="*" mode="text">
	<xsl:value-of select="concat('@xyxy', position(), '@xy')"/>
</xsl:template>
<xsl:template match="text()" mode="text">
	<xsl:value-of select="."/>
</xsl:template>

Not tested and I'm uncertain about (A), but a very similar solution
works fine in XSLT 1.0, where the processing of the string is done by
recursive templates.
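
For readers who have not seen the XSLT 1.0 idiom, here is a minimal sketch of the recursive string processing mentioned above. It is not the stylesheet referred to in this message. It only shows the splitting step on plain text, without the placeholders, and the template name split-entries and the parameter str are hypothetical names chosen for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical helper: emits one <entry> per comma-separated piece of $str -->
  <xsl:template name="split-entries">
    <xsl:param name="str"/>
    <xsl:choose>
      <xsl:when test="contains($str, ',')">
        <!-- output the piece before the first comma ... -->
        <entry><xsl:value-of select="substring-before($str, ',')"/></entry>
        <!-- ... then recurse on whatever follows that comma -->
        <xsl:call-template name="split-entries">
          <xsl:with-param name="str" select="substring-after($str, ',')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <!-- no comma left: the remainder is the last entry -->
        <entry><xsl:value-of select="$str"/></entry>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Assumption: tbentry holds only text here; markup would be flattened by string(.) -->
  <xsl:template match="tbentry">
    <row>
      <xsl:call-template name="split-entries">
        <xsl:with-param name="str" select="string(.)"/>
      </xsl:call-template>
    </row>
  </xsl:template>

</xsl:stylesheet>
```

Note that whitespace around each piece is kept as-is, and a trailing comma produces a final empty entry. To combine this with variant (c), the xsl:otherwise and first xsl:when branches would have to scan each piece for the placeholders instead of copying it with xsl:value-of.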

Thanks in advance,

Manfred
http://documenta.rudolphina.org/Indices/Index.html
