Re: [xsl] Processing mixed content. [Was: Parsing complex line (mixed text and markup)]

Subject: Re: [xsl] Processing mixed content. [Was: Parsing complex line (mixed text and markup)]
From: "Manfred Staudinger" <manfred.staudinger@xxxxxxxxx>
Date: Sun, 17 Feb 2008 14:38:09 +0100
On 16/02/2008, Ilya Lifshits <chehlo@xxxxxxxxx> wrote:
> I wonder whether Michael's first suggestion has disadvantages in your opinion
> that you are trying to improve on, or whether this is just another possible
> solution?
I think this solution is more general, but I had hoped Michael would
comment on that. It is certainly easy to implement in XSLT 1.0.
Anyway, here is a _corrected_ version of my earlier proposal (quoted
below), tested with Saxon 9.0:

<xsl:template match="tbentry">
	<xsl:copy>
		<xsl:apply-templates select="@*"/>
		<xsl:variable name="curr" select="."/>
		<xsl:variable name="temp">
			<xsl:apply-templates select="node()" mode="text"/>
		</xsl:variable>
		<xsl:for-each select="tokenize($temp, ',')">
			<entry>
				<xsl:for-each select="tokenize(., '@xy')">
					<xsl:choose>
						<xsl:when test="starts-with(., 'xy')">
							<xsl:apply-templates
select="$curr/node()[xs:integer(substring(current(), 3))]"/>
						</xsl:when>
						<xsl:otherwise>
							<xsl:value-of select="."/>
						</xsl:otherwise>
					</xsl:choose>
				</xsl:for-each>
			</entry>
		</xsl:for-each>
	</xsl:copy>
</xsl:template>
<xsl:template match="*" mode="text">
	<xsl:value-of select="concat('@xyxy', position(), '@xy')"/>
</xsl:template>
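
To run the templates above they need a surrounding stylesheet. The following is only a minimal sketch of one, under some assumptions that are not part of the post: the `xs` prefix is bound to the XML Schema namespace (needed for `xs:integer`), an identity template is added so that re-inserted elements such as `xref` are copied, and the inner `tokenize` works on the current comma-separated piece (`.`). As before, it assumes the input text never contains 'xy'.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Assumption: identity template, so that attributes and re-inserted
       elements (e.g. xref) are copied to the output unchanged. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="tbentry">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- Remember the tbentry element; inside the loops below the
           context item is a string, not a node. -->
      <xsl:variable name="curr" select="."/>
      <!-- Flatten the children into one string. Each element child becomes
           the placeholder @xyxyN@xy, where N is its position among node(). -->
      <xsl:variable name="temp">
        <xsl:apply-templates select="node()" mode="text"/>
      </xsl:variable>
      <!-- One entry per comma-separated piece. -->
      <xsl:for-each select="tokenize($temp, ',')">
        <entry>
          <!-- Split the piece at the placeholder delimiters. -->
          <xsl:for-each select="tokenize(., '@xy')">
            <xsl:choose>
              <!-- A token of the form xyN stands for the N-th child node. -->
              <xsl:when test="starts-with(., 'xy')">
                <xsl:apply-templates
                    select="$curr/node()[xs:integer(substring(current(), 3))]"/>
              </xsl:when>
              <xsl:otherwise>
                <xsl:value-of select="."/>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:for-each>
        </entry>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

  <!-- In mode "text", replace every element by its placeholder. Text nodes
       are copied as strings by the built-in template for this mode. -->
  <xsl:template match="*" mode="text">
    <xsl:value-of select="concat('@xyxy', position(), '@xy')"/>
  </xsl:template>

</xsl:stylesheet>
```

With an input such as `<tbentry>Some text, Some more text <xref linkend="somelink"/> even more text</tbentry>`, the intermediate string should be `Some text, Some more text @xyxy2@xy even more text`, giving two `entry` elements with the `xref` restored inside the second. Whitespace around the commas is not trimmed in this sketch.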

Manfred

On 16/02/2008, Ilya Lifshits <chehlo@xxxxxxxxx> wrote:
> While I'm in no position to judge whether this solution is valid,
> since I'm a complete newbie, I wonder whether Michael's first suggestion
> has disadvantages in your opinion that you are trying to improve on, or
> whether this is just another possible solution?
> From my newbie point of view, Michael's suggestion is more
> straightforward and clearer.
>
> Ilya.
>
>
> On Feb 15, 2008 10:43 PM, Manfred Staudinger
> <manfred.staudinger@xxxxxxxxx> wrote:
> > Hi All,
> >
> > I would like to propose a third variant and to get your comments about it.
> >
> > On 15/02/2008, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> > > On 14/02/2008, Ilya Lifshits <chehlo@xxxxxxxxx> wrote:
> > > > I'm using an XSLT 2.0 processor, both Saxon and Altova.
> > > >
> > > > I'm trying to parse a complex line like:
> > > > <tbentry>Some text, Some more text <xref linkend="somelink"/>
> > > > even more text , , ,</tbentry>
> > > >
> > > > and get the following output:
> > > >
> > > > <row>
> > > > <entry>Some text</entry>
> > > > <entry>Some more text <xref
> > > > linkend="ut_man_related_docs"/> and even more text </entry> </row>
> > > >
> > > > Number of entries is not constant.
> > > >
> > > > I easily found a solution for the case without mixed text and
> > > > markup, using the tokenize function.
> > > > But I failed to separate text and markup using this approach.
> > > > Example can be found here : http://pastebin.com/m40fd204f
> > > >
> > > > To formalize the goal: I want to simplify the life of our tech
> > > > writers by creating wrappers on top of DocBook that will
> > > > help transform my own syntax into standard DocBook code.
> > > > So if there is another, more appropriate way (other than a WYSIWYG
> > > > editor) to achieve this, I can completely change the source line:
> > > > <tblrow>Some text, Some more text <xref linkend="somelink"/>
> > > > even more text </tblrow> as long as it's still easy to write
> > >
> > > This problem has come up in the past and it's not particularly easy. There
> > > seem to be two main approaches:
> > >
> > > (a) convert the string delimiters into element markup, and then use grouping
> > > facilities (xsl:for-each-group) to analyze the overall structure
> > >
> > > (b) convert the markup into string delimiters, and then use
> > > xsl:analyze-string.
> > >
> > > Both work, but I think (a) is probably a bit easier.
> > >
> > > Do all the delimiters (commas) occur in top-level text nodes, or can they
> > > occur nested within elements? I'll assume the former.
> > >
> > > Start by making a copy of the data in which the commas are replaced by
> > > <comma/> elements:
> > >
> > > <xsl:template match="tbentry">
> > > <xsl:variable name="temp">
> > > <xsl:apply-templates mode="replace-commas"/>
> > > </xsl:variable>
> > > <xsl:for-each-group select="$temp/child::node()"
> > > group-starting-with="comma">
> > > <entry><xsl:copy-of select="current-group()[not(self::comma)]"/></entry>
> > > </xsl:for-each-group>
> > > </xsl:template>
> > >
> > > <xsl:template match="*" mode="replace-commas">
> > > <xsl:copy-of select="."/>
> > > </xsl:template>
> > >
> > > <xsl:template match="text()" mode="replace-commas">
> > > <xsl:analyze-string select="." regex=",">
> > > <xsl:matching-substring><comma/></xsl:matching-substring>
> > > <xsl:non-matching-substring><xsl:value-of
> > > select="."/></xsl:non-matching-substring>
> > > </xsl:analyze-string>
> > > </xsl:template>
> > >
> >
> > (c) convert the elements into strings which contain the position()
> > of the element. After processing the string, reinsert those elements.
> >
> > Let's assume the document does not contain 'xy'. Then
> > <xsl:template match="tbentry">
> > <xsl:variable name="temp">
> >    <xsl:apply-templates mode="text"/>
> > </xsl:variable>
> > <xsl:for-each select="tokenize($temp, ',')">
> >    <entry>
> >       <xsl:for-each select="tokenize(., '@xy')">
> >          <xsl:choose>
> >             <xsl:when test="starts-with(., 'xy')">
> > <!-- A -->   <xsl:apply-templates
> > select="/node()[xs:integer(substring(., 3))]"/>
> >             </xsl:when>
> >             <xsl:otherwise>
> >                <xsl:value-of select="."/>
> >             </xsl:otherwise>
> >          </xsl:choose>
> >       </xsl:for-each>
> >    </entry>
> > </xsl:for-each>
> > </xsl:template>
> >
> > <xsl:template match="*" mode="text">
> >         <xsl:value-of select="concat('@xyxy', position(), '@xy')"/>
> > </xsl:template>
> > <xsl:template match="text()" mode="text">
> >         <xsl:value-of select="."/>
> > </xsl:template>
> >
> > Not tested and I'm uncertain about (A), but a very similar solution
> > works fine in XSLT 1.0, where the processing of the string is done by
> > recursive templates.
> >
> > Thanks in advance,
> >
> > Manfred
> > http://documenta.rudolphina.org/Indices/Index.html
