Subject: [xsl] Processing mixed content. [Was: Parsing complex line (mixed text and markup)]
From: "Manfred Staudinger" <manfred.staudinger@xxxxxxxxx>
Date: Fri, 15 Feb 2008 21:43:30 +0100

Hi All,

I would like to propose a third variant and get your comments on it.

On 15/02/2008, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> On 14/02/2008, Ilya Lifshits <chehlo@xxxxxxxxx> wrote:
> > I'm using an XSLT 2.0 processor, both Saxon and Altova.
> >
> > I'm trying to parse a complex line like:
> >
> >   <tbentry>Some text, Some more text <xref linkend="somelink"/>
> >   even more text , , ,</tbentry>
> >
> > and get the following output:
> >
> >   <row>
> >     <entry>Some text</entry>
> >     <entry>Some more text <xref
> >     linkend="ut_man_related_docs"/> and even more text </entry>
> >   </row>
> >
> > The number of entries is not constant.
> >
> > I easily found a solution for text without markup by using the
> > tokenize function, but failed to separate text and markup using
> > this approach.
> > An example can be found here: http://pastebin.com/m40fd204f
> >
> > To formalize the goal: I want to simplify the life of our tech
> > writers by creating wrappers on top of DocBook that will
> > help transform from my defined syntax to standard DocBook code.
> > So if there is another, more appropriate way (which is not a WYSIWYG
> > editor) to achieve this, I can completely change the source line:
> >
> >   <tblrow>Some text, Some more text <xref linkend="somelink"/>
> >   even more text </tblrow>
> >
> > as long as it's still easy to write.
>
> This problem has come up in the past and it's not particularly easy. There
> seem to be two main approaches:
>
> (a) convert the string delimiters into element markup, and then use grouping
> facilities (xsl:for-each-group) to analyze the overall structure
>
> (b) convert the markup into string delimiters, and then use
> xsl:analyze-string.
>
> Both work, but I think (a) is probably a bit easier.
>
> Do all the delimiters (commas) occur in top-level text nodes, or can they
> occur nested within elements? I'll assume the former.
>
> Start by making a copy of the data in which the commas are replaced by
> <comma/> elements:
>
> <xsl:template match="tbentry">
>   <xsl:variable name="temp">
>     <xsl:apply-templates mode="replace-commas"/>
>   </xsl:variable>
>   <xsl:for-each-group select="$temp/child::node()"
>       group-starting-with="comma">
>     <entry><xsl:copy-of select="current-group()[not(self::comma)]"/></entry>
>   </xsl:for-each-group>
> </xsl:template>
>
> <xsl:template match="*" mode="replace-commas">
>   <xsl:copy-of select="."/>
> </xsl:template>
>
> <xsl:template match="text()" mode="replace-commas">
>   <xsl:analyze-string select="." regex=",">
>     <xsl:matching-substring><comma/></xsl:matching-substring>
>     <xsl:non-matching-substring><xsl:value-of
>         select="."/></xsl:non-matching-substring>
>   </xsl:analyze-string>
> </xsl:template>
>

(c) convert the elements into strings which contain the position() of the
element. After processing the string, reinsert those elements.

Let's assume the document does not contain 'xy'. Then

<xsl:template match="tbentry">
  <xsl:variable name="temp">
    <xsl:apply-templates mode="text"/>
  </xsl:variable>
  <xsl:for-each select="tokenize($temp, ',')">
    <entry>
      <xsl:for-each select="tokenize(., '@xy')">
        <xsl:choose>
          <xsl:when test="starts-with(., 'xy')">
            <!-- A -->
            <xsl:apply-templates
                select="/node()[xs:integer(substring(., 3))]"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="."/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each>
    </entry>
  </xsl:for-each>
</xsl:template>

<xsl:template match="*" mode="text">
  <xsl:value-of select="concat('@xyxy', position(), '@xy')"/>
</xsl:template>

<xsl:template match="text()" mode="text">
  <xsl:value-of select="."/>
</xsl:template>

Not tested, and I'm uncertain about (A), but a very similar solution works
fine in XSLT 1.0, where the processing of the string is done by recursive
templates.

Thanks in advance,

Manfred
http://documenta.rudolphina.org/Indices/Index.html
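
A note on the line marked (A) in variant (c): inside an xsl:for-each over the strings returned by tokenize(), the context item is a string, not a node. A path starting with "/" therefore has no node to start from, and "." inside the predicate refers to the node being filtered rather than to the current token. The following is a minimal, untested sketch of variant (c) as a complete stylesheet that works around this by binding both values to variables. The variable names $entry and $n, the <row> wrapper, the use of xsl:copy-of instead of xsl:apply-templates, the stricter matches() test, and the filter that drops whitespace-only tokens are assumptions added for illustration and are not part of the posted code.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:template match="tbentry">
    <!-- Keep a handle on the tbentry element: inside the loops below
         the context item is a string, so "." and "/" cannot reach it. -->
    <xsl:variable name="entry" select="."/>

    <!-- Flatten the children into one string; each child element is
         replaced by a placeholder of the form @xyxyN@xy. -->
    <xsl:variable name="temp">
      <xsl:apply-templates mode="text"/>
    </xsl:variable>

    <row>
      <!-- Assumption: tokens that are empty or whitespace-only
           (for example from trailing commas) are dropped. -->
      <xsl:for-each select="tokenize(string($temp), ',')[normalize-space(.)]">
        <entry>
          <xsl:for-each select="tokenize(., '@xy')">
            <xsl:choose>
              <xsl:when test="matches(., '^xy[0-9]+$')">
                <!-- Recover N and copy the N-th child node of tbentry. -->
                <xsl:variable name="n" select="xs:integer(substring(., 3))"/>
                <xsl:copy-of select="$entry/node()[$n]"/>
              </xsl:when>
              <xsl:otherwise>
                <xsl:value-of select="."/>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:for-each>
        </entry>
      </xsl:for-each>
    </row>
  </xsl:template>

  <!-- position() counts among all child nodes selected by the default
       apply-templates, which matches the $entry/node()[$n] lookup. -->
  <xsl:template match="*" mode="text">
    <xsl:value-of select="concat('@xyxy', position(), '@xy')"/>
  </xsl:template>

  <xsl:template match="text()" mode="text">
    <xsl:value-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the sample tbentry from the original question (with xref written as an empty element), this should give a row with two entry elements, the second containing the copied xref between its two text fragments. As in the posted code, it relies on the assumption that 'xy' and '@xy' never occur in the document text, and that commas appear only in top-level text nodes.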