Subject: Re: [xsl] How to split text element to separate spans?
From: Israel Viente <israel.viente@xxxxxxxxx>
Date: Tue, 8 Jun 2010 12:42:03 +0300

Liam and Gerrit:

Thank you very much for your input, ideas and explanations.
I have many things to catch up on in XSLT in order to understand this code,
but I'll try.

Thanks again,
Israel

On Tue, Jun 8, 2010 at 2:28 AM, Imsieke, Gerrit, le-tex <gerrit.imsieke@xxxxxxxxx> wrote:
> Dear Israel,
>
> I once wrote a generic splitting routine where you can split at arbitrary
> XPath expressions, at any depth. It uses saxon:evaluate, though, and is too
> complicated to be instructive here. So I tried to simplify it, below.
>
> Let's consider this input:
>
> =========8<-------------------
>
> <?xml version="1.0" encoding="utf-8"?>
> <doc>
> <p dir="ltr"><span class="smaller">text1
>             <br />
>             text2
>             text3.
>             <br />
>             </span> <span class="smalleritalic">no</span> <span
> class="smaller">problems.
>             <br />
>
>
>             <br /></span></p>
>
> <p dir="ltr"><br/><span class="smaller">text1
>             <br />
>             <span class="reallytiny">text2 <br /></span>
>             text3.
>             <br />
>             </span> <span class="smalleritalic">no</span> <span
> class="smaller">problems.
>             <br />
>
>
>             <br /></span></p>
>
> <p dir="ltr">   <span class="regular">"What else?"</span></p>
> </doc>
>
> =========8<-------------------
>
> The first p contains your original input, the second p contains a br within
> *nested* spans (and a br immediately below p), and the third one doesn't
> contain a br.
>
> Applying the stylesheet quoted below, we'll arrive at this output:
>
> =========8<-------------------
>
> <?xml version="1.0" encoding="UTF-8"?><doc>
> <p dir="ltr"><span class="smaller">text1
>             </span><br/><span class="smaller">
>             text2
>             text3.
>             </span><br/><span class="smaller">
>             </span> <span class="smalleritalic">no</span> <span
> class="smaller">problems.
>             </span><br/><span class="smaller">
>
>
>             </span><br/></p>
>
> <p dir="ltr"><br/><span class="smaller">text1
>             </span><br/><span class="smaller">
>             <span class="reallytiny">text2 </span></span><br/><span
> class="smaller">
>             text3.
>             </span><br/><span class="smaller">
>             </span> <span class="smalleritalic">no</span> <span
> class="smaller">problems.
>             </span><br/><span class="smaller">
>
>
>             </span><br/></p>
>
> <p dir="ltr">   <span class="regular">"What else?"</span></p>
> </doc>
>
> =========8<-------------------
>
> You might find it dissatisfying that the XML code doesn't look as
> pretty-printed as your desired output. In order to arrive at an output as
> neat as specified, you will need to apply three more passes of whitespace
> extraction/normalization (left, right, middle) to the top-level spans. If
> you really have to pretty-print the XML in such a way, I will send you the
> complete stylesheet.
>
> So here's the version that does just the splitting:
>
> =========8<-------------------
>
> <?xml version="1.0" encoding="utf-8"?>
> <xsl:transform
>   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>   xmlns:my="my"
>   version="2.0"
>   exclude-result-prefixes="my">
>
>   <xsl:output method="xml" indent="no" />
>
>   <!-- Default identity transform: -->
>   <xsl:template match="@* | *">
>     <xsl:copy>
>       <xsl:apply-templates select="@* | node()"/>
>     </xsl:copy>
>   </xsl:template>
>
>   <xsl:template match="p/span">
>     <xsl:sequence select="my:split-at-br(.)"/>
>   </xsl:template>
>
>
>   <!-- split-at-br is intended for
>           <p>foo<br/>bar</p>
>        -> <p>foo</p><br/><p>bar</p> -->
>   <xsl:function name="my:split-at-br" as="element(*)+">
>     <xsl:param name="top" as="element(*)" />
>     <!-- group adjacent leaves (text nodes, empty elements) which are not br: -->
>     <xsl:for-each-group
>       select="$top//node()[ count(node()) = 0 ]"
>       group-adjacent="not(self::br)">
>       <xsl:choose>
>         <xsl:when test="current-grouping-key()">
>           <!-- output the top element and its subtree, restricted to
>                all ancestors of the current leaf group and the current leaf
>                group itself: -->
>           <xsl:apply-templates select="$top" mode="split">
>             <xsl:with-param name="restricted-to" select="current-group()" tunnel="yes"/>
>           </xsl:apply-templates>
>         </xsl:when>
>         <xsl:otherwise>
>           <br/>
>         </xsl:otherwise>
>       </xsl:choose>
>     </xsl:for-each-group>
>   </xsl:function>
>
>   <xsl:template match="*" mode="split">
>     <xsl:param name="restricted-to" as="node()*" tunnel="yes"/>
>     <!-- Only process this element if it's within the restriction group
>          or its members' ancestors: -->
>     <xsl:if test="generate-id(.) = (
>                     for $n in $restricted-to
>                     return (
>                       for $a in $n/ancestor-or-self::*
>                       return generate-id($a)
>                     )
>                   )">
>       <xsl:copy>
>         <xsl:copy-of select="@*"/>
>         <xsl:apply-templates mode="#current">
>           <xsl:with-param name="restricted-to" select="$restricted-to" tunnel="yes"/>
>         </xsl:apply-templates>
>       </xsl:copy>
>     </xsl:if>
>   </xsl:template>
>
>   <xsl:template match="node()[count(node()) = 0]" mode="split">
>     <xsl:param name="restricted-to" as="node()*" tunnel="yes"/>
>     <xsl:if test="generate-id(.) = (for $n in $restricted-to return generate-id($n))">
>       <xsl:copy-of select="." />
>     </xsl:if>
>   </xsl:template>
>
> </xsl:transform>
>
> =========8<-------------------
>
> (Please note that I called it xsl:transform instead of xsl:stylesheet, as a
> tribute to Roger L. Costello. But that's another thread, a dead thread.)
>
> The stylesheet (or transformation program) does the following:
>
> For each span immediately below a p, call a function that returns multiple
> spans, interspersed with br's.
>
> This function works as follows:
>
> Of all descendants of the span, only select the leaves. So if the structure
> is
> p
>   span(1)
>     span(2)
>       text(a)
>       br
>       text(b)
>     span(3)
>       text(c)
> it selects the sequence (text(a), br, text(b), text(c)).
> Then it groups the sequence according to the criterion that all adjacent
> non-br nodes should be grouped (and all br nodes, too, as a consequence).
> So we now have the following groups:
> (text(a)) -- matches the grouping key
> (br) -- doesn't match the grouping key
> (text(b), text(c)) -- matches the grouping key
>
> For each of the non-br groups, span(1) -- the span to be split at br -- is
> processed in mode="split", with the parameter $restricted-to set to the
> current group.
>
> So first, span(1) is processed in mode="split" with $restricted-to =
> (text(a)).
> Only if span(1) is among the ancestors of $restricted-to (or among
> $restricted-to itself) will its contents be processed.
> Its contents will be processed in mode="split", with the same $restricted-to
> parameter.
> Being an ancestor of text(a), span(2) will be processed, while nothing
> happens for span(3).
> As a result of processing span(2) in mode="split" with $restricted-to =
> (text(a)), text(a) will be output.
>
> Going back to for-each-group: the next group is br, which will be reproduced
> as br, but on the same level as span(1).
>
> So far, our result tree looks like
> p
>   span(1)
>     span(2)
>       text(a)
>   br
>
> The next group is (text(b), text(c)). Again, span(1) will be processed
> in mode="split", now with $restricted-to = (text(b), text(c)).
> As an ancestor of the $restricted-to leaf nodes, span(1) will be
> reproduced (the element and its original attributes, not the entire
> subtree!).
> As each is an ancestor of one of the leaf nodes, both span(2) and span(3)
> will be reproduced below span(1).
> When processing the subtree of span(2) with the restriction to (text(b),
> text(c)), only text(b) will be output. For span(3), only text(c) will be
> output.
> So finally we have
> p
>   span(1)
>     span(2)
>       text(a)
>   br
>   span(1)
>     span(2)
>       text(b)
>     span(3)
>       text(c)
>
> Although it may seem like overkill at first sight, the big advantage of this
> approach is that it works well for br within nested spans.
>
> With the generic approach (arbitrary XPath expressions for splitting), you
> can extend analyze-string to process markup: in a preparatory pass, use
> plain analyze-string on the text nodes to replace the regex matches with
> some unique markup, then use the generic splitting function to split at
> this markup, then treat the resulting nodes as you would have treated
> matching or non-matching substrings.
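
A minimal sketch of the preparatory pass described in the last paragraph, not Gerrit's actual code. It assumes a hypothetical marker element my:split-here and a hypothetical split pattern (a semicolon with surrounding whitespace); both names and the regex are illustrative only. Note that xsl:analyze-string requires a regex that cannot match a zero-length string.

```xslt
<?xml version="1.0" encoding="utf-8"?>
<xsl:transform
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:my="my"
  version="2.0">

  <!-- Hypothetical split pattern (an assumption, not from the thread):
       a semicolon plus any surrounding whitespace. -->
  <xsl:param name="split-regex" select="'\s*;\s*'"/>

  <!-- Identity transform for attributes and all other nodes. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Text nodes: replace every regex match with an empty marker element
       and copy the non-matching substrings unchanged. The explicit
       priority avoids a conflict with the identity template above. -->
  <xsl:template match="text()" priority="1">
    <xsl:analyze-string select="." regex="{$split-regex}">
      <xsl:matching-substring>
        <!-- my:split-here is a made-up marker name for illustration. -->
        <my:split-here/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:transform>
```

In a second pass, a variant of my:split-at-br could presumably group on not(self::my:split-here) instead of not(self::br). That adaptation is an assumption, since the generic routine itself is not shown in this thread.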
>
> -Gerrit
>
>
> On 07.06.2010 13:36, Israel Viente wrote:
>>
>> Thank you for your answer, Mukul.
>> It does put the br between the spans but loses the spaces between spans
>> and replaces them with br.
>>
>> The result of the code you sent gives the following output:
>>
>> <p dir="ltr"><span class="smaller">text1</span><br /><span
>> class="smaller">text2 text3.</span><br /><span
>> class="smalleritalic">no</span><br /><span
>> class="smaller">problems.</span><br /><br /></p>
>>
>> The desired one is:
>>>>
>>>> <p dir="ltr"><span class="smaller">text1</span>
>>>>             <br />
>>>>             <span class="smaller">text2 text3.</span>
>>>>             <br />
>>>>             <span class="smalleritalic">no</span>   <span
>>>> class="smaller">problems.</span>
>>>>             <br />
>>>>             <br />
>>>>             </p>