Re: [xsl] Move elements to preceding parent

From: Israel Viente <israel.viente@xxxxxxxxx>
Date: Thu, 18 Jun 2009 18:15:26 +0300

Hi Ken,

I tested the stylesheet with non-<p> elements inside <body>, and I
see that they break up the paragraph grouping.

Input Example:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
   <p dir="rtl">
      <span class="chapter">line1</span>
   </p>
 <p dir="rtl"><span class="regular">line10</span>
 <span class="regular">line11</span>
 </p>
 <p dir="rtl"><span class="regular">line12</span>
 </p>
<p dir="rtl"><span class="regular">line13.</span>
</p>
<p dir="rtl"><span class="regular">line14</span>
</p>
<p dir="rtl"><span class="regular">line15</span>
</p>
<h5>
    <img src="images/test.jpg" width="35.00" height="30.00"
         alt="images.jpg" />

</h5>
<p dir="rtl"><span class="regular">line16.</span>
</p>
<p dir="rtl"><span class="regular">line17"</span>
</p>

</body>
</html>

Output:
<?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">
   <body>
      <p dir="rtl">
         <span class="chapter">line1</span>

      </p>
      <p dir="rtl"><span class="regular">line10</span>
         <span class="regular">line11</span>
         <span class="regular">line12</span>
         <span class="regular">line13.</span>

      </p>
      <p dir="rtl"><span class="regular">line14</span>

      </p>
      <p dir="rtl"><span class="regular">line15</span>

      </p>
      <h5>
         <img src="images/test.jpg" width="35.00" height="30.00"
              alt="images.jpg" />


      </h5>
      <p dir="rtl"><span class="regular">line16.</span>

      </p>
      <p dir="rtl"><span class="regular">line17"</span>

      </p>
   </body>
</html>


I thought the <h5> element would be put in a separate group
because of the condition group-ending-with="*[not(self::p)] ...
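
One way to check how the groups are actually formed is a small diagnostic
stylesheet that only lists the members of each group. The sketch below
reuses the group-ending-with pattern quoted in this thread. It assumes an
XSLT 2.0 processor, and the xpath-default-namespace declaration is my
assumption, needed so that unprefixed names such as p match the XHTML
namespace of the input.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Diagnostic sketch only: lists the members of each group as text. -->
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xpath-default-namespace="http://www.w3.org/1999/xhtml">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each-group select="//body/*"
                        group-ending-with="*[not(self::p)] |
                                           p[span/@class='chapter'] |
                                           p[matches(span[last()],
                                                     '[.?&#x22;]$')]">
      <!-- group number, then name(text) for every member of the group -->
      <xsl:value-of select="position()"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="for $e in current-group()
                            return concat(name($e), '(',
                                          normalize-space($e), ')')"
                    separator=" "/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

Because group-ending-with closes the group that is currently open, the
<h5> should be reported as the last member of the group that already
holds the line14 and line15 paragraphs, not as a group of its own. That
group does not end with a <p>, so it falls through to xsl:otherwise and
is copied unchanged.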

What should I change so the output will be:
<?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">
   <body>
      <p dir="rtl">
         <span class="chapter">line1</span>

      </p>
      <p dir="rtl"><span class="regular">line10</span>
         <span class="regular">line11</span>
         <span class="regular">line12</span>
         <span class="regular">line13.</span>

      </p>
      <p dir="rtl"><span class="regular">line14</span>

         <span class="regular">line15</span>
         <span class="regular">line16.</span>
      </p>
      <h5>
         <img src="images/test.jpg" width="35.00" height="30.00"
              alt="images.jpg" />


      </h5>

      <p dir="rtl"><span class="regular">line17"</span>

      </p>
   </body>
</html>
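
A possible direction, offered as a sketch and not as Ken's method: stop
letting non-<p> elements end a group, merge the paragraphs of each group
as before, and emit the non-<p> members of the group before or after the
merged paragraph according to where they stood relative to the first
paragraph. The variable name paras, the identity template, and the
xpath-default-namespace declaration are my assumptions; the rest follows
the template quoted in this thread.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xpath-default-namespace="http://www.w3.org/1999/xhtml">

  <!-- identity template: copy everything that is not handled below -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="body">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <!-- only paragraphs end a group now, so a non-p element stays
           inside the group of the paragraphs around it -->
      <xsl:for-each-group select="*"
                          group-ending-with="p[span/@class='chapter'] |
                                             p[matches(span[last()],
                                                       '[.?&#x22;]$')]">
        <!-- hypothetical helper: the paragraphs of this group -->
        <xsl:variable name="paras" select="current-group()[self::p]"/>
        <xsl:choose>
          <xsl:when test="$paras[last()]
                          [matches(span[last()],'[.?&#x22;]$')]">
            <!-- non-p elements that come before the first paragraph -->
            <xsl:apply-templates
                select="current-group()[not(self::p)]
                                       [. &lt;&lt; $paras[1]]"/>
            <!-- merge all paragraphs of the group into the first one -->
            <xsl:for-each select="$paras[1]">
              <xsl:copy>
                <xsl:copy-of select="@*"/>
                <xsl:apply-templates/>
                <xsl:apply-templates
                    select="$paras[position() > 1]/
                            (text()[not(normalize-space())] | span)"/>
              </xsl:copy>
            </xsl:for-each>
            <!-- non-p elements that come after the first paragraph -->
            <xsl:apply-templates
                select="current-group()[not(self::p)]
                                       [. >> $paras[1]]"/>
          </xsl:when>
          <xsl:otherwise>
            <!-- chapter paragraphs and unterminated groups: copy as-is -->
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

With the input above, the line14, line15 and line16. paragraphs and the
<h5> would fall into one group, so the three paragraphs are merged and
the <h5> follows them. Whether a non-<p> element should really be pushed
past the paragraphs it interrupts is a design choice that depends on the
data.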

Thanks, Israel


On Wed, Jun 17, 2009 at 11:37 PM, G. Ken Holman <gkholman@xxxxxxxxxxxxxxxxxxxx> wrote:
> At 2009-06-17 23:11 +0300, Israel Viente wrote:
>>
>> I really appreciate your code and comments, but after reading it many
>> times, I can't get to the bottom of the logic here.
>> I'm a newbie so forgive my stupid questions.
>
> As I tell my students, questions are not stupid if they are asked
> sincerely.  I far more appreciate the asking of questions than the
> ignoring of working code that was supplied as requested.
>
>> 1. Why do we need the outermost copy element:
>> > <xsl:template match="body">
>> >  <xsl:copy>
>
> In order to preserve the body element when it comes time to group the
> paragraphs.
>
>> How does it work in combination with xsl:for-each-group?
>
> By being the parent of the elements being grouped, matching on <body> gives
> the stylesheet the opportunity to act on all of the children of body.  The
> paragraphs you want to massage are children of the body, so the time to act
> on those children is at the time the body arrives at the stylesheet.  Since
> we want the body element to be part of the result, we preserve it with
> <xsl:copy>.
>
>> 2. Can you please explain the group-ending-with selection?
>
> You can see by the select="*" that I have selected *all* of the children of
> the body.  I want to act on those groups of adjacent <p> elements.  But
> since there are other non-<p> elements that could be in the data (there
> aren't any in your data, but how often is a web page made solely of
> paragraphs?) I would be pulling those into the selection as well.  After
> all, I want all of the children of body to be processed in child order; I
> only want to engage the special handling when I'm dealing with those
> children that are paragraphs.
>
> Yes your data sample only contained paragraphs, but I try to write my
> stylesheets defensively anticipating other conditions.
>
>> Why do we need *[not(self::p)] ? Doesn't it mean all except p elements?
>
> Indeed it does mean all except <p> elements.  By putting non-<p> elements
> in their own group, they won't interfere with the groups that are composed
> of <p> elements.
>
> So, adding more narrative to the stylesheet:
>
>>  <xsl:template match="body">
>>  <xsl:copy>
>>    <xsl:copy-of select="@*"/>
>
> The above preserves the body element and any attributes that might be
> attached to it.
>
>>    <xsl:for-each-group select="*"
>
> The above selects all of the children of the body.
>
>>                        group-ending-with="*[not(self::p)] |
>>                                           p[span/@class='chapter'] |
>>                                           p[matches(span[last()],
>>                                                     '[.?&#x22;]$')]">
>
> The above ends the current group at every non-paragraph, at every chapter
> paragraph, and at every paragraph with the desired punctuation.  A
> consecutive sequence of paragraphs therefore accumulates in one group
> until one of those items closes it.
>
>>      <!--now the information is grouped by p elements that end
>>          as required-->
>>      <xsl:choose>
>>        <xsl:when test="current-group()[last()]
>>                        [self::p][matches(span[last()],'[.?&#x22;]$')]">
>
> The above tells me when I have encountered a group of <p> elements that
> ends with a paragraph with the desired punctuation.
>
>>          <!--in a group of p elements that end as required-->
>>          <xsl:copy>
>>            <xsl:copy-of select="@*"/>
>
> The above preserves the *first* of those paragraphs, and its attributes.
>
>>            <!--preserve the content of the first of these p elements-->
>>            <xsl:apply-templates/>
>
> The above preserves the content of that paragraph.
>
>>            <!--preserve only the span elements and indentation from the
>>                rest; (the indentation is needed because this is paragraph
>>                white-space)-->
>>            <xsl:apply-templates select="current-group()[position()>1]/
>>                                         (text()[not(normalize-space())] |
>>                                         span)"/>
>
> The above preserves only the content of the other paragraphs in the group.
> If there are no other paragraphs in the group, nothing else is added.  If
> there are 15 other paragraphs in the group, all of the content of all of
> them is added.  This is the generalized nature of the result:  I'm not
> assuming that there is only one other paragraph.
>
>>          </xsl:copy>
>>        </xsl:when>
>>        <xsl:otherwise>
>>          <!--in another kind of group so just copy these using identity-->
>>          <xsl:apply-templates select="current-group()"/>
>
> The above preserves all of the children of <body> that are not paragraphs
> or are chapter paragraphs.
>
>>        </xsl:otherwise>
>>      </xsl:choose>
>>    </xsl:for-each-group>
>>  </xsl:copy>
>>  </xsl:template>
>
> I hope this has helped.  Working directly with the sibling axes is fraught
> with problems because the reach of these axes usually extends past where
> we want to stop looking.  By looking *down* on the data, rather than left
> and right, one can see a different perspective of your requirement.  You
> expressed your requirement by looking left and right from the given
> paragraph.  I expressed your requirement by looking down at the paragraphs
> from the <body> parent.
>
> Good luck in your work with XML and XSLT!  As you learn more I'm sure
> you'll love it more.
>
> . . . . . . . . . . . . Ken
>
> --
> Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/s/
> Training tools: Comprehensive interactive XSLT/XPath 1.0/2.0 video
> Video lesson:    http://www.youtube.com/watch?v=PrNjJCh7Ppg&fmt=18
> Video overview:  http://www.youtube.com/watch?v=VTiodiij6gE&fmt=18
> G. Ken Holman                 mailto:gkholman@xxxxxxxxxxxxxxxxxxxx
> Male Cancer Awareness Nov'07  http://www.CraneSoftwrights.com/s/bc
> Legal business disclaimers:  http://www.CraneSoftwrights.com/legal
