Re: [xsl] Finding an untagged ordered list

Subject: Re: [xsl] Finding an untagged ordered list
From: Michael Müller-Hillebrand <mmh@xxxxxxxxxxxxx>
Date: Wed, 14 Jan 2009 12:51:37 +0100
I love to play with Kernow's XSLT Playground feature, and your case was
well prepared for that.

Your case can be solved with the help of xsl:for-each-group and regular
expressions. I did not try to keep a trailing, unrelated P out of the
OL, and I did not check the alphabetical order of the labels. You would
have to specify more clearly what should happen, but look at this:

<xsl:template match="root">
  <xsl:copy>
    <xsl:for-each-group select="*"
      group-adjacent="if (self::P and
      (self::P[matches(., '^[A-Z][.]')] or
      preceding-sibling::P[matches(., '^[A-Z][.]')]))
      then 0 else position()">
      <xsl:choose>
        <xsl:when test="current-grouping-key() = 0">
          <OL>
            <xsl:for-each-group select="current-group()"
              group-starting-with="P[matches(., '^[A-Z][.]')]">
              <xsl:choose>
                <xsl:when test="./self::P[matches(., '^[A-Z][.]')]">
                  <LI>
                    <xsl:apply-templates select="current-group()"
                      mode="join"/>
                  </LI>
                </xsl:when>
                <xsl:otherwise>
                  <xsl:apply-templates select="current-group()"/>
                </xsl:otherwise>
              </xsl:choose>
            </xsl:for-each-group>
          </OL>
        </xsl:when>
        <xsl:otherwise>
          <xsl:apply-templates select="."/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:copy>
</xsl:template>

<xsl:template match="node()" mode="join">
  <xsl:apply-templates select="@*|node()"/>
  <xsl:value-of select="' '"/>
</xsl:template>

<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>
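
The alphabetical order is not checked by the templates above. A minimal
sketch of how such a check could look in XSLT 2.0 follows. The function
names my:label and my:is-successor and the namespace URI
urn:example:list-helpers are placeholders made up for this example, and
the sketch has not been run against your real data.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:list-helpers"
    exclude-result-prefixes="xs my">

  <!-- Hypothetical helper: the label letter of a P that starts like "A.",
       or '' when there is no label (also '' for the empty sequence). -->
  <xsl:function name="my:label" as="xs:string">
    <xsl:param name="p" as="element()?"/>
    <xsl:sequence select="if (matches(string($p), '^[A-Z][.]'))
                          then substring(string($p), 1, 1)
                          else ''"/>
  </xsl:function>

  <!-- Hypothetical helper: true when $next is the letter directly after
       $prev (A then B, B then C, ...). Compares Unicode codepoints. -->
  <xsl:function name="my:is-successor" as="xs:boolean">
    <xsl:param name="prev" as="xs:string"/>
    <xsl:param name="next" as="xs:string"/>
    <xsl:sequence select="if (string-length($prev) = 1 and string-length($next) = 1)
                          then string-to-codepoints($next) = string-to-codepoints($prev) + 1
                          else false()"/>
  </xsl:function>

</xsl:stylesheet>
```

With these two functions in the stylesheet, a test such as
my:is-successor(my:label(preceding-sibling::P[matches(., '^[A-Z][.]')][1]), my:label(.))
would tell whether a labelled P continues the sequence of the nearest
labelled P before it. The first item of a list has no predecessor and
needs separate treatment, because the list may not start at A.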


Applied to your sample input, the three templates create:


<root>
   <H1>Heading</H1>
   <P>List Heading</P>
   <OL>
      <LI>A. One big sentence incorrectly placed in P tags </LI>
      <LI>B. Another long sentence spanning P tags. </LI>
      <LI>C. This should really be one long list item that spans
randomly. Hopefully I am some unrelated text. </LI>
   </OL>
</root>
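
As the result shows, the trailing unrelated P ends up in the last LI.
One way to keep it out, following the rule that a list item ends at the
first full stop, is sketched below. The function name my:continues-item
and the namespace URI urn:example:list-helpers are made up for this
example, and the rule itself is an assumption about what should happen.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:list-helpers"
    exclude-result-prefixes="xs my">

  <!-- Hypothetical helper: true when $p still belongs to the list item
       opened by the nearest preceding labelled P, that is, when no P from
       that label up to (but excluding) $p ends with a full stop. -->
  <xsl:function name="my:continues-item" as="xs:boolean">
    <xsl:param name="p" as="element()"/>
    <!-- Nearest preceding sibling P that starts like "A." -->
    <xsl:variable name="start" as="element()?"
      select="$p/preceding-sibling::P[matches(., '^[A-Z][.]')][1]"/>
    <!-- Ps from the label onwards, before $p, that already end the sentence -->
    <xsl:variable name="closed" as="element()*"
      select="$p/preceding-sibling::P[. is $start or . &gt;&gt; $start]
                                     [matches(., '[.]\s*$')]"/>
    <xsl:sequence select="exists($start) and empty($closed)"/>
  </xsl:function>

</xsl:stylesheet>
```

The outer group-adjacent expression would then become
if (self::P and (matches(., '^[A-Z][.]') or my:continues-item(.))) then 0 else position()
while the inner grouping stays as it is. For the sample input this should
leave the last P outside the OL, so that it is copied unchanged. A label
paragraph that itself ends with a full stop, such as a bare "A.", closes
its item immediately under this rule.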

Good luck!

- Michael



On 13.01.2009 at 23:45, Graeme Kidd wrote:

Hi everyone,
I am using XSLT 2.0 and I have three questions about an ordered list
in this format:
<root>
  <H1>Heading</H1>
  <P>List Heading</P>
  <P>A. One big</P>
  <P>sentence incorrectly</P>
  <P>placed in P tags</P>
  <P>B. Another long</P>
  <P>sentence spanning P tags.</P>
  <P>C. This should really</P>
  <P>be one</P>
  <P>long</P>
  <P>list item</P>
  <P>that spans randomly.</P>
  <P>Hopefully I am some unrelated text.</P>
</root>

Which I want converted to this format:
<root>
  <H1>Heading</H1>
  <P>List Heading</P>
  <OL>
      <LI>One big sentence incorrectly placed in P tags.</LI>
      <LI>Another long sentence spanning P tags.</LI>
      <LI>This should really be one long list item that spans
randomly.</LI>
  </OL>
  <P>Hopefully I am some unrelated text.</P>
</root>

Because the original XML file is rather bad, the list may not start
at A. Previously I have been able to catch a numbered list just by
checking whether a P tag starts with a number and its preceding sibling
does not, and then, once inside the list, checking whether the next P
tag starts with a number as well. This list is different, though.

1) I imagine you can check whether a P tag starts with a letter by
doing something like this:
P[starts-with(translate(., 'vUppercaseChars_CONST',
'vUppercaseAChar_CONST'), 'A')]
How would you then check that it begins with a letter followed by a dot?

2) Is it possible to find a letter followed by a dot then check if
the next P node starts with the next letter of the alphabet followed
by a dot?

3) Is it possible to check whether the next 10 P tags contain the next
letter of the alphabet plus a dot? Previously I have been able to
pick up lists without a problem when they had a predictable pattern,
but this one doesn't. I can only assume that the list ends after about
10 P tags, or when it finds a character at an earlier position in the
alphabet, or when it hits some other tag that is not a P tag. I would
end the list item at the first full stop found after the last P tag
that started with a character plus a dot. Is something like this
possible in XSLT, and if so, how?

Thanks for your time,
Graeme



-- 
_______________________________________________________________
Michael Müller-Hillebrand: Dokumentation Technology
Adobe Certified Expert, FrameMaker
Consulting and Training, FrameScript, XML/XSL, Unicode
Blog [de]: http://cap-studio.de/
