Re: [xsl] Finding list items in XHTML

Subject: Re: [xsl] Finding list items in XHTML
From: Chris Loschen <loschen@xxxxxxxxxxxxx>
Date: Tue, 12 Nov 2002 17:43:31 -0500

Thank you very much for your help!

I'm trying to tackle the <p> -> <li> problem first, since that seemed to be easier. However, I don't seem
to have it right yet. Here's the stylesheet as it currently exists:


*****

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" version="1.0" encoding="us-ascii" omit-xml-declaration="no" doctype-public="+//ISBN 0-9673008-1-9//DTD OEB 1.0.1 Document//EN" doctype-system="oebdoc101.dtd" indent="no" />

<xsl:template match="p[starts-with(.,'&#10148; ')]">
<li><xsl:apply-templates /></li>
</xsl:template>

<xsl:template match="p[starts-with(.,'&amp;(!!char1!!); ')]">
<li><xsl:apply-templates /></li>
</xsl:template>

<xsl:template match="span[.='&#10148; ']" />

<xsl:template match="span[.='&amp;(!!char1!!); ']" />

<!-- The Identity Transformation -->
  <!-- Whenever you match any node or any attribute -->
  <xsl:template match="node()|@*">
    <!-- Copy the current node -->
    <xsl:copy>
      <!-- Including any attributes it has and any child nodes -->
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

***

I need the output to be encoded as us-ascii because some of the downstream tools are expecting
Unicode character references rather than pure UTF-8.


Unfortunately, it doesn't seem to be finding the nodes I want, either the <p> or the <span> elements.
Might it be the Unicode reference or the entity that's causing the problem? From what I can figure out,
the syntax looks to be OK, but perhaps I'm wrong.


Here's a sample of the input XML:

***

<p class="hang-text-6"><span class="hang-text-2">&#10148; </span>Everyone in a company should have a written job description that accurately
reflects their responsibilities and is related to their compensation.</p>
<a id="page-059"></a>


<p class="hang-text-3"><span class="hang-text-2">&amp;(!!char1!!); </span>HR can help you deal with problem employees. Because most companies fear
wrongful termination suits, problem employees often require delicate handling.
Know your firm&rsquo;s procedures for these situations and work closely with HR to
resolve them quickly.</p>


***

This is unchanged by the transformation. Do you see the error of my ways?

Perhaps I just need to look at it with fresh eyes in the morning. Thanks again for your help.


At 03:02 PM 11/12/02, you wrote:
Hi Chris,

At 12:45 PM 11/12/2002, you wrote:
My input (and output) is essentially XHTML (actually OEB, but they're almost identical). It has
a series of <p> elements, and in two specific cases, I need to convert the <p> elements
to <li> elements. Those cases are:


(1) when the <p> element starts with a <span> with the contents "&#10148; " (yes, that's a character reference), and
(2) when the <p> element starts with a <span> with the contents "&amp;(!!char1!!); " (that string exactly)


In both of these cases, I need to replace the <p> element with an <li> element
and delete the child <span> element entirely.

This is tractable.


Understand, first, that your solution is in XPath, not XSLT. That is, your code may be able to use (for example) either a template-driven approach, or a for-each iteration or even other techniques; but either way you'll need XPath to do your testing.

Since you're wisely starting with an identity template, and since it's probably the best solution in any case, we'll assume a template-driven approach.

As you know, you can match <p> elements with a template. You can also qualify the match. So

<xsl:template match="p[starts-with(.,'&#10148;')]">
  <li>
    <xsl:apply-templates/>
  </li>
</xsl:template>

matches <p> elements that start with this character. Note the match succeeds irrespective of the presence or absence of a <span> element, since all that's being tested is the string value of the <p>, which includes any elements inside it. This may or may not be good enough. Note also that any <p> elements that don't start with this character will fail to match, and thus presumably will be picked up by the identity template.

If you wanted a stricter test, you could say, for example,

<xsl:template match="p[child::*[1]/self::span[.='&#10148;']]">

This template matches any <p> element whose first child element is a <span> element with value '&#10148;'.

You get the idea -- you need to be very precise on exactly how you want the test to work. The major gotcha to keep in mind here is the presence or absence of text node children of your <p>, especially any whitespace-only text nodes that are there only for formatting your source. (Come back and ask if this is mysterious or if the expressions you try aren't getting the results you anticipate.)
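
As a sketch of a test that tolerates stray whitespace inside the marker <span> (assuming the marker always sits in the first child element of the <p>, that the elements are in no namespace, and that a trailing space such as the one in '&#10148; ' should be ignored), normalize-space() can be applied before the comparison:

<!-- Sketch only: a <p> whose first child element is a <span> holding just the marker -->
<xsl:template match="p[*[1]/self::span[normalize-space(.)='&#10148;']]">
  <li>
    <xsl:apply-templates/>
  </li>
</xsl:template>

<!-- Drop the marker <span> itself, whatever whitespace surrounds the character -->
<xsl:template match="p/span[normalize-space(.)='&#10148;']"/>

normalize-space() trims only spaces, tabs, and line breaks, so a non-breaking space (&#160;) after the marker would still defeat this comparison.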

Deleting the child span is easy; just include the template

<xsl:template match="span[.='&#10148;']"/>

in your identity transform. (This expression matches any span whose string value is that character. The template, having matched such a span, does nothing with it, so it doesn't appear in your output.)

Perhaps the more difficult part of this is that I also would like to take a series of
such elements and surround the entire series with a <ul> </ul> element structure.

This is basically a grouping problem, and appears in the archive of this list (and the XSL FAQ) under various guises, such as introducing hierarchy into flat structures, etc. In your case this'll be much easier to do in a separate pass. (It can be done in one pass, but you need to understand the logic for the two tasks separately in any case.)
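
A minimal sketch of such a second pass in XSLT 1.0 follows. It assumes the first pass has already produced sibling <li> elements, that the input contains no <ul> elements of its own, and that the elements are in no namespace. The mode name "in-list" is made up for the illustration. It is one common way to wrap each run of adjacent <li> siblings, not the only one:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy everything not handled below -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- An <li> whose nearest preceding sibling element is not an <li> starts a run -->
  <xsl:template match="li[not(preceding-sibling::*[1][self::li])]">
    <ul>
      <xsl:apply-templates select="." mode="in-list"/>
    </ul>
  </xsl:template>

  <!-- Any other <li> is copied by the walk below, so skip it here -->
  <xsl:template match="li"/>

  <!-- Copy this <li>, then step to the next sibling element only if it is also an <li> -->
  <xsl:template match="li" mode="in-list">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
    <xsl:apply-templates select="following-sibling::*[1][self::li]" mode="in-list"/>
  </xsl:template>

</xsl:stylesheet>

Because the tests look only at sibling elements, whitespace-only text nodes between the <li> elements do not break a run, but any intervening element does. An empty anchor such as <a id="page-059"></a> between two items would split them into two separate <ul> elements unless it is moved or handled specially.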


For that second level down (lists inside lists), the same basic techniques will work. If you break the processing into two passes (1. change <p> to <li>, 2. group <li> structures), make sure in pass 1 that the second-level <li> elements carry some kind of attribute or other marker to distinguish them, so they can be grouped properly in the second pass to interpolate the correct hierarchy.
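
A sketch of what such a marker could look like in pass 1, assuming purely for illustration that the '&amp;(!!char1!!);' marker is the one that identifies second-level items, and using a made-up class value "level-2" that the second pass can test for:

<!-- Sketch only: tag second-level items so the grouping pass can tell them apart -->
<xsl:template match="p[*[1]/self::span[normalize-space(.)='&amp;(!!char1!!);']]">
  <li class="level-2">
    <xsl:apply-templates/>
  </li>
</xsl:template>

<!-- Drop the marker <span> from the output -->
<xsl:template match="p/span[normalize-space(.)='&amp;(!!char1!!);']"/>

The second pass can then select li[@class='level-2'] separately from the other <li> elements, and the attribute can be stripped again once the hierarchy is in place.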

If these hints aren't enough to get you rolling, or if you need help with exactly how to write the XPath, come back and ask. (If asking about XPath and matching, show us your source so we can see about any of those pesky text nodes, etc.) But hopefully this will help you break the problems down.

Regards,
Wendell



======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================


XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list

--Chris


----------------------------------------------------------------------------------------
Texterity ~ XML and PDF ePublishing Services
----------------------------------------------------------------------------------------
Chris Loschen, XML Developer
Texterity, Inc.
144 Turnpike Road
Southborough, MA 01772 USA
tel: +1.508.804.3033
fax: +1.508.804.3110
email: loschen@xxxxxxxxxxxxx
http://www.texterity.com/




