Re: [xsl] Building complex, hierarchical html datasets

Subject: Re: [xsl] Building complex, hierarchical html datasets
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Mon, 18 May 2009 12:14:48 -0400

The hard part of this is the analysis, i.e. determining how to go about "codify[ing] some property that determines whether an element is chunked into its own file" (as David puts it).

This is hard for two reasons. You want this property to be robust, that is to work well over your entire range of inputs (actual if not potential). Plus, you want it to work well in the results, i.e. make a well-designed, navigable web resource, which is reasonably transparent, accessible, efficient and balanced (makes chunks the right size, few or no "stub" chunks, etc.).

These factors are so immensely variable from one publication to another that you are not likely to find terribly good advice in general. You might find some good hints or ideas in the literature on "information architecture". However, since in the context of the web this literature is generally concerned with the organization and design of web resources as such, rather than of the information sets presented by those resources (XML, databases, source documents or what have you), even if you strike gold here, you're liable to get help with the second problem (the design of the result) more than the first (the specification of the mapping).

This being the case, I think you're back to a combination of requirements and document analysis. Document analysis because you're looking at the source data set (and how it's represented and handled in markup), and requirements analysis because you're looking at what you want to do with it in its rendition on line.

While you're doing this, it's perhaps worth considering whether your solution has to be absolutely automated. Publishing systems may find it useful to allow authors or editors to include a flag to say whether a particular component should be split out (or not). XSLT can respect such flags, and if they are handled well, this can help to produce a more flexible and elegant result, at the cost of hand labor, since someone has to decide when and where to use this feature.
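As a rough sketch of what respecting such a flag could look like: the stylesheet below assumes XSLT 2.0 (for xsl:result-document), a hypothetical chunk="no" attribute that an editor may set on a "law" element, and a hypothetical "title" child. None of these names come from the original poster's data, and the file-naming scheme is arbitrary.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Default rule: a law becomes its own file, and the parent page
       gets a link to it. -->
  <xsl:template match="law">
    <xsl:variable name="href"
        select="concat('law-', generate-id(), '.html')"/>
    <xsl:result-document href="{$href}" method="html">
      <html>
        <head><title><xsl:value-of select="title"/></title></head>
        <body><xsl:apply-templates select="." mode="content"/></body>
      </html>
    </xsl:result-document>
    <p><a href="{$href}"><xsl:value-of select="title"/></a></p>
  </xsl:template>

  <!-- Editorial override (hypothetical attribute): keep this law inline.
       The predicate gives this rule a higher default priority than the
       plain match="law" rule above. -->
  <xsl:template match="law[@chunk = 'no']">
    <xsl:apply-templates select="." mode="content"/>
  </xsl:template>

  <!-- Shared rendering used by both rules. -->
  <xsl:template match="law" mode="content">
    <div class="law">
      <h2><xsl:value-of select="title"/></h2>
      <xsl:apply-templates select="node() except title"/>
    </div>
  </xsl:template>

</xsl:stylesheet>
```

The automated rule stays the default and the flag only records the exceptions, so the hand labor is limited to the cases where someone disagrees with the stylesheet.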

Another thing to keep in mind is that you can provide options to users. For example, your "law" element, instead of simply being displayed statically, might present options enabling a user *either* to expand their view on the page (and perhaps collapse it again) *or* to open the resource in a new page. Simple hypertext and even a very modest use of JavaScript can be extremely powerful when leveraged with transformations behind them, and designs that would be difficult to realize by hand can become relatively trivial to build and maintain.
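To make that concrete, here is one possible shape for it, offered as a sketch only. It assumes XSLT 2.0, a hypothetical "title" child on each "law", and made-up file names; the inline script is just the simplest toggle I could write, not a recommendation for how to structure the JavaScript.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="law">
    <!-- One generated id serves as both the HTML id and the file name. -->
    <xsl:variable name="id" select="generate-id()"/>

    <!-- Standalone copy, for readers who open the law in a new page. -->
    <xsl:result-document href="law-{$id}.html" method="html">
      <html>
        <head><title><xsl:value-of select="title"/></title></head>
        <body><xsl:apply-templates select="node() except title"/></body>
      </html>
    </xsl:result-document>

    <!-- Inline copy, collapsed by default, with both options offered. -->
    <div class="law">
      <h3><xsl:value-of select="title"/></h3>
      <a href="#"
         onclick="var b = document.getElementById('{$id}'); b.style.display = (b.style.display == 'none') ? 'block' : 'none'; return false;">expand / collapse</a>
      <xsl:text> | </xsl:text>
      <a href="law-{$id}.html" target="_blank">open in a new page</a>
      <div id="{$id}" style="display: none">
        <xsl:apply-templates select="node() except title"/>
      </div>
    </div>
  </xsl:template>

</xsl:stylesheet>
```

Because both views come from the same template, the inline and standalone versions cannot drift apart, which is the part that is tedious to keep up by hand.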

The problem is difficult enough that probably all that one can say about it definitively in general is that whatever you decide, XSLT is the right tool for this job. Implementing the solution is almost always easier than designing it, and there can be subtle tradeoffs.

Then too, one of the big benefits of XML/XSLT's separation of content from presentation is that you can usually (other things being equal) revisit this question and iterate the design. If you're building a web site by hand, redoing its organization can be very expensive. If you are using XSLT over a stable source format, you can try out a few different alternatives before deciding, make improvements over time, and so forth.
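One cheap way to try alternatives is to expose the chunking rule as a stylesheet parameter and rerun the transformation with different values. The following is a minimal sketch, assuming XSLT 2.0; the parameter name max-chunk-depth, the depth measure (a count of ancestors) and the "title" child are all illustrative assumptions, not anything taken from the original poster's data.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical knob: a law with more ancestors than this is inlined. -->
  <xsl:param name="max-chunk-depth" as="xs:integer" select="3"/>

  <xsl:template match="law">
    <xsl:choose>
      <xsl:when test="count(ancestor::*) &lt;= $max-chunk-depth">
        <!-- Shallow enough: write a separate file, leave a link behind. -->
        <xsl:variable name="href"
            select="concat('law-', generate-id(), '.html')"/>
        <xsl:result-document href="{$href}" method="html">
          <html>
            <head><title><xsl:value-of select="title"/></title></head>
            <body><xsl:apply-templates/></body>
          </html>
        </xsl:result-document>
        <p><a href="{$href}"><xsl:value-of select="title"/></a></p>
      </xsl:when>
      <xsl:otherwise>
        <!-- Too deep: render in place within the enclosing chunk. -->
        <div class="law"><xsl:apply-templates/></div>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

Running this with the parameter set to 2, 3 and 4 over the same source gives three differently organized sites to compare, with no change to the source documents.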


At 09:54 AM 5/18/2009, you wrote:
Not sure what help can be offered without a bit more context (eg a small
irregular input and some indication of how you want it chunked into

> Therefore structure is very difficult to predict and producing a
> routine for every variation that does (or could) exist would be very
> arduous and probably unreliable.

It's a basic feature of the design of XSLT that it should be able to
cope with such irregular input. match="law"  matches law elements
wherever they are, you don't need to know in advance all the possible
paths that are needed to reach such an element.

That said, your chunking probably does depend on the position of the
element within a document, but it's hard to offer any coding advice at
this generality.

something as simple as

<xsl:template match="law">
    <!-- usually law elements end up as their own file -->
</xsl:template>

<xsl:template match="*/*/*/*/law" priority="2">
    <!-- but deeply nested law elements are inlined into their
         parent document -->
</xsl:template>

is probably too simple, but the idea is basically sound: you just need to
codify some property that determines whether an element is chunked into
its own file. A simple count on depth is most likely too simple, but
perhaps depth, and the parent element and an attribute or two might be

Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.      
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
  Mulberry Technologies: A Consultancy Specializing in SGML and XML