Re: [xsl] Transforming XML Blockquotes - Mixed Content

Subject: Re: [xsl] Transforming XML Blockquotes - Mixed Content
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Wed, 13 Apr 2005 17:11:52 -0400
Edward,

At 03:25 PM 4/13/2005, you wrote:
> If I understand this correctly, the text between two block quotes would be treated the same as a simple plain paragraph. I don't think, however, this will work because I would lose the attributes of the xml tagged paragraphs. My source xml contains paragraphs that contain a number attribute, as well as other inline tagging for italics, footnotes, etc.
>
> For example:
>
> <paragraph num="1">Yadda Yadda Yadda <italic>Italic Yadda</italic> Yadda:
> <blockquote>Blah Blah Blah Blah</blockquote>
> Yackity Yack Yack</paragraph>
>
> I read the previous posts under the subject line "Transformation problem tei > xhtml" but I also had trouble understanding either of the suggested solutions.

Right. And (sadly) some of us old-timers sighed when we saw the post, because we knew (a) the problem wasn't easy, and (b) most attempts to handle it deal only with a subset of the actual problem (as you have noticed). This is usually okay in practice (often one only *has* a subset of the problem), but it does require a more fine-grained analysis of requirements, such as:


* do the attributes on a p in the source get represented on new p elements created from the split, and if so how?
* do you have to account for any deeper nestings? For instance, what happens if your blockquote appears inside an 'i' element? (Unlikely, I know, but if one is trying to solve the general problem....)


> Sorry, if I need a little too much hand holding but can someone explain how I would do this and how the suggested code would actually work?

Well, Mike actually did that already today, although in summary form.


As to a more generalized writeup, I'm sure several of us have contemplated that but haven't managed to do it, for various reasons.

Among these is the awareness that dealing with this pesky problem will be easier in XSLT 2.0.

> I am basically trying to convert a basic document with standard features such as italic, footnotes, blockquotes, bullet lists etc. to XHTML. I would have thought that these basic issues would have been tackled numerous times by now. But every time I google these issues before posting questions to lists or forums, I can't find anything comparable. Most XML-related tutorials and books seem to assume use of more data-type information (without much mixed content), rather than document-type information.

I'm afraid the available literature may be misleading on these points.


"These basic issues" have indeed been tackled numerous times by now. You just happen to have identified about the single sorest point -- sorer even than multi-level grouping -- in XSLT 1.0. This isn't to say that the problem hasn't been solved -- it has: many of us have running code that deals with it. But the solutions are more cumbersome than we'd like, and more often than not, they're tied to particular document types (they know exactly what kind of tags to expect where) and/or markup conventions.

A deep consideration of why this is a hard problem would have to recognize, I think, how the "split a paragraph around a blockquote" issue is related to issues of handling overlapping phenomena in XML (which privileges a single clean hierarchy), since the question of how and where to split elements also turns up there. And both are related to XSLT 1.0's relative shortcomings when it comes to upconversion problems in general, which is why XSLT 2.0 grouping constructs are relevant -- since in all these problems, an essential aspect is recognizing how, in

<p>Here's a text with a blockquote:
<blockquote>Here's my quote!</blockquote>
which is a quote, after which appears <i>inline markup</i>, after which another blockquote appears:
<blockquote>Here's my other blockquote</blockquote>
</p>


part of the problem is in grouping all the nodes that belong together (here, between the blockquotes we have two text nodes and an element) so they can be wrapped in a new p. (A standard approach to this problem is the "sibling walk" pattern that Mike mentioned.)
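
To make the sibling walk a little more concrete, here is a minimal XSLT 1.0 sketch. It is an illustration only, not Mike's code or anyone's production stylesheet. It assumes the element names from Edward's example (paragraph, italic, blockquote), assumes blockquotes appear only as direct children of paragraph, and makes one arbitrary choice about attributes: the num value is repeated on every p created by the split, as a hypothetical class="para-N".

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- A paragraph containing blockquotes: visit each blockquote, plus each
       node that starts a run (the first child, or the node right after a
       blockquote). -->
  <xsl:template match="paragraph[blockquote]">
    <xsl:for-each select="node()[self::blockquote
                                 or not(preceding-sibling::node())
                                 or preceding-sibling::node()[1][self::blockquote]]">
      <xsl:choose>
        <xsl:when test="self::blockquote">
          <xsl:apply-templates select="."/>
        </xsl:when>
        <xsl:otherwise>
          <!-- Collect the run that starts here by walking the siblings. -->
          <xsl:variable name="run">
            <xsl:apply-templates select="." mode="walk"/>
          </xsl:variable>
          <!-- Drop runs with no text (e.g. a line break before </paragraph>). -->
          <xsl:if test="normalize-space($run)">
            <!-- Assumption: num is carried onto every piece as a class. -->
            <p class="para-{../@num}">
              <xsl:copy-of select="$run"/>
            </p>
          </xsl:if>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each>
  </xsl:template>

  <!-- The walk itself: handle this node in the default mode, then step to
       the next sibling, stopping when that sibling is a blockquote. -->
  <xsl:template match="node()" mode="walk">
    <xsl:apply-templates select="."/>
    <xsl:apply-templates mode="walk"
        select="following-sibling::node()[1][not(self::blockquote)]"/>
  </xsl:template>

  <!-- Paragraphs without blockquotes need no splitting. -->
  <xsl:template match="paragraph">
    <p class="para-{@num}"><xsl:apply-templates/></p>
  </xsl:template>

  <xsl:template match="blockquote">
    <blockquote><p><xsl:apply-templates/></p></blockquote>
  </xsl:template>

  <xsl:template match="italic">
    <i><xsl:apply-templates/></i>
  </xsl:template>

</xsl:stylesheet>
```

Run against Edward's example, this should give a p holding the text and italic before the blockquote, then the blockquote, then a second p holding the trailing text. It does nothing for a blockquote nested inside inline markup, and a run consisting only of empty elements would be dropped by the normalize-space() test.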

Interpolating implicit structure is not something XSLT 1.0 was ever very good at.
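
For what it's worth, here is roughly how the XSLT 2.0 grouping constructs mentioned above could express the same split. Again this is a sketch under assumptions: the element names come from Edward's example, blockquotes are assumed to be direct children of paragraph, and putting num on every resulting p as a hypothetical class="para-N" is just one possible answer to the attribute question.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Group the children into adjacent runs: each run is either all
       blockquotes (key true) or all non-blockquotes (key false). -->
  <xsl:template match="paragraph[blockquote]">
    <xsl:variable name="para" select="."/>
    <xsl:for-each-group select="node()"
                        group-adjacent="boolean(self::blockquote)">
      <xsl:choose>
        <!-- A run of blockquotes: pass them through unwrapped. -->
        <xsl:when test="current-grouping-key()">
          <xsl:apply-templates select="current-group()"/>
        </xsl:when>
        <!-- A run of other nodes: wrap it in a new p, unless it holds
             nothing but whitespace. -->
        <xsl:when test="current-group()[self::* or self::text()[normalize-space()]]">
          <!-- Assumption: num is carried onto every piece as a class. -->
          <p class="para-{$para/@num}">
            <xsl:apply-templates select="current-group()"/>
          </p>
        </xsl:when>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:template>

  <!-- Paragraphs without blockquotes need no splitting. -->
  <xsl:template match="paragraph">
    <p class="para-{@num}"><xsl:apply-templates/></p>
  </xsl:template>

  <xsl:template match="blockquote">
    <blockquote><p><xsl:apply-templates/></p></blockquote>
  </xsl:template>

  <xsl:template match="italic">
    <i><xsl:apply-templates/></i>
  </xsl:template>

</xsl:stylesheet>
```

The grouping key does the work that the recursion does in a 1.0 sibling walk: the processor finds the runs, and the template only decides how to wrap them. Deeper nesting (a blockquote inside inline markup) is still not handled here.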

> In short, I would be happy to learn more before posting if anyone can point me toward any XML resources for dealing with actual document text issues?

XSLT as such doesn't really make a distinction between "documentary text" and "data" issues. We do observe patterns of differentiation, such as the "push" approach being better suited to stylesheets that handle documents, but nothing hard and fast.


Cheers,
Wendell


======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD 20850                                  Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
