RE: [xsl] Collapsing run-on tag chains not working in saxon or xalan

Subject: RE: [xsl] Collapsing run-on tag chains not working in saxon or xalan
From: cknell@xxxxxxxxxx
Date: Mon, 01 Nov 2004 15:22:55 -0500
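The difficulty may lie in the fact that MSXML is playing fast and loose with "node()". Keep in mind that "node()" and "element" are not synonymous. "node()" also matches text nodes, including the whitespace-only text nodes that sit between your tags, and text nodes have no name, local or otherwise, so name() and local-name() return an empty string for them. MSXML discards whitespace-only text nodes by default when it loads a document, which would explain why it alone sees the neighbouring ilink; Saxon and Xalan keep those nodes. Try replacing every "preceding-sibling::node()[1]" with "preceding-sibling::*[1]" and every "following-sibling::node()[1]" with "following-sibling::*[1]" in your stylesheet, and let us know whether the result is more to your liking.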
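
As a rough sketch of that substitution, here are the two ilink templates from your stylesheet with the sibling tests switched from node() to "*", which is the XPath 1.0 node test for "any element". Only the axis steps are changed. I have not run this against your data, so treat it as a starting point rather than a tested fix.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- First ilink of a run: emits the single merged ilink element. -->
  <xsl:template match="ilink">
    <!-- "*[1]" is the nearest sibling ELEMENT; whitespace text nodes are skipped. -->
    <xsl:if test="not(name(preceding-sibling::*[1])='ilink')">
      <ilink>
        <!-- A lone ilink (no ilink following) keeps its own attributes. -->
        <xsl:if test="not(name(following-sibling::*[1])='ilink')">
          <xsl:copy-of select="@*"/>
        </xsl:if>
        <xsl:apply-templates/>
        <!-- Pull in the next ilink of the run, if there is one. -->
        <xsl:if test="name(following-sibling::*[1])='ilink'">
          <xsl:apply-templates select="following-sibling::*[1]" mode="following"/>
        </xsl:if>
      </ilink>
    </xsl:if>
  </xsl:template>
  <!-- Later ilinks of a run: contribute content only, no wrapper element. -->
  <xsl:template match="ilink" mode="following">
    <xsl:apply-templates/>
    <!-- The last ilink of the run hands its attributes over as an id element. -->
    <xsl:if test="not(name(following-sibling::*[1])='ilink') and @*">
      <id><xsl:copy-of select="@*"/></id>
    </xsl:if>
    <xsl:if test="name(following-sibling::*[1])='ilink'">
      <xsl:apply-templates select="following-sibling::*[1]" mode="following"/>
    </xsl:if>
  </xsl:template>
  <!-- Identity template, as in the original stylesheet. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

One caveat: "*[1]" also steps over text that is not whitespace, so two ilinks separated by a word such as " or " would be treated as one run. If that is not what you want, an alternative worth trying is to leave the node() tests as they are and add <xsl:strip-space elements="*"/> as a top-level element, which asks the processor to drop whitespace-only text nodes from the source tree before matching.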
-- 
Charles Knell
cknell@xxxxxxxxxx - email



-----Original Message-----
From:     Richard Bondi <rbondi@xxxxxxxxx>
Sent:     Mon, 1 Nov 2004 14:05:31 -0500
To:       xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject:  [xsl] Collapsing run-on tag chains not working in saxon or xalan

Dear All,

With the following xml and xsl, the Microsoft msxmldom 4 is producing
the expected output, but xalan 2.4, 2.6, and saxon 6.5.3 are not: they
all produce the same, unexpected output.

The purpose of this code is to collapse run-on chains like
<ilink>foo</ilink><ilink id="1234">bar</ilink> into a single tag
<ilink>foobar<id id="1234"/></ilink>. The xsl will also collapse
run-on chains of b, i, sup, sub, and similar tags.

Can anyone explain to me whether xalan and saxon just have a bug, and
preferably how to get xalan and/or saxon to transform the way msxml4
does here (which I believe is correct)?

TMIA,
Richard Bondi


Sample input:

<Chapter>
	<ChapterTitle>The chapter title must be immediately followed by a
section title</ChapterTitle>
	<Body>
		<SectionTitle>The section title</SectionTitle>
		<Title>Internal Links: _ilink</Title>
		<Paragraph>The internal link to Proteins and Membranes, optionally
including the cont_id would look like: <ilink id="1234">Proteins and
		Membranes</ilink>. You could also just type <ilink>Proteins and
Membranes</ilink>. Another option is <ilink>CBIO|Proteins and
Membranes</ilink>, or
		even just <ilink id="1234"/>. You can also do <ilink
id="1234">CBIO|Proteins and Membranes</ilink>. Spaces on either side
of a pipe (|) are
		optional.</Paragraph>
		<Paragraph>Feel free to include crazy formatting, as in <ilink>CBIO|</ilink>
			<ilink>
				<i>Proteins</i>
			</ilink>
			<ilink> and Membranes</ilink> or <ilink>
				<b>
					<i>Pr</i>
				</b>
			</ilink>
			<ilink>
				<sup>
					<b>
						<i>o</i>
					</b>
				</sup>
			</ilink>
			<ilink>
				<sub>
					<b>
						<i>t</i>
					</b>
				</sub>
			</ilink>
			<ilink>
				<b>
					<i>ei</i>
				</b>
			</ilink>
			<ilink>
				<b>
					<i>
						<u>n</u>
					</i>
				</b>
			</ilink>
			<ilink>
				<b>
					<i>s</i>
				</b>
			</ilink>
			<ilink id="1234">and Membranes</ilink>. </Paragraph>
	</Body>
</Chapter>


Xsl:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
	<xsl:output encoding="ISO-8859-1"/>
	<xsl:template match="/">
		<xsl:apply-templates/>
	</xsl:template>
	<!-- run of ilinks -->
	<xsl:template match="ilink">
		<xsl:if test="not(local-name(preceding-sibling::node()[1])='ilink')">
			<ilink>
				<xsl:if test="not(name(following-sibling::node()[1])='ilink')"><xsl:copy-of
select="@*"/></xsl:if>
				<xsl:apply-templates/>
				<xsl:if test="name(following-sibling::node()[1])='ilink'"><xsl:apply-templates
select="following-sibling::node()[1]" mode="following"/></xsl:if>
			</ilink>
		</xsl:if>
	</xsl:template>
	<xsl:template match="ilink" mode="following" >
		<xsl:apply-templates/>
		<xsl:if test="not(name(following-sibling::node()[1])='ilink') and
@*"><id><xsl:copy-of select="@*"/></id></xsl:if>
		<xsl:if test="name(following-sibling::node()[1])='ilink'"><xsl:apply-templates
select="following-sibling::node()[1]" mode="following"/></xsl:if>
	</xsl:template>
	<!-- run of formatting tags, eg tags without attributes -->
	<xsl:template match="b | i | sup | sub | u | smallcaps | red" priority="2">
		<xsl:variable name="ename" select="name(.)"/>
		<xsl:if test="not(local-name(preceding-sibling::node()[1])=string($ename))">
			<xsl:element name="{$ename}">
				<xsl:apply-templates/>
				<xsl:if test="name(following-sibling::node()[1])=string($ename)"><xsl:apply-templates
select="following-sibling::node()[1]" mode="following"/>
				</xsl:if>
			</xsl:element>
		</xsl:if>
	</xsl:template>
	<xsl:template match="b | i | sup | sub | u | smallcaps | red"
mode="following" >
		<xsl:variable name="ename" select="name(.)"/>
		<xsl:apply-templates/>
		<xsl:if test="name(following-sibling::node()[1])=string($ename)"><xsl:apply-templates
select="following-sibling::node()[1]" mode="following"/>
		</xsl:if>
	</xsl:template>
	<xsl:template match="@* | node()">
		<xsl:copy >
			<xsl:apply-templates select="@*" />
			<xsl:apply-templates />
		</xsl:copy>
	</xsl:template>
</xsl:stylesheet>


Output using msxml4 (correct output, IMHO):

<Chapter>
	<ChapterTitle>The chapter title must be immediately followed by a
section title</ChapterTitle>
	<Body>
		<SectionTitle>The section title</SectionTitle>
		<Title>Internal Links: _ilink</Title>
		<Paragraph>The internal link to Proteins and Membranes, optionally
including the cont_id would look like: <ilink id="1234">Proteins and
		Membranes</ilink>. You could also just type <ilink>Proteins and
Membranes</ilink>. Another option is <ilink>CBIO|Proteins and
Membranes</ilink>, or
		even just <ilink id="1234"/>. You can also do <ilink
id="1234">CBIO|Proteins and Membranes</ilink>. Spaces on either side
of a pipe (|) are
		optional.</Paragraph>
		<Paragraph>Feel free to include crazy formatting, as in
<ilink>CBIO|<i>Proteins</i> and Membranes</ilink> or <ilink>
				<b>
					<i>Pr</i>
				</b>
				<sup>
					<b>
						<i>o</i>
					</b>
				</sup>
				<sub>
					<b>
						<i>t</i>
					</b>
				</sub>
				<b>
					<i>ei</i>
				</b>
				<b>
					<i>
						<u>n</u>
					</i>
				</b>
				<b>
					<i>s</i>
				</b>and Membranes<id id="1234"/>
			</ilink>. </Paragraph>
	</Body>
</Chapter>


Output of xalan 2.4, 2.6.0, and instant saxon 6.5.3 (appears to do
nothing, actually):

<Chapter>
	<ChapterTitle>The chapter title must be immediately followed by a
section title</ChapterTitle>
	<Body>
		<SectionTitle>The section title</SectionTitle>
		<Title>Internal Links: _ilink</Title>
		<Paragraph>The internal link to Proteins and Membranes, optionally
including the cont_id would look like: <ilink id="1234">Proteins and
		Membranes</ilink>. You could also just type <ilink>Proteins and
Membranes</ilink>. Another option is <ilink>CBIO|Proteins and
Membranes</ilink>, or
		even just <ilink id="1234"/>. You can also do <ilink
id="1234">CBIO|Proteins and Membranes</ilink>. Spaces on either side
of a pipe (|) are
		optional.</Paragraph>
		<Paragraph>Feel free to include crazy formatting, as in <ilink>CBIO|</ilink>
			<ilink>
				<i>Proteins</i>
			</ilink>
			<ilink> and Membranes</ilink> or <ilink>
				<b>
					<i>Pr</i>
				</b>
			</ilink>
			<ilink>
				<sup>
					<b>
						<i>o</i>
					</b>
				</sup>
			</ilink>
			<ilink>
				<sub>
					<b>
						<i>t</i>
					</b>
				</sub>
			</ilink>
			<ilink>
				<b>
					<i>ei</i>
				</b>
			</ilink>
			<ilink>
				<b>
					<i>
						<u>n</u>
					</i>
				</b>
			</ilink>
			<ilink>
				<b>
					<i>s</i>
				</b>
			</ilink>
			<ilink id="1234">and Membranes</ilink>. </Paragraph>
	</Body>
</Chapter>
