RE: [xsl] hard xsl problem

Subject: RE: [xsl] hard xsl problem
From: "Sal Mangano" <smangano@xxxxxxxxxx>
Date: Mon, 26 Jul 2004 00:44:26 -0400
This problem is hard because it is really a text-processing problem
disguised as an XML/XSLT problem. The first thing to notice is that if the
parenthesized text were wrapped in an element, say <paren> ... </paren>, the
problem would become trivial: you would write a transformation that
discards paren elements that have cite children (i.e., paren[cite]) and
converts paren elements without cite children back to ( ... ).

So the easiest solution, if it is available to you, is to preprocess the
text in Perl, replacing ( with <paren> and ) with </paren>, and then do the
trivial XSLT transform. However, maybe you can't use Perl or some other
language. In that case you can do the same thing in XSLT, in a few passes.
The passes can be combined into a single stylesheet if you use the
node-set() extension with variables in 1.0, or if you use XSLT 2.0. (BTW, if
you can use 2.0, things get cleaner than what I show below, but the strategy
is the same.)
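
For what it's worth, here is a rough sketch of how the passes might be
chained in one 1.0 stylesheet. It assumes your processor supports the EXSLT
exsl:node-set() function (other processors spell it differently, e.g.
msxsl:node-set() in MSXML). The mode names pass1, pass2 and pass3 and the
variable names are my own, and the three identity templates are only
stand-ins for the real per-pass templates:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">

  <xsl:template match="/">
    <!-- Pass 1 writes into a variable, giving a result tree fragment. -->
    <xsl:variable name="pass1-result">
      <xsl:apply-templates mode="pass1"/>
    </xsl:variable>
    <!-- node-set() turns the fragment back into nodes we can navigate. -->
    <xsl:variable name="pass2-result">
      <xsl:apply-templates select="exsl:node-set($pass1-result)/node()"
                           mode="pass2"/>
    </xsl:variable>
    <xsl:apply-templates select="exsl:node-set($pass2-result)/node()"
                         mode="pass3"/>
  </xsl:template>

  <!-- Stand-ins: each pass is a plain identity copy here. Replace each one
       with the real templates for that pass, adding the mode attribute. -->
  <xsl:template match="@* | node()" mode="pass1">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" mode="pass1"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="@* | node()" mode="pass2">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" mode="pass2"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="@* | node()" mode="pass3">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" mode="pass3"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

As written this just copies the input through three times, which is a
quick way to check that the plumbing works before dropping in the real
templates.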

Pass 1 is to:
	 replace all '(' with the empty tag <open-paren/>
	 replace all ')' with the empty tag <close-paren/>

Something like:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <xsl:apply-templates mode="pass1"/>
  </xsl:template>


  <xsl:template match="@* | *" mode="pass1">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" mode="pass1"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text()" mode="pass1">
    <xsl:call-template name="paren-to-elem"/>
  </xsl:template>

  <xsl:template name="paren-to-elem">
    <xsl:param name="text" select="current()"/>
    <xsl:variable name="has-open" select="contains($text, '(')"/>
    <xsl:variable name="has-close" select="contains($text, ')')"/>
    <xsl:choose>
      <!-- The next paren is '(': either no ')' remains or '(' comes first.
           Emit the text before it, the marker, then recurse on the rest so
           that every paren in the text node is converted. -->
      <xsl:when test="$has-open and (not($has-close) or
                      string-length(substring-before($text, '(')) &lt;
                      string-length(substring-before($text, ')')))">
        <xsl:value-of select="substring-before($text, '(')"/>
        <open-paren/>
        <xsl:call-template name="paren-to-elem">
          <xsl:with-param name="text" select="substring-after($text, '(')"/>
        </xsl:call-template>
      </xsl:when>
      <!-- Otherwise, if a ')' remains, it is the next paren. -->
      <xsl:when test="$has-close">
        <xsl:value-of select="substring-before($text, ')')"/>
        <close-paren/>
        <xsl:call-template name="paren-to-elem">
          <xsl:with-param name="text" select="substring-after($text, ')')"/>
        </xsl:call-template>
      </xsl:when>
      <!-- No parens left: copy the remaining text through. -->
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

<xsl:template name="substring-before-last">
  <xsl:param name="input"/>
  <xsl:param name="substr"/>

  <xsl:if test="$substr and contains($input, $substr)">
    <xsl:variable name="temp" select="substring-after($input, $substr)" />
    <xsl:value-of select="substring-before($input, $substr)" />
    <xsl:if test="contains($temp, $substr)">
      <xsl:value-of select="$substr" />
      <xsl:call-template name="substring-before-last">
        <xsl:with-param name="input" select="$temp" />
        <xsl:with-param name="substr" select="$substr" />
      </xsl:call-template>
    </xsl:if>
  </xsl:if>

</xsl:template>

</xsl:stylesheet>

The output would look like:


<Paragraph> On October 30, clad in scarlet and ermine, Charles made his
entry into the papal palace <open-paren/>see <i>especially</i>
  <cite>30</cite>, as well as <cite>20</cite>  <close-paren/>. He presented
the Pope with a blue velvet cape embroidered in pearls <open-paren/>
  <cite>234</cite>; <cite>12345</cite><close-paren/> in a design of angels
<open-paren/>as well as a fleur-de-lys and stars<close-paren/>. With no
footing except in French support, Clement's papacy would have vanished in
smoke.</Paragraph>

BTW, in XSLT 2.0 you could use tokenize() and the conversion would be much
simpler.
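
As a rough illustration of how much shorter pass 1 can get in 2.0, here is
a sketch that uses xsl:analyze-string rather than tokenize(). I swapped it
in because tokenize() throws the separators away, so you could not tell
which kind of paren you had just passed. It assumes an XSLT 2.0 processor
and produces the same <open-paren/> and <close-paren/> markers as above:

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity copy for everything that is not a text node. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Split each text node on parens; each paren becomes a marker. -->
  <xsl:template match="text()">
    <xsl:analyze-string select="." regex="[()]">
      <xsl:matching-substring>
        <xsl:choose>
          <xsl:when test=". = '('"><open-paren/></xsl:when>
          <xsl:otherwise><close-paren/></xsl:otherwise>
        </xsl:choose>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>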

Now we have a problem that XSLT can solve more easily, but it still isn't
pretty, because the separate <open-paren/> and <close-paren/> tags don't
give us the structure we really want. We can apply the following
transformation to fix that:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

	<xsl:template match="Paragraph">
    <xsl:copy>
      <xsl:apply-templates select="open-paren | close-paren"/>
    </xsl:copy>
	</xsl:template>

	<xsl:template match="open-paren">
    <xsl:variable name="ns1"
select="following-sibling::close-paren[1]/preceding-sibling::node()"/>
    <xsl:variable name="ns2" select="following-sibling::node()"/>
    <xsl:copy-of select="preceding-sibling::node()"/>
    <paren>
    <xsl:copy-of select="$ns1[count(. | $ns2) = count($ns2)]"/>
    </paren>
	</xsl:template>

	<xsl:template match="open-paren[preceding-sibling::close-paren]">
    <xsl:variable name="ns1"
select="preceding-sibling::close-paren[1]/following-sibling::node()"/>
    <xsl:variable name="ns2" select="preceding-sibling::node()"/>
    <xsl:variable name="ns3"
select="following-sibling::close-paren[1]/preceding-sibling::node()"/>
    <xsl:variable name="ns4" select="following-sibling::node()"/>
    <xsl:copy-of select="$ns1[count(. | $ns2) = count($ns2)]"/>
    <paren>
    <xsl:copy-of select="$ns3[count(. | $ns4) = count($ns4)]"/>
    </paren>
	</xsl:template>

	<xsl:template
match="close-paren[not(following-sibling::open-paren)]">
    <xsl:copy-of select="following-sibling::node()"/>
	</xsl:template>

</xsl:stylesheet>

This transformation uses the XPath 1.0 set intersection idiom: the
expression $a[count(. | $b) = count($b)] keeps only those nodes of $a that
are also in $b. There are other ways, and in 2.0 you would approach this
using grouping, but I'll assume you are stuck with 1.0.
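
If the idiom is new to you, a stripped-down sketch may help. It takes the
first <open-paren/> in the document element and copies just the nodes
between it and its matching <close-paren/>. The variable names and the
<between> wrapper element are made up for the illustration:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/*">
    <xsl:variable name="open" select="open-paren[1]"/>
    <!-- Set A: every sibling before the first close-paren after $open. -->
    <xsl:variable name="before-close"
        select="$open/following-sibling::close-paren[1]
                /preceding-sibling::node()"/>
    <!-- Set B: every sibling after $open. -->
    <xsl:variable name="after-open"
        select="$open/following-sibling::node()"/>
    <between>
      <!-- A node of A is also in B exactly when adding it to B does not
           make B any bigger. What survives is the intersection. -->
      <xsl:copy-of
          select="$before-close[count(. | $after-open) = count($after-open)]"/>
    </between>
  </xsl:template>

</xsl:stylesheet>

Given <p>a <open-paren/>b <i>c</i><close-paren/> d</p> as input, the
<between> element should contain "b " followed by <i>c</i>: set A also
holds "a " and the open-paren itself, but neither of those is in set B.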

Now we have the following output:

<Paragraph> On October 30, clad in scarlet and ermine, Charles made his
entry into the papal palace <paren>see <i>especially</i>
    <cite>30</cite>, as well as <cite>20</cite>
  </paren>. He presented the Pope with a blue velvet
cape embroidered in pearls <paren>
    <cite>234</cite>; <cite>12345</cite>
  </paren> in a design of angels <paren>as well as a fleur-de-lys and
stars</paren>. With no footing except in French support, Clement's papacy
would have vanished in smoke.</Paragraph>

Now it is easy. Simply write an identity transformation with:

<xsl:template match="paren[cite]"/> <!-- Eat these -->

<xsl:template match="paren">(<xsl:apply-templates/>)</xsl:template>


So the problem is solved in 3 passes. Is this the best you can do in 1.0?  I
doubt it, but who has time!

I leave it to you to add the param switch you need.
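
In case it helps, here is one way the complete third pass might look with
a switch added. The parameter name suppress-cites is hypothetical, so use
whatever your caller actually passes in. The test sits inside the template
body because XSLT 1.0 does not allow variable references in match patterns:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical switch: 'true' drops the parenthesized citations. -->
  <xsl:param name="suppress-cites" select="'true'"/>

  <!-- Identity transformation: copy everything not handled below. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Parens holding a cite: eaten when the switch is on, otherwise
       turned back into plain parentheses like any other paren. -->
  <xsl:template match="paren[cite]">
    <xsl:if test="$suppress-cites != 'true'">(<xsl:apply-templates/>)</xsl:if>
  </xsl:template>

  <!-- All other parens go back to ( ... ). -->
  <xsl:template match="paren">(<xsl:apply-templates/>)</xsl:template>

</xsl:stylesheet>

The paren[cite] template wins over the plain paren template because a
pattern with a predicate gets a higher default priority.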

-Sal

---------------------------------------------------------
Sal Mangano
Into Technology Inc.
www.into-technology.com

Use XSLT? Try the XSLT Cookbook
http://www.oreilly.com/catalog/xsltckbk/


> -----Original Message-----
> From: Richard Bondi [mailto:rbondi@xxxxxxxxx]
> Sent: Sunday, July 25, 2004 7:10 PM
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] hard xsl problem
>
>
> I would be grateful for a solution to the following xsl problem.
>
> Example input:
> ==============
> <Paragraph> On October 30, clad in scarlet and ermine,
> Charles made his entry into the papal palace (see
> <i>especially<i> <cite>30</cite>, as well as
> <cite>20</cite>). He presented the Pope with a blue velvet
> cape embroidered in pearls (<cite>234</cite>;
> <cite>12345</cite>) in a design of angels (as well as a
> fleur-de-lys and stars). With no footing except in French
> support, Clement's papacy would have vanished in smoke.</Paragraph>
>
> Example output:
> ==============
> <Paragraph> On October 30, clad in scarlet and ermine,
> Charles made his entry into the papal palace. He presented
> the Pope with a blue velvet cope embroidered in pearls in a
> design of angels (as well as a fleur-de-lys and stars). With
> no footing except in French support, Clement's papacy would
> have vanished in smoke.</Paragraph>
>
> Problem in words:
> ==============
> The <cite> tags are always enclosed in parenthesis. As the
> result of a transform (when a parameter passed into the xsl
> sheet is 'true') (a) these parens, (b) their xml content, and
> (c) the space preceding the open paren must be suppressed.
> Alternatively, (a)-(c) can be enclosed in a tag. Parens that
> do not contain <cite> tags are to be left as is. All of this
> is always inside a <Paragraph> tag.
>
> How to do this is the problem.
>
> I don't see how to do this even with a plugin, because it is
> unclear to me how a plugin can be used to generate tags. It
> would of course be more elegant to do this without a plugin.
>
> MTIA,
> /r:b:
