RE: [xsl] Multiple CDATA tags...again

Subject: RE: [xsl] Multiple CDATA tags...again
From: mylistaddress@xxxxxxxxxx
Date: Mon, 09 May 2005 18:02:41 -0700 (PDT)
Hi,
Thanks for responding. I am pretty much ready to throw
myself off of a bridge...but I guess I can't complain
about learning on the job.

OK, here's the deal. I am sending XML requests via Java
1.4 to a library DB called STAR XML (made by Cuadra)
which sends back a very verbose XML response of a news
item. I have no control over the format of the output.
I was able to make sense out of it (thanks to your
responses) and transform it into a format more
acceptable to the Verity search indexing spider.

When the output from STAR XML is HTML, the < and >
characters are converted to &lt; and &gt;, and so on.
Oddly, it also appears to convert a quote to &amp;quot;
instead of &quot;. When I try to index the resulting XML
document without placing CDATA tags (not really a tag,
right?) around the content, the indexer fails.
The content also contains [ and ] and non-English text.

So, I added the cdata-section-elements attribute to
my xsl:output, and this is when I encountered the
multiple CDATA sections. At first I suspected they
appeared wherever there is a line break, but this does
not appear to be the case.

Here is a portion of the XML response from STAR XML:
<Field outputName="TEXT">
2010 &amp;quot;We
respectfully Wish the health of the great leader
[yo&apos;ndude] Comarade Big John Il 
</Field>

Here is a portion of the XSL dealing with the TEXT
element:
<xsl:output method="xml" omit-xml-declaration="no"
indent="yes" cdata-section-elements="TEXT" />
<xsl:strip-space elements="*" />
...
<xsl:template match="Field">
<xsl:if test="contains ('TEXT', @OutputFieldName)">
<xsl:element name="{@OutputFieldName}">
<xsl:apply-templates/>
</xsl:element>
</xsl:if>
</xsl:template>

Resulting XML:
<TEXT>
<![CDATA[2010 &quot;We
     ]]><![CDATA[       Respectfully Wish
Hea]]><![CDATA[lth of the great leader
    ]]><![CDATA[      [yo'ndude] Brother ]]><![CDATA[  
Big John Il]      ]]>
</TEXT> 

As you can see, the CDATA sections are appearing all
over the place. This is just a small clip. The actual
doc has dozens. Also notice how the quote now appears as
&quot; (no more &amp; before the quot;). Do I have to
transform it again? My literal [ and ] are intact.
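
To make the &amp;quot; question concrete, here is a minimal Java sketch of what I think is happening; it is an illustration under assumptions, not a confirmed diagnosis. The parser resolves &amp; to & when it reads the response, so the text node holds the literal characters &quot;, and inside a CDATA section those characters are written out unescaped. The class name DoubleEscapeDemo and the helpers textOf and decodeOnce are hypothetical names made up for this example, and getTextContent and String.replace need Java 5 or later, not the Java 1.4 mentioned above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DoubleEscapeDemo {

    // Hypothetical helper: parse a small XML string and return the
    // text content of its root element, as the parser sees it.
    public static String textOf(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    // Hypothetical helper: undo one remaining level of quote escaping.
    // Only &quot; is handled here; other entities would need the same care.
    public static String decodeOnce(String text) {
        return text.replace("&quot;", "\"");
    }

    public static void main(String[] args) throws Exception {
        // A cut-down version of the STAR XML fragment shown earlier.
        String response =
            "<Field outputName=\"TEXT\">2010 &amp;quot;We</Field>";

        // Parsing removes one level of escaping: &amp; becomes &.
        String text = textOf(response);
        System.out.println(text);             // prints: 2010 &quot;We

        // A second decoding step would be needed to get a real quote.
        System.out.println(decodeOnce(text)); // prints: 2010 "We
    }
}
```

If this reading is right, the &quot; in the result is not something the stylesheet introduced; it is the second layer of escaping that was already in the STAR XML response, and it would need a second decoding step somewhere before or after the transform.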

I visited dpawson.co.uk and read up on the d-o-e
(disable-output-escaping) material, but am still stuck.
Could anyone recommend a book? XSLT Cookbook? I borrowed
the O'Reilly XML Hacks (and noticed your name), but it
is slim on XSL.

Thanks so much for any help.

JW
