RE: [xsl] Multiple CDATA tags...again

Subject: RE: [xsl] Multiple CDATA tags...again
From: "Aron Bock" <aronbock@xxxxxxxxxxx>
Date: Tue, 10 May 2005 02:20:46 +0000
What processor are you using? With Xalan, for the following XML:


<data>
<Field outputName="TEXT">
2010 &amp;quot;We
respectfully Wish the health of the great leader
[yo&apos;ndude] Comarade Big John Il
</Field>
</data>

By applying the following XSL:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
       version="1.0">
   <xsl:output method="xml" omit-xml-declaration="no"
           indent="yes" cdata-section-elements="TEXT" />

   <xsl:template match="Field">
       <xsl:if test="contains ('TEXT', @OutputName)">
           <xsl:element name="{@OutputName}">
               <xsl:copy-of select="."/>
           </xsl:element>
       </xsl:if>
   </xsl:template>
</xsl:stylesheet>

I get this result:

<?xml version="1.0" encoding="UTF-8"?>
<Field outputName="TEXT">
2010 &amp;quot;We
respectfully Wish the health of the great leader
[yo'ndude] Comarade Big John Il
</Field>
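One plausible reading of this result, offered as an editorial sketch rather than a confirmed diagnosis: attribute names are case-sensitive, and the input uses outputName while the stylesheet tests @OutputName. In XPath 1.0 a missing attribute has an empty string value, and contains('TEXT', '') is true, so the xsl:if passes anyway. The name="{@OutputName}" template then yields an empty string, which is not a valid QName; the XSLT 1.0 recommendation allows a processor to recover by outputting only the content of xsl:element. That would explain why the copied Field element appears unwrapped, with no TEXT element and therefore no CDATA section. The Python below only mirrors the two attribute lookups (it does not run XSLT); attr_string_value is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Aron's input, laid out with the line breaks shown in his result.
SOURCE = """<data>
<Field outputName="TEXT">
2010 &amp;quot;We
respectfully Wish the health of the great leader
[yo&apos;ndude] Comarade Big John Il
</Field>
</data>"""

field = ET.fromstring(SOURCE).find("Field")


def attr_string_value(elem, name):
    """Hypothetical helper: mimic the XPath 1.0 string value of @name,
    which is the empty string when the attribute does not exist."""
    return elem.get(name, "")


# Attribute lookup is case-sensitive, just as in XPath.
exact = attr_string_value(field, "outputName")      # 'TEXT'
as_tested = attr_string_value(field, "OutputName")  # '' (no such attribute)

# XPath contains('TEXT', '') is true; Python's '' in 'TEXT' behaves the same,
# so the xsl:if test passes even though the attribute was never found.
test_passes = as_tested in "TEXT"

print(repr(exact), repr(as_tested), test_passes)
```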

Regards,

--A

From: mylistaddress@xxxxxxxxxx
Reply-To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: RE: [xsl] Multiple CDATA tags...again
Date: Mon, 09 May 2005 18:02:41 -0700 (PDT)

Hi,
Thanks for responding. I am pretty much ready to throw
myself off of a bridge...but I guess I can't complain
about learning on the job.

OK, here's the deal. I am sending XML requests via Java
1.4 to a library DB called STAR XML (made by Cuadra),
which sends back a very verbose XML response for a news
item. I have no control over the format of the output.
I was able to make sense out of it (thanks to your
responses) and transform it into a format more
acceptable to the Verity search indexing spider.

When the output from STAR XML is HTML, the < and > characters
are converted to &lt; and &gt; and so on. Oddly, it
also appears to convert a quotation mark to &amp;quot; instead
of &quot;. When I try to index the resulting XML
document without placing CDATA tags (not really a tag,
right?) around the content, the indexer fails.
The content also contains [ and ] and non-English text.
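On the "not really a tag" point: a CDATA section is only an alternative way of writing characters in the serialized file, not an element. A quick check, using Python's ElementTree as a stand-in parser and a made-up field value (it does not involve STAR XML or Verity), shows that the escaped form and the CDATA form parse to identical text:

```python
import xml.etree.ElementTree as ET

# A made-up field value written two ways: with escaped characters, and
# inside a CDATA section.
escaped = "<TEXT>&lt;b&gt;[yo'ndude]&lt;/b&gt; A &amp; B</TEXT>"
cdata = "<TEXT><![CDATA[<b>[yo'ndude]</b> A & B]]></TEXT>"

escaped_text = ET.fromstring(escaped).text
cdata_text = ET.fromstring(cdata).text

# The parser reports the same characters either way; the CDATA markers
# themselves never show up as a separate node in ElementTree.
print(escaped_text == cdata_text, escaped_text)
```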

So, I added the cdata-section-elements declaration to
my xsl:output, and that is when I encountered the
multiple CDATA sections. At first I suspected they appeared
wherever there is a line break, but this does not
appear to be the case.
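Several adjacent CDATA sections carry exactly the same text as one. This thread does not establish why the serializer splits where it does; one case where splitting is unavoidable is text that itself contains ]]>, which cannot appear inside a single section. A minimal sketch in Python, where to_cdata is a hypothetical helper rather than a library function:

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Hypothetical helper: wrap text in CDATA, splitting the section
    wherever the text itself contains ']]>'."""
    # ']]>' would end the section early, so close the current section
    # after ']]' and open a new one that starts with '>'.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


one_section = "<TEXT><![CDATA[2010 &quot;We respectfully Wish the health]]></TEXT>"
# Same content, split at arbitrary points in the style of the output below.
split_sections = (
    "<TEXT><![CDATA[2010 &quot;We ]]><![CDATA[respectfully Wish the hea]]>"
    "<![CDATA[lth]]></TEXT>"
)

# Adjacent sections are reported as one continuous run of text.
same_text = ET.fromstring(one_section).text == ET.fromstring(split_sections).text

# Text containing ']]>' has to be split to stay well-formed.
wrapped = to_cdata("a]]>b")
round_trip = ET.fromstring("<TEXT>" + wrapped + "</TEXT>").text

print(same_text, wrapped, round_trip)
```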

Here is a portion of the XML response from STAR XML:
<Field outputName="TEXT">
2010 &amp;quot;We
respectfully Wish the health of the great leader
[yo&apos;ndude] Comarade Big John Il
</Field>

Here is a portion of the XSL dealing with the TEXT
element:
<xsl:output method="xml" omit-xml-declaration="no"
            indent="yes" cdata-section-elements="TEXT" />
<xsl:strip-space elements="*" />
...
<xsl:template match="Field">
    <xsl:if test="contains ('TEXT', @OutputFieldName)">
        <xsl:element name="{@OutputFieldName}">
            <xsl:apply-templates/>
        </xsl:element>
    </xsl:if>
</xsl:template>

Resulting XML:
<TEXT>
<![CDATA[2010 &quot;We
     ]]><![CDATA[       Respectfully Wish
Hea]]><![CDATA[lth of the great leader
    ]]><![CDATA[      [yo'ndude] Brother ]]><![CDATA[
Big John Il]      ]]>
</TEXT>

As you can see, the CDATA sections are appearing all over the
place. This is just a small clip; the actual document has
dozens. Also notice that &quot; now appears (the &amp;
before quot; is gone). Do I have to transform
these again? My literal [ and ] are intact.
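On whether the quotes need transforming again: the XML parser removes one level of escaping, so &amp;quot; becomes the literal characters &quot;, which act as an HTML entity reference only if the consumer treats the field as HTML. If the indexer expects plain text, a second decoding pass is needed. A sketch in Python, assuming a hypothetical fragment shaped like the response described above, with html.unescape standing in for whatever decoding step the real pipeline would use:

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical fragment shaped like the STAR XML output described earlier:
# markup escaped once (&lt; &gt;), the quotation mark escaped twice.
response = '<Field outputName="TEXT">&lt;p&gt;2010 &amp;quot;We respectfully&lt;/p&gt;</Field>'

# First decoding pass, done by any XML parser (or XSLT processor).
after_xml_parse = ET.fromstring(response).text

# Second pass, only needed if the consumer wants plain text rather than HTML.
after_html_unescape = html.unescape(after_xml_parse)

print(after_xml_parse)
print(after_html_unescape)
```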

I visited dpawson.co.uk and read up on the disable-output-escaping
(d-o-e) material, but am still stuck. Could anyone recommend a book?
The XSLT Cookbook? I borrowed O'Reilly's XML Hacks (and noticed
your name in it), but it is slim on XSL.

Thanks so much for any help.

JW
