Subject: RE: [xsl] Multiple CDATA tags...again From: "Aron Bock" <aronbock@xxxxxxxxxxx> Date: Tue, 10 May 2005 02:20:46 +0000
<data> <Field outputName="TEXT"> 2010 &quot;We respectfully Wish the health of the great leader [yo'ndude] Comarade Big John Il </Field> </data>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" omit-xml-declaration="no" indent="yes" cdata-section-elements="TEXT" />
  <xsl:template match="Field">
    <xsl:if test="contains('TEXT', @outputName)">
      <xsl:element name="{@outputName}">
        <xsl:copy-of select="."/>
      </xsl:element>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
<?xml version="1.0" encoding="UTF-8"?> <Field outputName="TEXT"> 2010 &quot;We respectfully Wish the health of the great leader [yo'ndude] Comarade Big John Il </Field>
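For comparison, here is a minimal sketch of a variant of the stylesheet above. It assumes the attribute is spelled outputName, as in the sample data, and that only the text of the field is wanted inside the new element. The encoding="UTF-8" setting is likewise an assumption, not something taken from the original stylesheet. Because xsl:value-of writes the field's string value as a text node directly under TEXT, that text is what cdata-section-elements applies to.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Text children of TEXT elements are serialized as CDATA sections.
       encoding="UTF-8" is an assumption made for this sketch. -->
  <xsl:output method="xml" encoding="UTF-8" indent="yes"
              cdata-section-elements="TEXT"/>

  <!-- Assumes the attribute is named outputName, as in the sample data. -->
  <xsl:template match="Field[@outputName = 'TEXT']">
    <!-- The element name is taken from the attribute value, i.e. TEXT. -->
    <xsl:element name="{@outputName}">
      <!-- Emit the field's string value as a text node under TEXT. -->
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

By the time the stylesheet sees the data, the parser has already turned &quot; into a plain " character, and a plain " needs no escaping inside a CDATA section or in ordinary element content, so it should not need transforming again. The XSLT 1.0 recommendation also says a serializer should close and reopen a CDATA section around any ]]> sequence and around any character the output encoding cannot represent. With non-English text, an output encoding such as UTF-8 may therefore reduce the splitting, though whether that accounts for the splits reported in this thread is not established here.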
From: mylistaddress@xxxxxxxxxx Reply-To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx Subject: RE: [xsl] Multiple CDATA tags...again Date: Mon, 09 May 2005 18:02:41 -0700 (PDT)
Hi, Thanks for responding. I am pretty much ready to throw myself off of a bridge...but I guess I can't complain about learning on the job.
OK, here's the deal. I am sending XML requests via Java 1.4 to a library DB called STAR XML (made by Cuadra) which sends back a very verbose XML response of a news item. I have no control over the format of the output. I was able to make sense out of it (thanks to your responses) and transform it into a format more acceptable to the Verity search indexing spider.
When the output from STAR XML is HTML, the < and > characters are converted to &lt; and &gt; and so on. Oddly, it also appears to convert a quote to &quot; instead of ". When I try to index the resulting XML document without placing CDATA tags (not really a tag, right?) around the content, the indexer fails. The content also contains [ and ] and non-English text.
So I added the cdata-section-elements declaration to my xsl:output, and this is when I encountered the multiple CDATA tags. At first I suspected they appeared wherever there is a line break, but this does not appear to be the case.
Here is a portion of the XML response from STAR XML: <Field outputName="TEXT"> 2010 &quot;We respectfully Wish the health of the great leader [yo'ndude] Comarade Big John Il </Field>
Here is a portion of the XSL dealing with the TEXT element:
<xsl:output method="xml" omit-xml-declaration="no" indent="yes" cdata-section-elements="TEXT" />
<xsl:strip-space elements="*" />
...
<xsl:template match="Field">
  <xsl:if test="contains ('TEXT', @OutputFieldName)">
    <xsl:element name="{@OutputFieldName}">
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:if>
</xsl:template>
Resulting XML: <TEXT> <![CDATA[2010 "We ]]><![CDATA[ Respectfully Wish Hea]]><![CDATA[lth of the great leader ]]><![CDATA[ [yo'ndude] Brother ]]><![CDATA[ Big John Il] ]]> </TEXT>
As you can see, the CDATAs are appearing all over the place. This is just a small clip; the actual doc has dozens. Also notice how the &quot; entities now appear as plain " characters (no more & before the quot;). Do I have to transform them again? My literal [ and ] are intact.
I visited dpawson.co.uk and read up on the d-o-e (disable-output-escaping) stuff, but am still stuck. Could anyone recommend a book? XSLT Cookbook? I borrowed the O'Reilly XML Hacks (and noticed your name) but it is slim on XSL.
Thanks so much for any help.
JW