Subject: RE: [xsl] Multiple CDATA tags...again
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Tue, 10 May 2005 11:28:23 +0100

This CDATA problem is odd but it's essentially a distraction. The root cause of your problem is that you're getting some very peculiar XML out of the database. I don't know if this is the fault of the database vendor - it's entirely possible that the rot started with the data that was put into the database in the first place. You should be trying to identify where the special characters such as ampersand got double-escaped, and fix the problem at its origin.

Meanwhile, if you want to tidy up the rubbish that you're getting from the database, I would think a good start would be to get rid of the double-escaping using something like:

<xsl:template match="text()">
  <xsl:variable name="doc">
    &lt;x&gt;<xsl:copy-of select="."/>&lt;/x&gt;
  </xsl:variable>
  <xsl:value-of select="saxon:parse($doc)"/>
</xsl:template>

That's a Saxon-specific solution of course, but it's probably the easiest.

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: mylistaddress@xxxxxxxxxx [mailto:mylistaddress@xxxxxxxxxx]
> Sent: 10 May 2005 02:03
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: RE: [xsl] Multiple CDATA tags...again
>
> Hi,
> Thanks for responding. I am pretty much ready to throw
> myself off of a bridge...but I guess I can't complain
> about learning on the job.
>
> OK, here's the deal. I am sending XML requests via Java
> 1.4 to a library DB called STAR XML (made by Cuadra)
> which sends back a very verbose XML response of a news
> item. I have no control over the format of the output.
> I was able to make sense out of it (thanks to your
> responses) and transform it into a format more
> acceptable to the Verity search indexing spider.
>
> When the output from STAR XML is HTML, the < and > tags
> are converted to &lt; and &gt; and so on. Oddly it
> appears to also convert a quote as &amp;quot; instead
> of &quot;. When I try to index the resulting XML
> document without placing CDATA tags (not really a tag,
> right?) around the content, the indexer fails.
> The content also contains [ and ] and non-English text.
>
> So, I added the cdata-section-elements declaration to
> my xsl:output and this is when I encountered the
> multiple CDATA tags. At first I suspected they appeared
> wherever there is a line-break, but this does not
> appear to be the case.
>
> Here is a portion of the XML response from STAR XML:
> <Field outputName="TEXT">
> 2010 &amp;quot;We
> respectfully Wish the health of the great leader
> [yo'ndude] Comarade Big John Il
> </Field>
>
> Here is a portion of the XSL dealing with the TEXT
> element:
> <xsl:output method="xml" omit-xml-declaration="no"
>     indent="yes" cdata-section-elements="TEXT" />
> <xsl:strip-space elements="*" />
> ...
> <xsl:template match="Field">
>   <xsl:if test="contains ('TEXT', @OutputFieldName)">
>     <xsl:element name="{@OutputFieldName}">
>       <xsl:apply-templates/>
>     </xsl:element>
>   </xsl:if>
> </xsl:template>
>
> Resulting XML:
> <TEXT>
> <![CDATA[2010 &quot;We
> ]]><![CDATA[ Respectfully Wish
> Hea]]><![CDATA[lth of the great leader
> ]]><![CDATA[ [yo'ndude] Brother ]]><![CDATA[
> Big John Il] ]]>
> </TEXT>
>
> As you can see, the CDATAs are appearing all over the
> place. This is just a small clip. The actual doc has
> dozens. Also notice how the &quot; (no more &amp;
> before the quot;) appear now. Do I have to transform
> them again? My literal [ and ] are intact.
>
> I visited dpawson.co.uk and read up on the d-o-e stuff,
> but am still stuck. Could anyone recommend a book? XSLT
> Cookbook? I borrowed the O'Reilly XML Hacks (and noticed
> your name) but it is slim on XSL.
>
> Thanks so much for any help.
>
> JW
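
For readers who cannot use the Saxon extension, the idea in the reply (undo exactly one level of escaping before the data is transformed) can also be sketched as a pre-processing step outside the stylesheet. This is only an illustration in Python, not something proposed in the thread: the `Response` wrapper element, the sample string, and the function names `unescape_once` and `fix_fields` are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import unescape

# saxutils.unescape handles &lt;, &gt; and &amp; by default;
# the quote entities have to be supplied explicitly.
EXTRA_ENTITIES = {"&quot;": '"', "&apos;": "'"}


def unescape_once(text):
    """Remove a single level of XML escaping from a string."""
    return unescape(text, EXTRA_ENTITIES)


def fix_fields(xml_source):
    """Parse the response and unescape the text of every Field element once.

    Parsing already removes the first level of escaping, so the extra
    call removes the second (unwanted) level. Only the direct text of
    each Field is touched; this sketch assumes Field has no child elements.
    """
    root = ET.fromstring(xml_source)
    for field in root.iter("Field"):
        if field.text:
            field.text = unescape_once(field.text)
    return root


if __name__ == "__main__":
    # Hypothetical double-escaped sample, modelled on the post above.
    sample = (
        '<Response><Field outputName="TEXT">'
        "2010 &amp;quot;We &amp;lt;b&amp;gt;wish&amp;lt;/b&amp;gt;"
        "</Field></Response>"
    )
    print(fix_fields(sample).find("Field").text)
```

When the tree is serialized again, the text is escaped once, as it should have been in the first place. As the reply says, this only tidies up the symptom; the better fix is to find where the data was escaped twice.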