Re: [xsl] remove tags + CDATA tag out of big xml file

Subject: Re: [xsl] remove tags + CDATA tag out of big xml file
From: Andrew Welch <andrew.j.welch@xxxxxxxxx>
Date: Mon, 1 Feb 2010 15:24:15 +0000
How about:

<xsl:template match="content">
  <xsl:analyze-string select="." regex="&lt;.*?&gt;">
    <xsl:non-matching-substring>
      <xsl:value-of select="."/>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>

which when applied to:

<content><![CDATA[
<p>The <strong>keyword</strong> is nice to have but is not needed to
 include in a solr feed</p>]]></content>

gives this:

The keyword is nice to have but is not needed to
 include in a solr feed
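If the rest of the feed should pass through unchanged, the template above can sit alongside an identity template. This is a minimal XSLT 2.0 sketch. It assumes that the wrapped HTML only appears inside <content> elements and that you want to keep the <content> element itself, holding only the plain text. It also adds flags="s" so that "." matches newlines, which lets a tag whose attributes wrap onto the next line still be removed.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy every node and attribute of the feed
       unchanged unless a more specific template matches it. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Keep <content> and its attributes, but replace its CDATA text
       with the same text minus anything between < and >. -->
  <xsl:template match="content">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <!-- flags="s": "." also matches newlines, so multi-line tags go too -->
      <xsl:analyze-string select="." regex="&lt;.*?&gt;" flags="s">
        <xsl:non-matching-substring>
          <xsl:value-of select="."/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

This needs an XSLT 2.0 processor such as Saxon, because xsl:analyze-string does not exist in XSLT 1.0. Entity references written inside the CDATA (for example &amp;nbsp;) are not expanded by this approach. They come through as literal text.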

cheers
andrew


On 1 February 2010 14:06, bw <bwakkie@xxxxxxxxx> wrote:
> Hi Michael,
>
> This is exactly why I want to remove it ;-). I was even thinking about
> some fancy perl script command to remove it now.
>
> On 29/01/2010, Michael Ludwig <milu71@xxxxxx> wrote:
>> bw wrote on 29.01.2010 at 12:02:10 (+0100):
>>> Hello,
>>>
>>> I have a big xml feed out of my content management system that
>>> includes wysiwyg html tags inside CDATA tags.
>>>
>>> I am looking for a way to remove the CDATA and only get the text.
>>
>>>          <content><![CDATA[
>>> <p>The <strong>keyword</strong> is nice to have but is not needed to
>>> include in a solr feed</p> ...
>>
>> Looks like this feed is for Solr (an indexer), which won't do anything
>> useful with the markup anyway. Someone has defined <title> and <content>
>> as fields for the indexer but has forgotten to strip the markup from the
>> source. That source markup in CDATA has no purpose in a feed for Solr
>> and should not have been included in the first place.
>>
>> --
>> Michael Ludwig
>>
>>
>
>
> --
> [Bb](astia{2}n)?\s?[Ww](ak{2}ie)?$
>
>



--
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/
