Re: [xsl] Handling Non Well conformed HTML content

Subject: Re: [xsl] Handling Non Well conformed HTML content
From: "Mukul Gandhi" <gandhi.mukul@xxxxxxxxx>
Date: Tue, 3 Oct 2006 21:14:38 +0530

Hi Senthil,
  Please try this stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="html" indent="yes" />

<xsl:template match="/broadcast">
 <html>
   <xsl:apply-templates select="content_vars/content" />
 </html>
</xsl:template>

<xsl:template match="content">
 <xsl:variable name="temp1" select="translate(., '[]', '')" />
 <xsl:variable name="temp2"
   select="//*[not(*)][contains($temp1, local-name())]" />
 <xsl:variable name="temp3" select="local-name($temp2)" />
 <p>
  <xsl:value-of select="substring-before($temp1, $temp3)" />
  <xsl:value-of select="$temp2" />
  <xsl:value-of select="substring-after($temp1, $temp3)" />
 </p>
</xsl:template>

</xsl:stylesheet>

This, when applied to the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<broadcast>
<content_vars>
 <content name="subject"><html>Hello [[BUYERS_NAME]]</html></content>
 <content name="text">REF Order [WEB_ORDER_NUMBER]</content>
</content_vars>
<ORDER_FEED>
  <ORDER>
    <ORDER_HEADER>
      <BUYERS_NAME>Senthil</BUYERS_NAME>
      <WEB_ORDER_NUMBER>W12345</WEB_ORDER_NUMBER>
    </ORDER_HEADER>
  </ORDER>
</ORDER_FEED>
</broadcast>

Produces output:

<html>
 <p>Hello Senthil</p>
 <p>REF Order W12345</p>
</html>

This works, but I have used a brute-force approach here:
//*[not(*)] scans every leaf element in the document!
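
If the document-wide scan is a concern, one possible alternative is to index the order fields with xsl:key and expand the placeholders with a recursive named template. The following is only a sketch under some assumptions: the values always sit in leaf elements under ORDER_FEED, every placeholder is written as [NAME] or [[NAME]], and the names "field" and "expand" are my own, not anything from your code. It should also cope with more than one placeholder in a single content element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="html" indent="yes" />

<!-- Assumption: replacement values are leaf elements under ORDER_FEED.
     Index them by element name so each lookup avoids a full scan. -->
<xsl:key name="field" match="ORDER_FEED//*[not(*)]" use="local-name()" />

<xsl:template match="/broadcast">
 <html>
   <xsl:apply-templates select="content_vars/content" />
 </html>
</xsl:template>

<xsl:template match="content">
 <p>
  <xsl:call-template name="expand">
   <xsl:with-param name="text" select="string(.)" />
  </xsl:call-template>
 </p>
</xsl:template>

<!-- Hypothetical helper: replaces each [NAME] or [[NAME]] in $text
     with the value of the indexed element called NAME. -->
<xsl:template name="expand">
 <xsl:param name="text" />
 <xsl:choose>
  <xsl:when test="contains(substring-after($text, '['), ']')">
   <xsl:variable name="after" select="substring-after($text, '[')" />
   <!-- Drop a second '[' so that [[NAME]] is read like [NAME] -->
   <xsl:variable name="name"
     select="translate(substring-before($after, ']'), '[', '')" />
   <xsl:variable name="rest" select="substring-after($after, ']')" />
   <xsl:value-of select="substring-before($text, '[')" />
   <xsl:value-of select="key('field', $name)" />
   <xsl:call-template name="expand">
    <!-- starts-with() yields 1 when a second ']' follows, so that
         character is skipped; otherwise the whole rest is kept -->
    <xsl:with-param name="text"
      select="substring($rest, 1 + starts-with($rest, ']'))" />
   </xsl:call-template>
  </xsl:when>
  <xsl:otherwise>
   <xsl:value-of select="$text" />
  </xsl:otherwise>
 </xsl:choose>
</xsl:template>

</xsl:stylesheet>
```

For the sample XML above this should give the same two paragraphs. A placeholder whose name matches no indexed element is replaced by an empty string.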


On 10/3/06, Senthilkumaravelan Krishnanatham <senthil@xxxxxxxxx> wrote:

Hi Guys, I have a typical issue handling HTML content in an XML document with the structure below. I want to replace the placeholders in the HTML template with the text of the corresponding node elements. The HTML is not well formed, so we base64-encode the HTML content. Please suggest a resolution. The replacement content might be in any part of the document. Any suggestions are welcome.

Input content
<?xml version="1.0" encoding="UTF-8"?>
<broadcast>
 <content_vars>
  <content name="subject"><html>Hello [[BUYERS_NAME]]</html></content><!--encoded-->
  <content name="text">REF Order [WEB_ORDER_NUMBER]</content><!--encoded-->
 </content_vars>

 <ORDER_FEED>
  <ORDER>
   <ORDER_HEADER>
    <BUYERS_NAME>Senthil</BUYERS_NAME>
    <WEB_ORDER_NUMBER>W12345</WEB_ORDER_NUMBER>
   </ORDER_HEADER>
   <!--Line Items-->
  </ORDER>
 </ORDER_FEED>
</broadcast>

XSLT I tried for the same
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="html" indent="yes" />

<xsl:template match="/broadcast">
       <xsl:apply-templates select="content_vars/content" />

</xsl:template>

<xsl:template match="content">

    <xsl:variable name="temp1" select="translate(., '[]', '')" />
    <xsl:variable name="temp2"
        select="normalize-space(../following-sibling::*[contains($temp1, local-name())])" />
    <xsl:variable name="temp3"
        select="local-name(../following-sibling::*[contains($temp1, local-name())])" />
    <xsl:value-of select="substring-before($temp1, $temp3)" />
    <xsl:value-of select="$temp2" />
    <xsl:value-of select="substring-after($temp1, $temp3)" />
</xsl:template>

</xsl:stylesheet>

Expected output
<html>
Hello Senthil
REF Order W12345
</html>

But I am getting this unexpected output:
<html>
Hello BUYERS_NAME
REF Order WEB_ORDER_NUMBER
</html>
Please let me know how to tweak the code so that it works as desired.
The other part is how to handle HTML content that is not well formed
so that it can be treated as XML content.


Thanks, Senthil


--
Regards,
Mukul Gandhi
