Re: [xsl] HTML text extraction

From: Mukul Gandhi <mukul_gandhi@xxxxxxxxx>
Date: Mon, 26 Jul 2004 06:50:18 -0700 (PDT)
Hope this helps. The stylesheet assumes the <p> elements are children
of a <root> element:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" version="1.0"
encoding="UTF-8" indent="yes"/>
	
<xsl:template match="/root">
  <root>
    <xsl:for-each select="p[(. = 'Heading 1') or (. = 'Heading 2')]">
     <Subject>
       <xsl:value-of select="." />
       <xsl:text>&#xA;</xsl:text>
       <xsl:variable name="p-id" select="generate-id()"/>
       <content>
         <xsl:for-each
           select="following-sibling::p[generate-id(preceding-sibling::p[starts-with(., 'Heading')][1]) = $p-id][not(starts-with(., 'Heading'))]">
           <xsl:value-of select="."/>
           <xsl:text>&#xA;</xsl:text>
         </xsl:for-each>
       </content>
     </Subject>
   </xsl:for-each>
  </root>
</xsl:template>
	
</xsl:stylesheet>
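
As an alternative, here is a minimal sketch of the same grouping done with xsl:key (still XSLT 1.0). It is only another way to express the idea, and it assumes the same <root> wrapper around the <p> elements. The key name content-by-heading is made up for illustration. Note that, unlike the stylesheet above, it treats every <p> starting with 'Heading' as a section start, not just Heading 1 and Heading 2.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" version="1.0"
encoding="UTF-8" indent="yes"/>

<!-- Index each non-heading p by the id of the nearest heading before it.
     The key name is hypothetical, chosen for this sketch. -->
<xsl:key name="content-by-heading"
  match="p[not(starts-with(., 'Heading'))]"
  use="generate-id(preceding-sibling::p[starts-with(., 'Heading')][1])"/>

<xsl:template match="/root">
  <root>
    <!-- One Subject per heading paragraph -->
    <xsl:for-each select="p[starts-with(., 'Heading')]">
      <Subject>
        <xsl:value-of select="."/>
        <xsl:text>&#xA;</xsl:text>
        <content>
          <!-- Only the paragraphs whose nearest preceding heading is this one -->
          <xsl:for-each select="key('content-by-heading', generate-id())">
            <xsl:value-of select="."/>
            <xsl:text>&#xA;</xsl:text>
          </xsl:for-each>
        </content>
      </Subject>
    </xsl:for-each>
  </root>
</xsl:template>

</xsl:stylesheet>
```

Any <p> that comes before the first heading has no preceding heading, so it is indexed under an empty key value and never selected.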

Regards,
Mukul

--- Myron Bennet <vbj34@xxxxxxxxx> wrote:
> Hello,
> 
> I am using XSL to extract text from HTML pages into
> XML. I get all the text between predefined delimiter
> keywords such as Heading 1 and Heading 2. The problem
> I am having is that the template continues matching past
> the delimiter keywords. For example, I want to match
> between Headings 1 and 2 only, but the template
> matches between Headings 1 and 2 plus everything
> after Heading 2. Example input/output and the
> recursive template I use are shown below. I would
> appreciate any input on this. Thanks.
> 
> 
> INPUT HTML:
> 
> <p>Heading 1</p>
> <p>bbb</p>
> <p>aaa</p>
> <p>Heading 2</p>
> <p>aaa</p>
> <p>ccc</p>
> <p>Heading 3</p>
> ...
> 
> 
> OUTPUT XML:
> 
> <Subject>
> Heading 1
> <content>
> bbb
> aaa
> </content>
> </Subject>
> <Subject>
> Heading 2
> <content>
> aaa
> ccc
> </content>
> </Subject>
> <Subject>
> Heading 3
> <content>
> ...
> </content>
> </Subject>
> 
> 
> RECURSIVE TEMPLATE:
> <xsl:template
>   match="//p[starts-with(normalize-space(.),'Heading')]">
> <Subject>
> <xsl:value-of select="."/>
> <content>
> <xsl:variable name="next"
>   select="following-sibling::*[not(starts-with(normalize-space(.), 'Heading'))]"/>
> <xsl:if test="$next">
> <xsl:apply-templates select="$next"
> mode="getContent"
> />
> </xsl:if>                    
> </content>          
> </Subject>
> </xsl:template> 
>         
> <xsl:template name="getContent">
> <xsl:value-of select="."/>
> <xsl:variable name="next"
>   select="following-sibling::*[not(starts-with(normalize-space(.), 'Heading'))]"/>
> <xsl:if test="$next">
> <xsl:apply-templates select="$next"
> mode="getContent"
> />                           
> </xsl:if>            
> </xsl:template>



		