Subject: Re: [xsl] HTML text extraction
From: Mukul Gandhi <mukul_gandhi@xxxxxxxxx>
Date: Mon, 26 Jul 2004 06:50:18 -0700 (PDT)
Hope this helps. The stylesheet assumes the <p> elements are children of a <root> element:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <xsl:template match="/root">
    <root>
      <!-- one Subject per delimiter paragraph -->
      <xsl:for-each select="p[(. = 'Heading 1') or (. = 'Heading 2')]">
        <Subject>
          <xsl:value-of select="."/>
          <xsl:text>&#10;</xsl:text>
          <xsl:variable name="p-id" select="generate-id()"/>
          <content>
            <!-- following paragraphs whose nearest preceding heading is the current one -->
            <xsl:for-each select="following-sibling::p[generate-id(preceding-sibling::p[starts-with(., 'Heading')][1]) = $p-id][not(starts-with(., 'Heading'))]">
              <xsl:value-of select="."/>
              <xsl:text>&#10;</xsl:text>
            </xsl:for-each>
          </content>
        </Subject>
      </xsl:for-each>
    </root>
  </xsl:template>
</xsl:stylesheet>

Regards,
Mukul

--- Myron Bennet <vbj34@xxxxxxxxx> wrote:
> Hello,
>
> I am using XSL to extract text from HTML pages into
> XML. I get all the text between predefined delimiter
> keywords such as Heading 1 and Heading 2. The problem
> I am having is the template continues matching past
> the delimiter keywords (For example I want to match
> between Headings 1 and 2 only, but the template
> matches between Headings 1-2 plus everything else
> after Heading 2). Example input/output and the
> recursive template I use are shown below. I would
> appreciate any input on this. Thanks.
>
>
> INPUT HTML:
>
> <p>Heading 1</p>
> <p>bbb</p>
> <p>aaa</p>
> <p>Heading 2</p>
> <p>aaa</p>
> <p>ccc</p>
> <p>Heading 3</p>
> ...
>
>
> OUTPUT XML:
>
> <Subject>
>   Heading 1
>   <content>
>     bbb
>     aaa
>   </content>
> </Subject>
> <Subject>
>   Heading 2
>   <content>
>     aaa
>     ccc
>   </content>
> </Subject>
> <Subject>
>   Heading 3
>   <content>
>     ...
>   </content>
> </Subject>
>
>
> RECURSIVE TEMPLATE:
> <xsl:template match="//p[starts-with(normalize-space(.),'Heading')]">
>   <Subject>
>     <xsl:value-of select="."/>
>     <content>
>       <xsl:variable name="next"
>         select="following-sibling::*[not(starts-with(normalize-space(.), 'Heading'))]"/>
>       <xsl:if test="$next">
>         <xsl:apply-templates select="$next" mode="getContent"/>
>       </xsl:if>
>     </content>
>   </Subject>
> </xsl:template>
>
> <xsl:template name="getContent">
>   <xsl:value-of select="."/>
>   <xsl:variable name="next"
>     select="following-sibling::*[not(starts-with(normalize-space(.), 'Heading'))]"/>
>   <xsl:if test="$next">
>     <xsl:apply-templates select="$next" mode="getContent"/>
>   </xsl:if>
> </xsl:template>
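
The stylesheet in this reply picks out only the paragraphs equal to 'Heading 1' or 'Heading 2', and it re-evaluates preceding-sibling for every candidate paragraph. One possible variant, sketched here under the same assumption that the <p> elements are children of a <root> element, treats any paragraph whose text starts with 'Heading' as a delimiter and looks its content up through an xsl:key. The key name content-by-heading is made up for this illustration, and the normalize-space() calls follow the template in the original question rather than the reply.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Hypothetical key: index every non-heading p by the generated id of
       the nearest preceding sibling p whose text starts with 'Heading'. -->
  <xsl:key name="content-by-heading"
           match="p[not(starts-with(normalize-space(.), 'Heading'))]"
           use="generate-id(preceding-sibling::p[starts-with(normalize-space(.), 'Heading')][1])"/>

  <!-- Assumes the p elements sit directly under a root element. -->
  <xsl:template match="/root">
    <root>
      <!-- One Subject per heading paragraph, whatever its number. -->
      <xsl:for-each select="p[starts-with(normalize-space(.), 'Heading')]">
        <Subject>
          <xsl:value-of select="."/>
          <xsl:text>&#10;</xsl:text>
          <content>
            <!-- Only the paragraphs indexed under this heading's id. -->
            <xsl:for-each select="key('content-by-heading', generate-id())">
              <xsl:value-of select="."/>
              <xsl:text>&#10;</xsl:text>
            </xsl:for-each>
          </content>
        </Subject>
      </xsl:for-each>
    </root>
  </xsl:template>
</xsl:stylesheet>
```

Because each paragraph is indexed under exactly one heading, the content of a section stops at the next heading instead of running on to the end of the sibling list. Paragraphs that appear before the first heading are indexed under the empty string and are never selected.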