Subject: [xsl] HTML text extraction
From: Myron Bennet <vbj34@xxxxxxxxx>
Date: Sun, 25 Jul 2004 22:23:15 -0700 (PDT)
Hello,

I am using XSL to extract text from HTML pages into XML. I want all the text between predefined delimiter keywords such as Heading 1 and Heading 2. The problem is that the template continues matching past the delimiter keywords. For example, I want to match only the text between Headings 1 and 2, but the template matches that text plus everything else after Heading 2. Example input, the desired output, and the recursive template I use are shown below. I would appreciate any input on this. Thanks.

INPUT HTML:

    <p>Heading 1</p>
    <p>bbb</p>
    <p>aaa</p>
    <p>Heading 2</p>
    <p>aaa</p>
    <p>ccc</p>
    <p>Heading 3</p>
    ...

OUTPUT XML:

    <Subject>
      Heading 1
      <content> bbb aaa </content>
    </Subject>
    <Subject>
      Heading 2
      <content> aaa ccc </content>
    </Subject>
    <Subject>
      Heading 3
      <content> ... </content>
    </Subject>

RECURSIVE TEMPLATE:

    <xsl:template match="//p[starts-with(normalize-space(.),'Heading')]">
      <Subject>
        <xsl:value-of select="."/>
        <content>
          <xsl:variable name="next"
              select="following-sibling::*[not(starts-with(normalize-space(.), 'Heading'))]"/>
          <xsl:if test="$next">
            <xsl:apply-templates select="$next" mode="getContent" />
          </xsl:if>
        </content>
      </Subject>
    </xsl:template>

    <xsl:template name="getContent">
      <xsl:value-of select="."/>
      <xsl:variable name="next"
          select="following-sibling::*[not(starts-with(normalize-space(.), 'Heading'))]"/>
      <xsl:if test="$next">
        <xsl:apply-templates select="$next" mode="getContent" />
      </xsl:if>
    </xsl:template>
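
One possible way to stop at the next heading is sketched below; it is not a definitive fix. The over-matching follows from the select expression: following-sibling::*[not(starts-with(...))] returns every later sibling that is not a heading, not only those before the next heading. Note also that the second template is declared with name="getContent" but is applied with mode="getContent", so it is never chosen by apply-templates. The sketch assumes XSLT 1.0, that the input is well-formed XML with no namespace on the p elements, and that headings and content are sibling p elements. The variable name "heading" and the text() suppression template are illustrative additions, not part of the original stylesheet.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumption: suppress all text that is not copied explicitly below. -->
  <xsl:template match="text()"/>

  <!-- One Subject per heading paragraph. -->
  <xsl:template match="p[starts-with(normalize-space(.), 'Heading')]">
    <!-- Identity of the current heading, used to filter its content. -->
    <xsl:variable name="heading" select="generate-id(.)"/>
    <Subject>
      <xsl:value-of select="normalize-space(.)"/>
      <content>
        <!-- Keep only non-heading siblings whose nearest preceding
             heading is the current one, so matching stops at the
             next heading. -->
        <xsl:for-each select="following-sibling::p
            [not(starts-with(normalize-space(.), 'Heading'))]
            [generate-id(preceding-sibling::p
                [starts-with(normalize-space(.), 'Heading')][1]) = $heading]">
          <xsl:value-of select="normalize-space(.)"/>
          <!-- Separate the items with a single space. -->
          <xsl:if test="position() != last()">
            <xsl:text> </xsl:text>
          </xsl:if>
        </xsl:for-each>
      </content>
    </Subject>
  </xsl:template>

</xsl:stylesheet>
```

For the sample input above, this should place "bbb aaa" under Heading 1 and "aaa ccc" under Heading 2. The generate-id() comparison rescans the preceding siblings for each paragraph, which is acceptable for small pages; for large inputs a key-based grouping would likely be faster.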