Re: [xsl] (text processing) lexical context

Subject: Re: [xsl] (text processing) lexical context
From: Joerg Heinicke <joerg.heinicke@xxxxxx>
Date: Wed, 24 Apr 2002 09:13:57 +0200
Hello Nicolas,

<root>
This is the <w>first</w> <i>sentence</i>. This is the <w>second</w>
<i>sentence</i>. This is the <w>third</w> <i>sentence</i>.
</root>

Your XML is really badly structured. How is the processor supposed to know where one sentence ends and the next one starts? Can you always rely on '.' as the marker?


I tried a key-based solution (every node is indexed under the id of the next following-sibling text node that contains a '.'):

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">


<xsl:key name="sentences" match="node()" use="generate-id(following-sibling::text()[contains(., '.')][1])"/>

<xsl:template match="root">
<html>
<ol>
<xsl:apply-templates select="text()[contains(., '.')]" mode="end-of-sentence"/>
</ol>
</html>
</xsl:template>


<xsl:template match="text()" mode="end-of-sentence">
<li>
<xsl:apply-templates select="key('sentences', generate-id(.))" mode="rest-of-sentence"/>
<xsl:value-of select="substring-before(., '.')"/>
<xsl:text>.</xsl:text>
</li>
</xsl:template>


<xsl:template match="node()" mode="rest-of-sentence">
    <xsl:copy-of select="."/>
</xsl:template>

<xsl:template match="text()[contains(., '.')]" mode="rest-of-sentence">
    <xsl:value-of select="substring-after(., '.')"/>
</xsl:template>

</xsl:stylesheet>

The output with Xalan:

<html>
<ol>
<li>
This is the <w>first</w>
<i>sentence</i> without a comma.</li>
<li> This is the <w>second</w>

<i>sentence</i>.</li>
<li> This is the <w>third</w>
<i>sentence</i>.</li>
</ol>
</html>

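I don't know whether the solution is perfect; with input like this it is hard to spot errors. One limitation I can see: it assumes that a text node contains at most one '.', because substring-before() and substring-after() only look at the first one. In any case, I would start by fixing the terrible XML.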
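
As a rough sketch of what I mean by changing the XML: if every sentence were wrapped in its own element, no key and no string functions would be needed. The element name <s> below is only an assumption for illustration; it does not exist in your source. The input would then look like <root><s>This is the <w>first</w> <i>sentence</i>.</s> ... </root>, and a stylesheet along these lines should be enough:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Assumes a restructured source in which each sentence is wrapped
     in a hypothetical <s> element (not part of the original XML). -->

<xsl:template match="root">
<html>
<ol>
<!-- One <s> per sentence, so no '.' detection is required -->
<xsl:apply-templates select="s"/>
</ol>
</html>
</xsl:template>

<!-- Each sentence becomes one list item; its children
     (text, <w>, <i>) are copied unchanged -->
<xsl:template match="s">
<li>
<xsl:copy-of select="node()"/>
</li>
</xsl:template>

</xsl:stylesheet>
```

If you want to transform the children instead of copying them (for example turning <w> into <b>), you could replace the xsl:copy-of with xsl:apply-templates and add templates for those elements.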

Regards,

Joerg



would be formatted so that the list would look like:

<html>
<ol>
<li>first: This is the <b>first</b> <i>sentence</i>.
<li>Second: This is the <b>second</b> <i>sentence</i>.
<li>Third: This is the <b>third</b> <i>sentence</i>.
</ol>
</html>


But I can't figure out how I can select the text surrounding the <w>
element without using <xsl:value-of.../>, which does not allow me to
process the following <i> element...

i.e., I get

<html>
<ol>
<li>first: This is the <b>first</b> sentence.
<li>Second: This is the <b>second</b> sentence.
<li>Third: This is the <b>third</b> sentence.
</ol>
</html>


and the <i> element is lost...

And I can't do <xsl:template match="substring(...)"> because a substring
is not a node.

Help: is there a way to process substrings or something similar?

N. Mazziotta


XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list

