Re: [xsl] trouble with preceding:: and parsing xhtml

Subject: Re: [xsl] trouble with preceding:: and parsing xhtml
From: Chris Wolf <cw10025@xxxxxxxxx>
Date: Sun, 04 Oct 2009 00:59:14 -0400
Unfortunately, after I moved the application to Java (Xalan, whatever is bundled with
jdk-1.5.x), it still renders *some* nodes with preceding::div[@tid='field'][1]
with the value of the first node. For those nodes I tried flipping it by replacing
"[1]" with "[last()]" again, but that hack only worked for some of them.

Except when run programmatically, the stylesheet works perfectly fine with
"xsltproc" (MacOS/Linux) and "msxsl" on Windoze.

I also tried your Saxon 6.5.5, which works fine from the command line,
i.e. java -jar /opt/saxon-6.5.5/saxon.jar af.xhtml fbdata.xsl

...works.  Unfortunately, I get the same weird results when I replace
the default "javax.xml.transform.TransformerFactory" impl with
"com.icl.saxon.TransformerFactoryImpl".

Actually, Saxon won't even read the XSL file unless I override the parser and revert
it to the JDK's built-in (Xerces) parser. Unless I do that,
I get:

	at com.icl.saxon.PreparedStyleSheet.prepare(PreparedStyleSheet.java:121)
	at com.icl.saxon.TransformerFactoryImpl.newTemplates(TransformerFactoryImpl.java:120)
	at com.icl.saxon.TransformerFactoryImpl.newTransformer(TransformerFactoryImpl.java:72)
	at com.starclass.ciafb.parser.FbParser.main(FbParser.java:49)
Caused by: java.io.EOFException: no more input
	at com.icl.saxon.aelfred.XmlParser.popInput(XmlParser.java:4083)
	at com.icl.saxon.aelfred.XmlParser.pushURL(XmlParser.java:3620)
	at com.icl.saxon.aelfred.XmlParser.doParse(XmlParser.java:159)
	at com.icl.saxon.aelfred.SAXDriver.parse(SAXDriver.java:320)
	at com.icl.saxon.om.Builder.build(Builder.java:265)
	at com.icl.saxon.PreparedStyleSheet.prepare(PreparedStyleSheet.java:111)
	... 3 more
---------
java.io.EOFException: no more input
	at com.icl.saxon.aelfred.XmlParser.popInput(XmlParser.java:4083)
	at com.icl.saxon.aelfred.XmlParser.pushURL(XmlParser.java:3620)
	at com.icl.saxon.aelfred.XmlParser.doParse(XmlParser.java:159)
	at com.icl.saxon.aelfred.SAXDriver.parse(SAXDriver.java:320)
	at com.icl.saxon.om.Builder.build(Builder.java:265)
	at com.icl.saxon.PreparedStyleSheet.prepare(PreparedStyleSheet.java:111)
	at com.icl.saxon.TransformerFactoryImpl.newTemplates(TransformerFactoryImpl.java:120)
	at com.icl.saxon.TransformerFactoryImpl.newTransformer(TransformerFactoryImpl.java:72)
	at com.starclass.ciafb.parser.FbParser.main(FbParser.java:49)


Overriding the parser with "com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl"
fixes this, but the resulting transformation looks nothing like what
I get from the command line.

I'm using saxon-6.5.5 like this:

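import java.io.File;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.htmlcleaner.DomSerializer;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;
import org.w3c.dom.Document;

// Select Saxon 6.5.5 as the JAXP TransformerFactory and the JDK's internal Xerces as the SAXParserFactory
System.setProperty("javax.xml.transform.TransformerFactory", 
	"com.icl.saxon.TransformerFactoryImpl");
System.setProperty("javax.xml.parsers.SAXParserFactory", 
	"com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl");

// Clean the HTML file named by fin and convert the result to a W3C DOM
HtmlCleaner cleaner = new HtmlCleaner();
TagNode result = cleaner.clean(new File(fin), "utf-8");
Document doc = new DomSerializer(cleaner.getProperties(), true).createDOM(result);

// Compile the stylesheet (xsl) and run it over the DOM, collecting the output in a string
TransformerFactory tFactory = TransformerFactory.newInstance();
StreamSource ss = new StreamSource(xsl);
Transformer xform = tFactory.newTransformer(ss);
StringWriter sw = new StringWriter();
StreamResult sr = new StreamResult(sw);

xform.transform(new DOMSource(doc), sr);
sw.flush();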
System.out.println(sw.toString());
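
Since the command-line run read HtmlCleaner's serialized output file while the program hands the transformer an in-memory DOM, one way to narrow down the difference is to serialize the DOM the transformer actually receives and diff it against the file used on the command line. Below is a minimal sketch of that check, not a known fix. DomDump, dump, rootNamespace and parse are hypothetical names; parse only stands in for HtmlCleaner's DomSerializer so the sketch runs on its own, and the identity transformer comes from whatever TransformerFactory is configured. It also reports the root element's namespace URI, because patterns such as h:div are written against the http://www.w3.org/1999/xhtml namespace.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper for comparing the in-memory DOM with the command-line input file.
class DomDump {

    // Identity transform: newTransformer() with no stylesheet copies the source tree to the result.
    static String dump(Document doc) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter sw = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(sw));
        return sw.toString();
    }

    // Namespace URI of the root element, or null if it has none
    // (DOM Level 1 nodes created without namespace support also report null).
    static String rootNamespace(Document doc) {
        return doc.getDocumentElement().getNamespaceURI();
    }

    // Stand-in for HtmlCleaner's DomSerializer so the sketch runs on its own.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<html xmlns='http://www.w3.org/1999/xhtml'><body>"
            + "<div tid='field'><a href='#'>Foo</a></div></body></html>");
        System.out.println(rootNamespace(doc)); // http://www.w3.org/1999/xhtml
        System.out.println(dump(doc));
    }
}
```

In the program above, calling DomDump.dump(doc) just before xform.transform(...) would show the tree the transformer is given. A null rootNamespace(doc) would suggest the DOM's elements carry no namespace URI, in which case h:div patterns would not be expected to match.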
	

BTW, when I ran Saxon successfully from the command line, I fed it a document
produced by HtmlCleaner, also run from the command line, via:
java -jar /opt/jlib/htmlcleaner2_1.jar src=countrytemplate_af.html dest=af.data outcharset=utf-8



Thanks,

  -Chris W.

Michael Kay wrote:
> You're nearly there: you want  
> 
> preceding::div[@tid='field'][1]
> 
> Without the [1], you select all of the matching preceding divs; and if you
> then use something like xsl:value-of (in XSLT 1.0), you get the one that
> is first in document order.
> 
>> Then I tried preceding::div[@tid='field' and last()] 
> 
> last() always gives a number that is 1 or more. "and last()" converts this
> number to a boolean, and any number other than zero is treated as true. So
> you're adding "and true()" to your predicate, which doesn't change its
> result. You were probably thinking of
> 
> preceding::div[@tid='field'][last()]
> 
> which means
> 
> preceding::div[@tid='field'][position() = last()]
> 
> But numeric predicates attached to a reverse axis step count the nodes in
> reverse document order: 1 is the nearest, and last() is the furthest. So the
> correct predicate is [1].
> 
> Regards,
> 
> Michael Kay
> http://www.saxonica.com/
> http://twitter.com/michaelhkay 
>  
> 
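
Kay's point about reverse axes can be checked outside XSLT with the JDK's javax.xml.xpath API, which evaluates the same XPath 1.0 expressions. The following is a minimal sketch under stated assumptions: PrecedingAxisDemo and its sample document are hypothetical, the sample drops the XHTML namespace for brevity, and each div sits in its own table so the divs are neither siblings nor ancestors of one another, as described in the original question. With the last category_data div as the context node, the bare step and the "and last()" variant both yield the first field in document order, [1] yields the nearest one, and [last()] yields the furthest.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo of positional predicates on the reverse preceding:: axis.
class PrecedingAxisDemo {

    // Three field/data pairs; each div sits in its own table, so none is a sibling
    // or ancestor of another. No XHTML namespace, to keep the example short.
    static final String SAMPLE = "<body>"
        + "<table><tr><td><div tid='field'><a>Foo</a></div></td></tr></table>"
        + "<table><tr><td><div class='category_data'>Foo data</div></td></tr></table>"
        + "<table><tr><td><div tid='field'><a>Bar</a></div></td></tr></table>"
        + "<table><tr><td><div class='category_data'>Bar data</div></td></tr></table>"
        + "<table><tr><td><div tid='field'><a>Baz</a></div></td></tr></table>"
        + "<table><tr><td><div class='category_data'>Baz data</div></td></tr></table>"
        + "</body>";

    // Evaluates expr as a string, using the last category_data div as the context node.
    static String fromLastDataDiv(String expr) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(SAMPLE)));
        XPath xp = XPathFactory.newInstance().newXPath();
        Node ctx = (Node) xp.evaluate(
            "(//div[@class='category_data'])[last()]", doc, XPathConstants.NODE);
        return xp.evaluate(expr, ctx);
    }

    public static void main(String[] args) throws Exception {
        // Whole node-set; its string value is that of the first node in document order: Foo
        System.out.println(fromLastDataDiv("preceding::div[@tid='field']"));
        // [1] counts backwards from the context node, so this is the nearest: Baz
        System.out.println(fromLastDataDiv("preceding::div[@tid='field'][1]"));
        // [last()] is the furthest: Foo
        System.out.println(fromLastDataDiv("preceding::div[@tid='field'][last()]"));
        // "and last()" acts like "and true()", so this matches the first case: Foo
        System.out.println(fromLastDataDiv("preceding::div[@tid='field' and last()]"));
    }
}
```
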
>> -----Original Message-----
>> From: Chris Wolf [mailto:cw10025@xxxxxxxxx] 
>> Sent: 03 October 2009 20:37
>> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
>> Subject: [xsl] trouble with preceding:: and parsing xhtml
>>
>> I have some XHTML documents that I want to process with XSL.
>> The patterns that I'm interested in have a series of
>> occurrences of "div" elements in pairs, as in:
>>
>> <xhtml...>
>> <head/>
>> <body..>
>> <table...>
>> <tr..>
>> <td...>
>> <div tid="field"><a href="...">Foo</a></div> <table...> 
>> <tr...> <td...> <div class="category_data">Bla,Bla,Bla</div>
>> <...>
>>
>> This pattern of paired div elements repeats an
>> arbitrary number of times throughout the document, and there
>> could be other "div" elements interspersed, but not with the
>> same qualifying attributes.
>>
>>
>> Note that the "div" with "class='category_data'" is not a
>> descendant of the first "div[@tid='field']".
>> I don't think these pairs of DIVs are siblings either (at the
>> same level).
>>
>> Basically, I'm trying to generate XML of name-value pairs where the
>> name comes from the content of the <a/> in the first
>> "div[@tid='field']", and the value is the content of the second
>> "div[@class='category_data']".
>>
>> So the output should be:
>> <Field name="Foo">Bla,Bla,Bla</Field>
>>
>> Where the value of the "name" attribute is the content of the
>> input doc's div[@tid='field']/a, i.e. in this example, 'Foo'
>>
>> ...and the content of "Field" is the content of the input doc's
>> div[@class='category_data'].
>>
>>
>>
>> Since the second div is not a descendant of the first, I
>> can't capture the <a/> content in a variable and call
>> <xsl:apply-templates select="div[@class='category_data']"/>
>> with a parameter.
>>
>> So how else can I pass data from one template to
>> another?
>>
>> I tried "reaching back" from the second template by using
>> preceding::div[@tid='field'],
>> but this retrieved the value of the first node matching
>> "div[@tid='field']", not the nearest preceding node that
>> matches, as I had expected. Then I tried
>> preceding::div[@tid='field' and last()], with the same result:
>> always the same value, and always the value of the very first
>> node that matched.
>>
>> I guess I have no idea how "preceding::" is supposed to work.
>>
>>
>> I would greatly appreciate any help.  
>>
>> Thanks,
>>
>>    -Chris
>>
>> <xsl:stylesheet version="1.0"
>>     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>>     xmlns:h="http://www.w3.org/1999/xhtml">
>>
>> <xsl:output method="xml" indent="yes"/>
>> <xsl:strip-space elements="div a"/>
>>
>> <xsl:template match="/">
>>   <xsl:message>***** ROOT</xsl:message>
>>     <xsl:apply-templates select="//h:div"/>
>> </xsl:template>
>>
>> <xsl:template match="h:div[@tid='field']">
>>   <xsl:message>***** DIV1</xsl:message>
>>   <xsl:apply-templates select="h:div"/>
>> </xsl:template>
>>
>> <xsl:template match="h:div[@class='category_data']">
>>   <xsl:param name="fname"/>
>>   <xsl:message>***** DIV2</xsl:message>
>>   <xsl:message>^<xsl:value-of 
>> select="preceding::h:div[@tid='field']"/>^</xsl:message>
>>   <xsl:element name="Field">
>>     <xsl:attribute name="name">
>>       <xsl:value-of select="preceding::h:div[@tid='field']"/>
>>     </xsl:attribute>
>>     <xsl:value-of select="."/>
>>   </xsl:element><xsl:text>
>> </xsl:text>
>>         <xsl:apply-templates/>
>> </xsl:template>
>>
>> <xsl:template match="text()">
>>   <xsl:message>***** TEXT</xsl:message>
>>     <xsl:apply-templates/>
>> </xsl:template>
>>
>> </xsl:stylesheet>
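
The same reach-back can be written as the full name/value loop the original question asks for. The sketch below is an assumption-laden illustration rather than the stylesheet itself: FieldPairs and pairs are hypothetical names, it uses javax.xml.xpath instead of an XSLT processor, and it binds the prefix h to the XHTML namespace through a NamespaceContext, mirroring xmlns:h in the stylesheet. For each category_data div it takes the name from preceding::h:div[@tid='field'][1]/h:a, using the [1] predicate from Kay's reply, and it builds the Field strings by plain concatenation without XML escaping, so it is only suitable for plain-text values.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: pairs each category_data div with its nearest preceding field div.
class FieldPairs {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Binds the prefix "h" to the XHTML namespace, like xmlns:h in the stylesheet.
    static final NamespaceContext NS = new NamespaceContext() {
        public String getNamespaceURI(String prefix) {
            return "h".equals(prefix) ? XHTML : XMLConstants.NULL_NS_URI;
        }
        public String getPrefix(String uri) {
            return XHTML.equals(uri) ? "h" : null;
        }
        public Iterator<String> getPrefixes(String uri) {
            return XHTML.equals(uri) ? Collections.singletonList("h").iterator()
                                     : Collections.<String>emptyIterator();
        }
    };

    static List<String> pairs(String xhtml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // so elements carry the XHTML namespace URI that h: refers to
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xhtml)));

        XPath xp = XPathFactory.newInstance().newXPath();
        xp.setNamespaceContext(NS);
        NodeList data = (NodeList) xp.evaluate(
            "//h:div[@class='category_data']", doc, XPathConstants.NODESET);

        List<String> out = new ArrayList<>();
        for (int i = 0; i < data.getLength(); i++) {
            Node d = data.item(i);
            // [1] on the reverse preceding:: axis selects the nearest match, not the first in the document.
            String name = xp.evaluate("normalize-space(preceding::h:div[@tid='field'][1]/h:a)", d);
            String value = xp.evaluate("normalize-space(.)", d);
            // Plain concatenation: no XML escaping, so only safe for plain-text values.
            out.add("<Field name=\"" + name + "\">" + value + "</Field>");
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String sample = "<html xmlns='http://www.w3.org/1999/xhtml'><body>"
            + "<table><tr><td><div tid='field'><a href='#'>Foo</a></div></td></tr></table>"
            + "<table><tr><td><div class='category_data'>Bla,Bla,Bla</div></td></tr></table>"
            + "</body></html>";
        pairs(sample).forEach(System.out::println); // <Field name="Foo">Bla,Bla,Bla</Field>
    }
}
```

Inside the stylesheet itself, the equivalent change is adding [1] to preceding::h:div[@tid='field'] in both xsl:value-of instructions of the category_data template.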
