Re: [xsl] trouble with preceding:: and parsing xhtml

Subject: Re: [xsl] trouble with preceding:: and parsing xhtml
From: Chris Wolf <cw10025@xxxxxxxxx>
Date: Sun, 04 Oct 2009 03:22:39 -0400
I solved my last issue.  When I was initially performing this transformation
with command line tools, I forgot that I was cleaning the original HTML
not only by running "HtmlCleaner", but also by using "sed" to rename all attributes
named "id", because there were multiple occurrences with the same value,
which, of course, is not allowed for the "id" attribute,
since it has special value-uniqueness enforcement by parsers.

So after I added:

		Object[] fieldNodes = result.evaluateXPath("//*[@id='field']");
		for (Object node : fieldNodes) {
			if (node instanceof TagNode) {
				//System.out.println(((TagNode)node).getName());
				((TagNode)node).removeAttribute("id");
				((TagNode)node).addAttribute("tid", "field");
			}			
		}

It worked as it did with the command line tools, in both xalan and saxon.  Although,
as mentioned in the last email, in order to use saxon, I have to use xerces
as the parser.
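
For the archive, a minimal sketch of how numeric predicates behave on the
preceding:: axis, which is the behaviour Michael explains in the quoted
reply. It uses only the JDK's javax.xml.xpath API. The class name
PrecedingAxisDemo, the eval helper and the three-pair sample document are
made up for illustration, and the sample deliberately has no namespaces, so
it is not the real XHTML input or the code from FbParser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class PrecedingAxisDemo {

    // Hypothetical sample: three field/value pairs, no namespaces.
    static final String XML =
          "<body>"
        + "<div tid='field'><a>Foo</a></div>"
        + "<table><tr><td><div class='category_data'>1</div></td></tr></table>"
        + "<div tid='field'><a>Bar</a></div>"
        + "<table><tr><td><div class='category_data'>2</div></td></tr></table>"
        + "<div tid='field'><a>Baz</a></div>"
        + "<table><tr><td><div class='category_data'>3</div></td></tr></table>"
        + "</body>";

    // Evaluates expr as a string, with the THIRD category_data div as the
    // context node. A node-set converted to a string yields the string-value
    // of its first node in document order (XPath 1.0 rule).
    static String eval(String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        XPath xp = XPathFactory.newInstance().newXPath();
        Node ctx = (Node) xp.evaluate(
            "(//div[@class='category_data'])[3]", doc, XPathConstants.NODE);
        return xp.evaluate(expr, ctx);
    }

    public static void main(String[] args) throws Exception {
        // No positional predicate: all three are selected, first in document order wins.
        System.out.println(eval("preceding::div[@tid='field']"));            // Foo
        // [1] on a reverse axis is the nearest preceding match.
        System.out.println(eval("preceding::div[@tid='field'][1]"));         // Baz
        // [last()] on a reverse axis is the furthest preceding match.
        System.out.println(eval("preceding::div[@tid='field'][last()]"));    // Foo
        // "and last()" is just "and true()", so nothing is filtered out.
        System.out.println(eval("preceding::div[@tid='field' and last()]")); // Foo
    }
}
```

With the real XHTML input the element names would need the h: prefix and a
namespace-aware parser, which this sketch leaves out to stay short.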

   -Chris 

Chris Wolf wrote:
> Unfortunately, after I moved the application to Java (xalan, whatever is baked in
> jdk-1.5.x) it still renders *some* nodes with preceding::div[@tid='field'][1] 
> with the value of the first node, so with those, I tried flipping it by replacing 
> "[1]" with "[last()]" again, but that hack only worked for some nodes.
> 
> Other than programmatically, the stylesheet works perfectly fine with 
> "xsltproc" (MacOS/Linux) and "msxsl" on Windoze.
> 
> I also tried your Saxon-6.5.5 which works fine from the command line,
> i.e. java -jar /opt/saxon-6.5.5/saxon.jar af.xhtml fbdata.xsl
> 
> ...works.  Unfortunately, I get the same weird results when I replace
> the default "javax.xml.transform.TransformerFactory" impl with
> "com.icl.saxon.TransformerFactoryImpl".
> 
> Actually - saxon won't even read the xsl file unless I override and revert
> the parser back to the built-in jdk (xerces) parser.  Unless I do that,
> I get:
> 
> 	at com.icl.saxon.PreparedStyleSheet.prepare(PreparedStyleSheet.java:121)
> 	at com.icl.saxon.TransformerFactoryImpl.newTemplates(TransformerFactoryImpl.java:120)
> 	at com.icl.saxon.TransformerFactoryImpl.newTransformer(TransformerFactoryImpl.java:72)
> 	at com.starclass.ciafb.parser.FbParser.main(FbParser.java:49)
> Caused by: java.io.EOFException: no more input
> 	at com.icl.saxon.aelfred.XmlParser.popInput(XmlParser.java:4083)
> 	at com.icl.saxon.aelfred.XmlParser.pushURL(XmlParser.java:3620)
> 	at com.icl.saxon.aelfred.XmlParser.doParse(XmlParser.java:159)
> 	at com.icl.saxon.aelfred.SAXDriver.parse(SAXDriver.java:320)
> 	at com.icl.saxon.om.Builder.build(Builder.java:265)
> 	at com.icl.saxon.PreparedStyleSheet.prepare(PreparedStyleSheet.java:111)
> 	... 3 more
> ---------
> java.io.EOFException: no more input
> 	at com.icl.saxon.aelfred.XmlParser.popInput(XmlParser.java:4083)
> 	at com.icl.saxon.aelfred.XmlParser.pushURL(XmlParser.java:3620)
> 	at com.icl.saxon.aelfred.XmlParser.doParse(XmlParser.java:159)
> 	at com.icl.saxon.aelfred.SAXDriver.parse(SAXDriver.java:320)
> 	at com.icl.saxon.om.Builder.build(Builder.java:265)
> 	at com.icl.saxon.PreparedStyleSheet.prepare(PreparedStyleSheet.java:111)
> 	at com.icl.saxon.TransformerFactoryImpl.newTemplates(TransformerFactoryImpl.java:120)
> 	at com.icl.saxon.TransformerFactoryImpl.newTransformer(TransformerFactoryImpl.java:72)
> 	at com.starclass.ciafb.parser.FbParser.main(FbParser.java:49)
> 
> 
> Overriding the parser to be "com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl"
> fixes this, but the resulting transformation does not look anything like what
> I get from that command line.
> 
> I'm using saxon-6.5.5 like this:
> 
> System.setProperty("javax.xml.transform.TransformerFactory", 
> 	"com.icl.saxon.TransformerFactoryImpl");
> System.setProperty("javax.xml.parsers.SAXParserFactory", 
> 	"com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl");
> 
> HtmlCleaner cleaner = new HtmlCleaner();
> TagNode result = cleaner.clean(new File(fin), "utf-8");
> Document doc = new DomSerializer(cleaner.getProperties(), true).createDOM(result);
> 
> TransformerFactory tFactory = TransformerFactory.newInstance();
> StreamSource ss = new StreamSource(xsl);
> Transformer xform = tFactory.newTransformer(ss);
> StringWriter sw = new StringWriter();
> StreamResult sr = new StreamResult(sw);
> 
> xform.transform(new DOMSource(doc), sr);
> sw.flush();
> System.out.println(sw.toString());
> 	
> 
> BTW, when I ran saxon successfully from the command line, I fed it a document
> produced by HtmlCleaner, from the command line, via:
> java -jar /opt/jlib/htmlcleaner2_1.jar src=countrytemplate_af.html dest=af.data outcharset=utf-8
> 
> 
> 
> Thanks,
> 
>   -Chris W.
> 
> Michael Kay wrote:
>> You're nearly there: you want  
>>
>> preceding::div[@tid='field'][1]
>>
>> Without the [1], you select all of them throughout the document; and if you
>> then use something like xsl:value-of (in XSLT 1.0) then you get the one that
>> is first in document order.
>>
>>> Then I tried preceding::div[@tid='field' and last()] 
>> last() always gives a number that is 1 or more. "and last()" converts this
>> number to a boolean, and any number other than 0 (or NaN) is treated as true. So
>> you're adding "and true()" to your predicate, which doesn't change its
>> result. You were probably thinking of
>>
>> preceding::div[@tid='field'][last()]
>>
>> which means
>>
>> preceding::div[@tid='field'][position() = last()]
>>
>> But numeric predicates attached to a reverse axis step count the nodes in
>> reverse document order: 1 is the nearest, and last() is the furthest. So the
>> correct predicate is [1].
>>
>> Regards,
>>
>> Michael Kay
>> http://www.saxonica.com/
>> http://twitter.com/michaelhkay 
>>  
>>
>>> -----Original Message-----
>>> From: Chris Wolf [mailto:cw10025@xxxxxxxxx] 
>>> Sent: 03 October 2009 20:37
>>> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
>>> Subject: [xsl] trouble with preceding:: and parsing xhtml
>>>
>>> I have some xhtml documents that I want to process with XSL.  
>>> The patterns that I'm interested in have a series of 
>>> occurrences of "div" elements in pairs as in:
>>>
>>> <xhtml...>
>>> <head/>
>>> <body..>
>>> <table...>
>>> <tr..>
>>> <td...>
>>> <div tid="field"><a href="...">Foo</a></div> <table...> 
>>> <tr...> <td...> <div class="category_data">Bla,Bla,Bla</div>
>>> <...>
>>>
>>> this pattern of the two pairs of div variations repeats an 
>>> arbitrary number of times throughout the document and there 
>>> could be other "div" elements interspersed, but not with the 
>>> same qualifying attributes.
>>>
>>>
>>> Note that the "div" with "class='category_data'" is not a 
>>> descendant of the first "div[@tid='field']"
>>> I don't think these pairs of DIVs are siblings either (at the 
>>> same level).
>>>
>>> Basically, I'm trying to generate XML of name-value pairs 
>>> where the name
>>> comes from the content of the <a/> in the first 
>>> "div[@tid='field']", and the value is the
>>> content of the second "div[@class='category_data']".
>>>
>>> So the output should be:
>>> <Field name="Foo">Bla,Bla,Bla</Field>
>>>
>>> Where the value of the "name" attribute is the content of the 
>>> input doc's
>>> div[@tid='field']/a, i.e. in this example, 'Foo'
>>>
>>> ...and the content of "Field" is the content of the input doc's
>>> div[@class='category_data']
>>>
>>>
>>>
>>> Since the second div is not a descendant of the first, I 
>>> can't capture 
>>> the <a/> content in a variable and call <xsl:apply-templates 
>>> select="div[@class='category_data']"/>
>>> with a parameter.
>>>
>>> The question is how else to pass data from one template to 
>>> another template?
>>>
>>> I tried "reaching back" from the second template by using 
>>> preceding::div[@tid='field']
>>> but this retrieved the value of the first node matching 
>>> "div[@tid='field']" not
>>> the immediately preceding node that matches, as I would have 
>>> expected.  Then I tried
>>> preceding::div[@tid='field' and last()] - same result; always 
>>> the same value and
>>> always the value of the very first node that matched.
>>>
>>> I guess I have no idea how "preceding::" is supposed to work.
>>>
>>>
>>> I would greatly appreciate any help.  
>>>
>>> Thanks,
>>>
>>>    -Chris
>>>
>>> <xsl:stylesheet version="1.0"
>>>     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>>>     xmlns:h="http://www.w3.org/1999/xhtml">
>>>
>>> <xsl:output method="xml" indent="yes"/>
>>> <xsl:strip-space elements="div a"/>
>>>
>>> <xsl:template match="/">
>>>   <xsl:message>***** ROOT</xsl:message>
>>>     <xsl:apply-templates select="//h:div"/>
>>> </xsl:template>
>>>
>>> <xsl:template match="h:div[@tid='field']">
>>>   <xsl:message>***** DIV1</xsl:message>
>>>   <xsl:apply-templates select="h:div"/>
>>> </xsl:template>
>>>
>>> <xsl:template match="h:div[@class='category_data']">
>>>   <xsl:param name="fname"/>
>>>   <xsl:message>***** DIV2</xsl:message>
>>>   <xsl:message>^<xsl:value-of 
>>> select="preceding::h:div[@tid='field']"/>^</xsl:message>
>>>   <xsl:element name="Field">
>>>     <xsl:attribute name="name">
>>>       <xsl:value-of select="preceding::h:div[@tid='field']"/>
>>>     </xsl:attribute>
>>>     <xsl:value-of select="."/>
>>>   </xsl:element><xsl:text>
>>> </xsl:text>
>>>         <xsl:apply-templates/>
>>> </xsl:template>
>>>
>>> <xsl:template match="text()">
>>>   <xsl:message>***** TEXT</xsl:message>
>>>     <xsl:apply-templates/>
>>> </xsl:template>
>>>
>>> </xsl:stylesheet>
