Subject: Re: [xsl] Saxon and ZWNJ
From: Mohsen Saboorian <mohsens@xxxxxxxxx>
Date: Mon, 10 Jun 2013 13:03:29 +0430

Sorry, this was related to my underlying HTML cleaner engine (which provides HTML => valid DOM 3). I upgraded from htmlcleaner-2.2 to htmlcleaner-2.5 and this escaping issue happened. I just downgraded and this was resolved.

Thanks,
Mohsen

On Mon, Jun 10, 2013 at 11:28 AM, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> Yes, I think it's a bug -- but not in Saxon.
>
> Saxon's implementation of XdmItem.getStringValue() relies on calling textNode.getNodeValue() in the underlying DOM, and my suspicion is that this method is returning the value of the text node in escaped form.
>
> What exactly is this "HTML cleaned DOM" that you are passing to the DOMSource constructor? If my suspicion is correct, it doesn't implement the DOM spec correctly.
>
> Michael Kay
> Saxonica
>
> PS: this question is very product specific. Product-specific questions are better addressed to a product-specific forum rather than to the xsl-list. For Saxon, you can use the forums at saxonica.plan.io
>
> On 9 Jun 2013, at 22:42, Mohsen Saboorian wrote:
>
>> Hi,
>> I'm trying to evaluate an XPATH expression with saxon-9.1.0.8 using
>> the following code snippet:
>>
>> Configuration conf = new Configuration();
>> conf.setValidation(false);
>> Processor p = new Processor(false);
>> DocumentBuilder documentBuilder = p.newDocumentBuilder();
>> XPathCompiler xpathCompiler = p.newXPathCompiler();
>>
>> XPathExecutable xpe = xpathCompiler.compile(expression);
>> XPathSelector xpath = xpe.load();
>> xpath.setContextItem(documentBuilder.build(new
>> DOMSource(cleanHtml.document)));
>>
>> XdmItem result = xpath.evaluateSingle();
>>
>> The HTML is in Persian script (whose cleaned DOM is passed as
>> cleanHtml.document in the above code) which has ZWNJ (U+200C) not
>> escaped.
>>
>> The matched XdmItem has ZWNJ (U+200C) (non-escaped) but when obtaining
>> result.getStringValue(), the result has escaped ZWNJ as (‌) which
>> doesn't seem to be correct because I'm getting node 'string' value.
>>
>> Is this a bug, or is there any flag to disable escaping special
>> Unicode characters in saxon?
>>
>> Regards,
>> Mohsen
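
For anyone hitting the same symptom, the diagnosis in this thread can be checked without Saxon by walking the DOM and looking at what getNodeValue() returns for each text node. The following is only a rough sketch under stated assumptions: the class and method names (DomTextCheck, hasEscapedZwnj, withLiteralText) are made up for illustration, the list of escape spellings it searches for is a guess at what a misbehaving cleaner might emit, and it uses the JDK's built-in parser rather than htmlcleaner. A DOM that follows the spec returns the real U+200C character; a DOM that stores the character reference literally is simulated here with createTextNode.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Saxon or htmlcleaner.
class DomTextCheck {

    // Returns true if any text node under `node` holds a literal character
    // reference for ZWNJ instead of the U+200C character itself.
    static boolean hasEscapedZwnj(Node node) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String value = node.getNodeValue();
            if (value.contains("&#8204;") || value.contains("&#x200C;")
                    || value.contains("&#x200c;") || value.contains("&zwnj;")) {
                return true;
            }
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (hasEscapedZwnj(c)) {
                return true;
            }
        }
        return false;
    }

    // Parses well-formed XML with the JDK parser, which decodes character
    // references while building text nodes.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
    }

    // Simulates a faulty DOM producer: the text is stored exactly as given,
    // so an escape sequence in `text` stays an escape sequence.
    static Document withLiteralText(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element p = doc.createElement("p");
            p.appendChild(doc.createTextNode(text));
            doc.appendChild(p);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("could not create document", e);
        }
    }

    public static void main(String[] args) {
        // Real ZWNJ in the text node: nothing to report.
        System.out.println(hasEscapedZwnj(parse("<p>a\u200Cb</p>")));          // false
        // Escape stored literally in the text node: reported.
        System.out.println(hasEscapedZwnj(withLiteralText("a&#8204;b")));      // true
    }
}
```

Running the same check on the document that is passed to the DOMSource constructor (cleanHtml.document in the original snippet) should show whether the escaping is already present before Saxon reads the node.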