Re: [xsl] How to copy attribute value to text? (Suspected bug involving supplementary characters)

Subject: Re: [xsl] How to copy attribute value to text? (Suspected bug involving supplementary characters)
From: "Kenneth Reid Beesley krbeesley@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sat, 9 Jul 2016 02:19:50 -0000
> From: Kenneth Reid Beesley <krbeesley@xxxxxxxxx>
> Subject: RE: [xsl] How to copy attribute value to text? (Suspected bug
involving supplementary characters)
> Date: 7 July 2016 at 12:23:29 MDT
> To: xslt <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
>
>
>
>
> *****  Suspected bug involving supplementary characters *****
>
> But my real task involves an input XML document, in UTF-8 encoding, that
consists of Deseret Alphabet characters, which are encoded in the
supplementary area.  In such a case, the resulting text content in the <word>
element, copied from an original attribute value, is corrupted.  I saw such
corruption in my own attempts, and couldn't understand what was happening.
>
> Using the following input document (the Deseret Alphabet characters may not
display correctly for you)
>
> <?xml version="1.0" encoding="UTF-8"?>
>
> <foo>
>   <bar>pp.p p.p p>p2pp; <word
correction="p;p-">pp/p	p.</word> pp2pp.</bar>
> </foo>
>
> the output, using your script, is corrupted.  The text() value in the output
is not the same as the original @correction value.  Extra characters (just one
in this case) are inserted.  The longer the original attribute value, the more
extra characters are inserted.
>
> <?xml version="1.0" encoding="UTF-8"?>
> <foo>
>   <bar>pp.p p.p p>p2pp; <word
origerror="pp/p	p.">p;p;p-</word> pp2pp.</bar>
> </foo>
>
> This kind of corruption is exactly what I was seeing using my own scripts,
leading me to bother the group.
>
> I suspect a bug in the XSLT engine involving supplementary characters.
Again, I'm using SaxonHE9-7-0-6J.
>
> What's my next step?
>
> Thanks,
>
> Ken
>
>
>
> From: Michael Müller-Hillebrand <mmh@xxxxxxxxx>
> Subject: Re: [xsl] How to copy attribute value to text? (Suspected bug
involving supplementary characters)
> Date: 7 July 2016 at 14:20:30 MDT
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
>
>
> When copying the data and stylesheet into OxygenXML and also enabling bidi
support, the XSLT processing works fine.
>
> <?xml version="1.0" encoding="UTF-8"?>
> <foo>
>   <bar>pp.p p.p p>p2pp; <word
origerror="pp/p	p.">p;p-</word> pp2pp.</bar>
> </foo>
>
> So your problems may come from some details in your setup? How are you
running the transform?
>
> BTW, interesting letters!
>
> - Michael


I _was_ running the transform with the default JDK XML parser (Java 1.8).
I'm using SaxonHE9-7-0-6J.
This default JDK parser is reputed to be buggy.


>
>
> From: Michael Kay <mike@xxxxxxxxxxxx>
> Subject: Re: [xsl] How to copy attribute value to text? (Suspected bug
involving supplementary characters)
>
>
> More likely to be a bug in the JDK parser. Try it using Apache Xerces, which
is much more reliable than the JDK parser. I think some of the long-standing
bugs in the JDK parser have finally been fixed in Java 8, so you could also
try it with a different JDK.
>
> Michael Kay
> Saxonica


Michael Kay is right.  I changed to using the Xerces-J parser and now
everything works as expected.
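
For anyone who wants to check output like this mechanically rather than by eye, here is a minimal Java sketch. It only assumes that the two strings to compare (the original @correction value and the resulting text) are available as Java strings; the class name CodePointCheck and the two sample letters are mine, chosen for illustration. Deseret letters lie above U+FFFF, so each one occupies two UTF-16 code units in a Java string, and counting or dumping code points makes an inserted extra character easy to spot.

```java
public class CodePointCheck {
    /** Number of Unicode characters (code points), not UTF-16 code units. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    /** Render a string as U+XXXX values so stray or duplicated characters are visible. */
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Two arbitrary Deseret letters (U+10437, U+1042D), for illustration only.
        String sample = new String(Character.toChars(0x10437))
                      + new String(Character.toChars(0x1042D));
        System.out.println(sample.length());     // UTF-16 code units
        System.out.println(codePoints(sample));  // code points
        System.out.println(dump(sample));
    }
}
```

If the dump of the output text is longer than the dump of the input attribute value, the extra code points show exactly where the corruption was introduced.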

By the way, I found it a little difficult to figure out how to use Saxon and
specify the xerces parser.
I had to hunt around a bit.  I finally found the following incantation (as
coded in my Makefile).


# using Saxon XSLT with the Xerces-J parser
BoMDA1869c.xml: BoMDA1869.xml BoMDA1869c.xsl
	java -Djavax.xml.parsers.DocumentBuilderFactory=org.apache.xerces.jaxp.DocumentBuilderFactoryImpl \
  -Djavax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl \
  net.sf.saxon.Transform -o:$@  $<  BoMDA1869c.xsl


I have saxon9he.jar and xercesImpl.jar on my CLASSPATH.  It all seems to work.
Am I missing anything?
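
One way to confirm that the -D properties are really taking effect is to ask JAXP which factory classes it resolves. The following is a small sketch using only the standard javax.xml.parsers API; the class name WhichParser is hypothetical. Run it with the same CLASSPATH and the same -D options as the transform.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

public class WhichParser {
    /** Class name of the SAX parser factory that JAXP resolves in this JVM. */
    static String saxFactoryName() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    /** Class name of the DOM builder factory that JAXP resolves in this JVM. */
    static String domFactoryName() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("SAX: " + saxFactoryName());
        System.out.println("DOM: " + domFactoryName());
    }
}
```

With the Xerces properties set, the printed names should begin with org.apache.xerces.jaxp; names beginning with com.sun.org.apache.xerces.internal would indicate that the JDK's built-in parser is still being used.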

Many thanks to all who responded to my question.

Ken

********************************
Kenneth R. Beesley, D.Phil.
PO Box 540475
North Salt Lake UT 84054
USA
