Aw: Re: [xsl] How to copy attribute value to text? (Suspected bug involving supplementary characters)

Subject: Aw: Re: [xsl] How to copy attribute value to text? (Suspected bug involving supplementary characters)
From: "Martin Honnen martin.honnen@xxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 7 Jul 2016 20:22:13 -0000
I think you can file problems at
https://saxonica.plan.io/projects/saxon/issues, but make sure you mention
the Java version and the way you use Saxon (command line, API).
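
One way to collect the processor details for such a report is sketched below. It assumes an XSLT 2.0 processor and uses only the standard system-property() function; the Java version is not available this way and would still have to come from `java -version`.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Print the processor name and version; any XML input will do. -->
  <xsl:template match="/">
    <xsl:value-of select="system-property('xsl:product-name')"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="system-property('xsl:product-version')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```
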
--
This message was sent from my Android mobile phone with GMX Mail.

On 07.07.2016, 20:54, "Kenneth Reid Beesley krbeesley@xxxxxxxxx"
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

  From: Kenneth Reid Beesley <krbeesley@xxxxxxxxx>
  Subject: Re: [XSL-List: The Open Forum on XSL] Digest for 2016-07-06
  Date: July 7, 2016 at 12:43:54 PM EDT
  To: "XSL-List: The Open Forum on XSL" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>

  Many thanks to Martin Honnen for his response below.  I add more
  comments below (suspected bug in Saxon).

    On 7Jul2016, at 05:28, XSL-List: The Open Forum on XSL <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
    wrote:
    From: Martin Honnen <martin.honnen@xxxxxx>
    Subject: Re: [xsl] How to copy attribute value to text?
    Date: 7 July 2016 at 00:43:37 MDT
    To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx

    On 07.07.2016 07:22, Kenneth Reid Beesley krbeesley@xxxxxxxxx
    wrote:

      If I start with an input XML document that contains mixed
      text with <word> elements like this:

      … this is just <word correction="too">to</word> funny

      I'd like to write an XSLT stylesheet that yields as
      output

      … this is just <word origerror="to">too</word> funny

      So in the output I effectively want (in the same <word>
      element) to

      1.  Set the value of a new attribute to the original text()
      value, and
      2.  Reset the text() value to be the value of the original
      @correction attribute

      I've tried many variants of the following, so far
      without success.  I'm using SaxonHE9-7-0-6J;
      it runs, but the results are not as expected/hoped.

      I've tried matching the text() in a separate template,
      but I can't seem to reference the attribute values of
      the parent node (i.e., <word>) of the text() and the parent
      node's attributes.  E.g., the following doesn't
      work for me, failing somehow in the
      select="../@correction" reference.

      <xsl:template match="word[@correction]/text()">
      <xsl:value-of select="../@correction"/>
      </xsl:template>

    You can use

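    <xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <!-- Identity template: copy every node and attribute unchanged. -->
      <xsl:template match="@* | node()">
        <xsl:copy>
          <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
      </xsl:template>

      <!-- Replace the text of a corrected word with the correction. -->
      <xsl:template match="word[@correction]/text()">
        <xsl:value-of select="../@correction"/>
      </xsl:template>

      <!-- Replace @correction with @origerror holding the original text. -->
      <xsl:template match="word/@correction">
        <xsl:attribute name="origerror" select=".."/>
      </xsl:template>

    </xsl:stylesheet>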

  Your solution looks perfect and appears to work perfectly for
  ASCII-based XML input examples like the following:

  <?xml version="1.0" encoding="UTF-8"?>
  <foo> <bar>this is just <word correction="too">to</word> funny</bar>
  </foo>

  yielding the correct/desired output:

  <?xml version="1.0" encoding="UTF-8"?> <foo> <bar>this is just <word
  origerror="to">too</word> funny</bar> </foo>

  I now see that some of my own attempts also worked, on the same
  ASCII-based example.
  *****  Suspected bug involving supplementary characters *****
  But my real task involves an input XML document, in UTF-8 encoding,
  that consists of Deseret Alphabet characters, which are encoded in
  the supplementary area.  In such a case, the resulting text content
  in the <word> element, copied from an original attribute value, is
  corrupted.  I saw such corruption in my own attempts, and
  couldn't understand what was happening.
  Using the following input document (the Deseret Alphabet characters
  may not display correctly for you)
  <?xml version="1.0" encoding="UTF-8"?>
  <foo> <bar>pp.p p.p p>p2pp; <word
  correction="p;p">pp/p	p.</word> pp2pp.</bar>
  </foo>
  the output, using your script, is corrupted.  The text() value in the
  output is not the same as the original @correction value.  Extra
  characters (just one in this case) are inserted.  The longer the
  original attribute value, the more extra characters are inserted.
  <?xml version="1.0" encoding="UTF-8"?> <foo> <bar>pp.p
  p.p p>p2pp; <word
  origerror="pp/p	p.">p;p;p</word>
  pp2pp.</bar> </foo>
  This kind of corruption is exactly what I was seeing using my own
  scripts, leading me to bother the group.
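
  To make the corruption described above concrete for a report, one option is to list code points on both sides of the transformation, so the comparison does not depend on how Deseret characters display. The sketch below assumes an XSLT 2.0 processor and the <word> markup from the example; Deseret characters should appear as decimal values between 66560 and 66639 (U+10400 to U+1044F).

  ```xml
  <xsl:stylesheet version="2.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="text" encoding="UTF-8"/>

    <xsl:template match="/">
      <xsl:for-each select="//word">
        <!-- One line per attribute: name, length, code points. -->
        <xsl:for-each select="@*">
          <xsl:value-of select="concat('@', name(), ': length=',
                                       string-length(.), ' codepoints=')"/>
          <xsl:value-of select="string-to-codepoints(.)"/>
          <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
        <!-- One line for the element's text content. -->
        <xsl:value-of select="concat('text: length=',
                                     string-length(.), ' codepoints=')"/>
        <xsl:value-of select="string-to-codepoints(.)"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
    </xsl:template>

  </xsl:stylesheet>
  ```

  Running it once against the input document and once against the transformed output should show which code points were added, in plain ASCII digits.
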
  I suspect a bug in the XSLT engine involving supplementary
  characters.  Again, I'm using SaxonHE9-7-0-6J.
  What's my next step?
  Thanks,
  Ken
  ******************************** Kenneth R. Beesley, D.Phil. PO Box
  540475 North Salt Lake UT 84054 USA
