Re: [xsl] Form feed character () in decoded xs:base64Binary

Subject: Re: [xsl] Form feed character () in decoded xs:base64Binary
From: "Martynas Jusevičius martynas@xxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 13 Jul 2020 18:54:56 -0000
Ah, sorry :) I get it now.

After I make the change:

<?xml version="1.1" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [
    <!ENTITY rdf    "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <!ENTITY rdfs   "http://www.w3.org/2000/01/rdf-schema#">
    <!ENTITY xsd    "http://www.w3.org/2001/XMLSchema#">
    <!ENTITY dct    "http://purl.org/dc/terms/">
    <!ENTITY skos   "http://www.w3.org/2004/02/skos/core#">
]>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0"
...

I start getting this error:

Error on line 248 column 46 of messages2trix.xsl:
  SXXP0003  Error reported by XML parser: The entity "xsd" was referenced, but not declared.
org.xml.sax.SAXParseException; systemId: file:/.../messages2trix.xsl; lineNumber: 248; columnNumber: 46; The entity "xsd" was referenced, but not declared.

Line 248 contains "&xsd;dateTime".

On Mon, Jul 13, 2020 at 8:44 PM Imsieke, Gerrit, le-tex
gerrit.imsieke@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
wrote:
>
> I was suggesting that you prepend <?xml version="1.1"?> to your
> stylesheet document, hoping that you are then able to apply translate(.,
> '&#xc;', '') to the decoded string.
>
> On 13.07.2020 20:26, Martynas Jusevičius martynas@xxxxxxxxxxxxx wrote:
> > With xsl:output version="1.1", the form feed is not a problem - Saxon
> > writes the decoded xs:base64Binary string without any replacements.
> >
> > However I'm getting weird parsing errors downstream in my RDF toolkit
> > (which works fine with XML 1.0). I'll try to see what the problem is.
> >
> > On Mon, Jul 13, 2020 at 8:00 PM Imsieke, Gerrit, le-tex
> > gerrit.imsieke@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
> > wrote:
> >>
> >> What happens if you use version="1.1" in the XML declaration (of the
> >> stylesheet)?
> >>
> >> On 13.07.2020 19:54, Martynas Jusevičius martynas@xxxxxxxxxxxxx wrote:
> >>> Hi,
> >>>
> >>> I'm transforming large JSON files with some email data using XSLT 3.0.
> >>> They contain xs:base64Binary literals which I'm decoding using
> >>> bin:decode-string() and want to include the decoded values in the
> >>> output XML.
> >>>
> >>> The problem is that some of the decoded string values have illegal XML
> >>> 1.0 characters in them, such as Form feed (&#xc;).
> >>>
> >>> I want to remove them but cannot find a way.
> >>> I can't use translate(., '&#xc;', '') because the stylesheet would not
> >>> be well-formed anymore.
> >>> I can't even use replace(., codepoints-to-string(12), '') because I
> >>> get this error (with Saxon 10.1 EE):
> >>>
> >>>       codepoints-to-string(): invalid XML character [xc]. Found while
> >>> atomizing the second argument of fn:replace()
> >>>
> >>> Are there any native XSLT options here?
> >>>
> >>> Thanks.
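A sketch of one possible native workaround (not something proposed in this thread): filter the decoded string as integers, so that U+000C never has to be written in the stylesheet and codepoints-to-string() only ever receives legal codepoints. The function name f:strip-non-xml10, its namespace urn:example:local-functions, and the sample base64 value are hypothetical. bin:decode-string() is the EXPath Binary function from the original post and is assumed to be available, as it was there (Saxon 10.1 EE). The numeric ranges are the XML 1.0 Char production written in decimal, because XPath 3.1 has no hexadecimal integer literals.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:bin="http://expath.org/ns/binary"
    xmlns:f="urn:example:local-functions"
    exclude-result-prefixes="xs bin f">

    <!-- Hypothetical helper: keeps only codepoints allowed by the XML 1.0 Char
         production (#x9 | #xA | #xD | #x20-#xD7FF | #xE000-#xFFFD | #x10000-#x10FFFF).
         Filtering happens on integers, so U+000C never appears in the stylesheet
         and codepoints-to-string() only receives legal codepoints. -->
    <xsl:function name="f:strip-non-xml10" as="xs:string">
        <xsl:param name="s" as="xs:string"/>
        <xsl:sequence select="codepoints-to-string(
            string-to-codepoints($s)[
                . = (9, 10, 13)
                or (. ge 32 and . le 55295)
                or (. ge 57344 and . le 65533)
                or (. ge 65536 and . le 1114111)])"/>
    </xsl:function>

    <!-- Usage sketch: the literal below stands in for one base64 value from the JSON;
         it decodes to "Hello", U+000C, "World". -->
    <xsl:template name="xsl:initial-template">
        <xsl:variable name="encoded" as="xs:string" select="'SGVsbG8MV29ybGQ='"/>
        <result>
            <xsl:value-of select="f:strip-non-xml10(bin:decode-string(xs:base64Binary($encoded)))"/>
        </result>
    </xsl:template>

</xsl:stylesheet>
```

Run through its initial template (for example with Saxon's -it option), the result element should contain HelloWorld. A shorter regex variant, replace($s, '[\p{Cc}-[\t\n\r]]', ''), should also avoid a literal U+000C, but it additionally removes DEL and the C1 controls (U+007F to U+009F), which XML 1.0 permits.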
