Re: [xsl] Is it possible to use replace with an variable for entities?

Subject: Re: [xsl] Is it possible to use replace with an variable for entities?
From: "Wendell Piez wapiez@xxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 7 Jul 2022 11:13:51 -0000
Hi,

The serialization option for US-ASCII works tolerably well for this.

<xsl:output encoding="us-ascii"/>

As Mike describes, it essentially forces all characters not in US-ASCII to
be represented as numeric character references.
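Here is a rough Python analogue of the us-ascii serialization approach. It is an illustration only, not XSLT. Python's xml.etree.ElementTree stands in for the XSLT serializer, and the sample characters U+1F00 and U+00B7 are assumptions taken from the question further down. The tree holds real characters, and references appear only when the tree is written out in an encoding that cannot represent those characters. Whether a particular XSLT processor writes decimal or hexadecimal references is up to that processor.

```python
import xml.etree.ElementTree as ET

# The tree holds real characters, not references:
# U+1F00 GREEK SMALL LETTER ALPHA WITH PSILI and U+00B7 MIDDLE DOT.
elem = ET.Element("value")
elem.text = "\u1f00 \u00b7"

# US-ASCII cannot represent them, so the serializer falls back to
# numeric character references.
print(ET.tostring(elem, encoding="us-ascii").decode("ascii"))
# <value>&#7936; &#183;</value>
```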

Cheers, Wendell



On Thu, Jul 7, 2022 at 3:12 AM Michael Kay mike@xxxxxxxxxxxx <
xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> There are two stages to this: (a) replacing \uHHHH with the Unicode
> character that it represents, and (b) replacing this Unicode character with
> an XML entity reference. Logically, the first step is a transformation,
> while the second step is part of serialization (since entity references
> exist only in serialized XML, and not in the XDM tree representation).
>
> The "cheap and dirty" way is probably to use disable-output-escaping:
>
> <xsl:value-of select="replace($in, '\\u(\d\d\d\d)', '&amp;#x$1;')"
> disable-output-escaping="yes"/>
>
> (Note that this won't work for surrogate pairs, since \uXXXX can represent
> half of a surrogate pair, and `&#xXXXX;` can't).
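The following Python sketch makes the surrogate-pair caveat concrete. It is illustrative only, and the helper names are hypothetical. RTF writes each \uN as a signed 16-bit decimal value, so a character outside the Basic Multilingual Plane, such as U+1F600, arrives as two values that are the halves of a UTF-16 surrogate pair. Rewriting each value separately produces references to lone surrogates, which are not legal XML characters. The pair has to be combined into one code point first.

```python
import re


def naive_refs(rtf: str) -> str:
    """String-level rewrite: each \\uN becomes its own &#N; reference."""
    # RTF stores N as a signed 16-bit decimal; fold negatives into 0..65535.
    return re.sub(r"\\u(-?\d+)", lambda m: f"&#{int(m.group(1)) % 65536};", rtf)


def combine_surrogates(hi: int, lo: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into a single code point."""
    if not (0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF):
        raise ValueError("not a surrogate pair")
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)


# Hypothetical input: U+1F600 as two RTF \uN values (fallbacks omitted).
sample = r"\u-10179\u-8704"
print(naive_refs(sample))                # &#55357;&#56832;  (illegal in XML)
print(combine_surrogates(55357, 56832))  # 128512, i.e. U+1F600
```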
>
> If you want to do the two stages separately, then
>
> (a) Saxon offers the function saxon:replace-with() which allows you to
> apply a user-supplied function to the matched substring - see
>
> https://www.saxonica.com/documentation11/index.html#!functions/saxon/replace-with
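Outside Saxon, the same idea of computing the replacement from each match, rather than from a fixed string, can be sketched in Python with re.sub and a callable. This is an analogue of stage (a), not saxon:replace-with itself. It assumes each \uN is a signed decimal value followed by a one-character fallback written either as ? or as \'hh, which is what the RTF default of \uc1 implies. Other fallback forms are not handled.

```python
import re

# Assumption: each \uN (signed decimal) is followed by a one-character
# fallback written either as "?" or as \'hh, as with the RTF default \uc1.
RTF_UNICODE = re.compile(r"\\u(-?\d+)(?:\\'[0-9a-fA-F]{2}|\?)?")


def decode_rtf_unicode(text: str) -> str:
    """Stage (a): replace each RTF \\uN escape with the character it denotes."""

    def to_char(m):
        # Compute the replacement from the match, much as a user-supplied
        # function would; a negative N encodes a value above 32767.
        return chr(int(m.group(1)) % 65536)

    decoded = RTF_UNICODE.sub(to_char, text)
    # Surrogate halves come out as separate code units; a UTF-16 round trip
    # with "surrogatepass" joins valid pairs into single characters.
    return decoded.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


print(decode_rtf_unicode(r"\u7936? and \u183\'b7"))  # ἀ and ·
```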
>
> (b) You can force characters to be serialized using entity references
> (technically, character references) by using an encoding (such as
> iso-8859-1) in which the characters cannot be represented any other way.
> Saxon also has an xsl:output option (saxon:character-representation) to
> force all non-ASCII characters to be represented as character references.
> Or if you want to be more specific, you can use a character map.
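The choice of encoding matters for option (b). The Python sketch below is hedged: serialize is a hypothetical helper, and the one-entry dictionary is a made-up stand-in for an xsl:character-map. With ISO-8859-1, U+00B7 MIDDLE DOT can be written directly, so only U+1F00 is escaped. With US-ASCII, both characters become references. A mapped character is written exactly as the map specifies.

```python
from xml.sax.saxutils import escape


def serialize(text, encoding, char_map=None):
    """Hypothetical helper: write text as XML character data in `encoding`."""
    char_map = char_map or {}
    out = []
    for ch in text:
        if ch in char_map:
            # Like a character map: the replacement is written as-is.
            out.append(char_map[ch])
        else:
            # Escape markup, then fall back to &#N; if the encoding lacks ch.
            encoded = escape(ch).encode(encoding, "xmlcharrefreplace")
            out.append(encoded.decode(encoding))
    return "".join(out)


text = "\u1f00\u00b7"  # U+1F00 and U+00B7 MIDDLE DOT
print(serialize(text, "iso-8859-1"))                      # &#7936;·
print(serialize(text, "us-ascii"))                        # &#7936;&#183;
print(serialize(text, "us-ascii", {"\u00b7": "&#xB7;"}))  # &#7936;&#xB7;
```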
>
> Michael Kay
> Saxonica
>
> On 7 Jul 2022, at 06:36, Torsten Schassan schassan@xxxxxx <
> xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Dear colleagues,
>
> I need to replace Unicode references (encoded in RTF) with entities via
> XSLT.
>
> My replace calls would look like this, for example:
>
> replace($value, '\\u7936', 'ἀ')
> replace($value, '\\u183 \\\^b7', '·')
>
> Now I want to avoid writing a separate (possibly nested) replace for each
> character, and would like to use a variable like this:
>
> replace($value, '\\u(\d{4})', '&#$1;')
> replace($value, '\\u(\d{3}) \\\^[0-9a-z]{2}', '&#$1;')
>
> This, unfortunately, throws an error, as '&#$1;' is not a valid character
> reference.
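This sketch suggests why the '&#$1;' route fails even once the stylesheet parses, for example when it is written as '&amp;#$1;'. The replacement produces only ordinary text: the characters '&', '#', some digits, and ';'. A serializer then escapes the '&'. The Python sketch below uses xml.etree.ElementTree as a stand-in serializer, which is an assumption made for illustration. An XSLT processor's XML output method behaves analogously unless disable-output-escaping is used. The sketch contrasts that text with a tree that holds the actual character.

```python
import xml.etree.ElementTree as ET

# Text that merely looks like a character reference: "&", "#", digits, ";".
as_text = ET.Element("value")
as_text.text = "&#7936;"

# The actual character, obtained by decoding the number first.
as_char = ET.Element("value")
as_char.text = chr(7936)

# ET.tostring defaults to US-ASCII output.
print(ET.tostring(as_text).decode("ascii"))  # <value>&amp;#7936;</value>
print(ET.tostring(as_char).decode("ascii"))  # <value>&#7936;</value>
```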
>
> Additionally, my parser doesn't allow me to use map:keys($rtfEncodingMap).
>
> Is there a workaround or a solution I might have missed?
>
>
>
> Best,
> Torsten
> --
> Torsten Schassan - Abteilung Handschriften und Sondersammlungen / Digitale
> Editionen
> Herzog August Bibliothek, D-38299 Wolfenbuettel, Tel.: +49 5331 808-130
> Fax -165
> Handschriftendatenbank: https://diglib.hab.de/?db=mss
> XSL-List info and archive <http://www.mulberrytech.com/xsl/xsl-list>
> EasyUnsubscribe <http://lists.mulberrytech.com/unsub/xsl-list/293509> (by
> email)
>
>


--
...Wendell Piez... ...wendell -at- nist -dot- gov...
...wendellpiez.com... ...pellucidliterature.org... ...pausepress.org...
...github.com/wendellpiez... ...gitlab.coko.foundation/wendell...
