Subject: Re: [xsl] Is it possible to use replace with an variable for entities?
From: "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 7 Jul 2022 07:12:20 -0000
There are two stages to this: (a) replacing \uHHHH with the Unicode character that it represents, and (b) replacing this Unicode character with an XML entity reference. Logically, the first step is a transformation, while the second step is part of serialization (since entity references exist only in serialized XML, and not in the XDM tree representation).

The "cheap and dirty" way is probably to use disable-output-escaping:

    <xsl:value-of select="replace($in, '\\u(\d\d\d\d)', '&amp;#x$1;')" disable-output-escaping="yes"/>

(Note that this won't work for surrogate pairs, since \uXXXX can represent half of a surrogate pair, and `&#xXXXX;` can't.)

If you want to do the two stages separately, then

(a) Saxon offers the function saxon:replace-with(), which allows you to apply a user-supplied function to the matched substring - see https://www.saxonica.com/documentation11/index.html#!functions/saxon/replace-with

(b) You can force characters to be serialized using entity references (technically, character references) by using an encoding (such as iso-8859-1) in which the characters cannot be represented any other way. Saxon also has an xsl:output option (saxon:character-representation) to force all non-ASCII characters to be represented as character references. Or if you want to be more specific, you can use a character map.

Michael Kay
Saxonica

> On 7 Jul 2022, at 06:36, Torsten Schassan schassan@xxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Dear colleagues,
>
> I need to replace Unicode references (encoded in RTF) with entities via XSLT.
>
> My replace commands would look like these, for example:
>
> replace($value, '\\u7936', 'ἀ')
> replace($value, '\\u183 \\\^b7', '·')
>
> Now I want to avoid having x (nested?) replaces, one for each character, and would like to use a variable like this:
>
> replace($value, '\\u(\d{4})', '&#$1;')
> replace($value, '\\u(\d{3}) \\\^[0-9a-z]{2}', '&#$1;')
>
> This, unfortunately, throws an error, as '&#$1;' is not a valid entity declaration.
>
> Additionally, my parser doesn't allow the use of map:keys($rtfEncodingMap).
>
> Is there a workaround or a solution I might have missed?
>
> Best,
> Torsten
> --
> Torsten Schassan - Abteilung Handschriften und Sondersammlungen / Digitale Editionen
> Herzog August Bibliothek, D-38299 Wolfenbuettel, Tel.: +49 5331 808-130 Fax -165
> Handschriftendatenbank: https://diglib.hab.de/?db=mss
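
For illustration, here is a minimal sketch of the two-stage approach described in the reply. It uses only standard XSLT 2.0 and does not use saxon:replace-with(): stage (a) is done with xsl:analyze-string, and stage (b) is left to the serializer by choosing an ASCII output encoding. It assumes the escapes are RTF-style \uN with a non-negative decimal N that is a legal XML character, so surrogate halves and negative values are not handled. The RTF fallback sequence that may follow an escape is left in place, and the identity template is only scaffolding.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Stage (b): US-ASCII cannot represent non-ASCII characters directly,
       so the serializer has to write them as numeric character references. -->
  <xsl:output method="xml" encoding="us-ascii"/>

  <!-- Stage (a): turn each \uN escape (decimal N, an assumption of this sketch)
       into the character it denotes. codepoints-to-string() raises an error
       if N is not a legal XML character, e.g. half of a surrogate pair. -->
  <xsl:template match="text()" priority="1">
    <xsl:analyze-string select="." regex="\\u(\d+)">
      <xsl:matching-substring>
        <xsl:value-of select="codepoints-to-string(xs:integer(regex-group(1)))"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

  <!-- Identity template: copy everything else unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because the result tree holds real characters, no disable-output-escaping is needed. If only particular characters should be written as references, a character map on xsl:output could replace the encoding trick.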