Subject: Re: [xsl] Is it possible to use replace with an variable for entities?
From: "Wendell Piez wapiez@xxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 7 Jul 2022 11:13:51 -0000
Hi,

The serialization option for US-ASCII works tolerably well for this.

<xsl:output encoding="us-ascii"/>

As Mike describes, it essentially forces all characters not in US-ASCII to be
represented as numeric character references.

Cheers, Wendell

On Thu, Jul 7, 2022 at 3:12 AM Michael Kay mike@xxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> There are two stages to this: (a) replacing \uHHHH with the Unicode
> character that it represents, and (b) replacing this Unicode character with
> an XML entity reference. Logically, the first step is a transformation,
> while the second step is part of serialization (since entity references
> exist only in serialized XML, and not in the XDM tree representation).
>
> The "cheap and dirty" way is probably to use disable-output-escaping:
>
> <xsl:value-of select="replace($in, '\\u(\d\d\d\d)', '&amp;#x$1;')"
>     disable-output-escaping="yes"/>
>
> (Note that this won't work for surrogate pairs, since \uXXXX can represent
> half of a surrogate pair, and `&#xXXXX;` can't.)
>
> If you want to do the two stages separately, then
>
> (a) Saxon offers the function saxon:replace-with(), which allows you to
> apply a user-supplied function to the matched substring - see
> https://www.saxonica.com/documentation11/index.html#!functions/saxon/replace-with
>
> (b) You can force characters to be serialized using entity references
> (technically, character references) by using an encoding (such as
> iso-8859-1) in which the characters cannot be represented any other way.
> Saxon also has an xsl:output option (saxon:character-representation) to
> force all non-ASCII characters to be represented as character references.
> Or if you want to be more specific, you can use a character map.
>
> Michael Kay
> Saxonica
>
> On 7 Jul 2022, at 06:36, Torsten Schassan schassan@xxxxxx
> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> > Dear colleagues,
> >
> > I need to replace Unicode references (encoded in RTF) with entities via
> > XSLT.
> >
> > My replace commands would look like these, for example:
> >
> > replace($value, '\\u7936', 'ἀ')
> > replace($value, '\\u183 \\\^b7', '·')
> >
> > Now I want to avoid having x (nested?) replaces, one for each character,
> > and would instead like to use a variable like this:
> >
> > replace($value, '\\u(\d{4})', '&#$1;')
> > replace($value, '\\u(\d{3}) \\\^[0-9a-z]{2}', '&#$1;')
> >
> > This, unfortunately, throws an error, as '&#$1;' is not a valid entity
> > declaration.
> >
> > Additionally, my parser doesn't allow the use of map:keys($rtfEncodingMap).
> >
> > Is there a workaround or a solution I might have missed?
> >
> > Best,
> > Torsten
> > --
> > Torsten Schassan - Abteilung Handschriften und Sondersammlungen /
> > Digitale Editionen
> > Herzog August Bibliothek, D-38299 Wolfenbuettel, Tel.: +49 5331 808-130
> > Fax -165
> > Handschriftendatenbank: https://diglib.hab.de/?db=mss
> > XSL-List info and archive <http://www.mulberrytech.com/xsl/xsl-list>
> > EasyUnsubscribe <http://lists.mulberrytech.com/unsub/xsl-list/293509>
> > (by email)

--
...Wendell Piez... ...wendell -at- nist -dot- gov...
...wendellpiez.com... ...pellucidliterature.org... ...pausepress.org...
...github.com/wendellpiez... ...gitlab.coko.foundation/wendell...
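
The two stages discussed in this thread can also be sketched in plain XSLT 2.0, without saxon:replace-with() or disable-output-escaping. The stylesheet below is only an illustration under stated assumptions: the function name f:decode-rtf and its namespace urn:example:rtf are hypothetical, and the number after \u is read as decimal, which is what Torsten's own examples suggest (\u183 is paired with b7, and 183 is B7 in hexadecimal). Stage (a) uses xsl:analyze-string and codepoints-to-string(); stage (b) is left to the serializer through encoding="us-ascii".

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:rtf"
    exclude-result-prefixes="xs f">

  <!-- Stage (b): any character outside US-ASCII is serialized
       as a numeric character reference. -->
  <xsl:output method="xml" encoding="us-ascii"/>

  <!-- Stage (a): hypothetical helper, not part of any library.
       Replaces each \uNNNN (NNNN assumed to be decimal) with the
       character it denotes; all other text is copied unchanged. -->
  <xsl:function name="f:decode-rtf" as="xs:string">
    <xsl:param name="in" as="xs:string"/>
    <xsl:variable name="parts" as="xs:string*">
      <xsl:analyze-string select="$in" regex="\\u([0-9]+)">
        <xsl:matching-substring>
          <!-- regex-group(1) holds the digits captured after \u -->
          <xsl:sequence
              select="codepoints-to-string(xs:integer(regex-group(1)))"/>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <xsl:sequence select="."/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </xsl:variable>
    <xsl:sequence select="string-join($parts, '')"/>
  </xsl:function>

  <!-- Demonstration only: run against any input document. -->
  <xsl:template match="/">
    <out>
      <xsl:value-of select="f:decode-rtf('\u7936 and \u183')"/>
    </out>
  </xsl:template>

</xsl:stylesheet>
```

Whether the references come out in decimal or hexadecimal form is up to the processor. The sketch does not strip the fallback text that follows \uN in the RTF source, and, as with the surrogate-pair caveat above, codepoints-to-string() raises an error for a value that is not a legal XML character, such as half of a surrogate pair.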