Re: [xsl] Is it possible to use replace with an variable for entities?

Subject: Re: [xsl] Is it possible to use replace with an variable for entities?
From: "David Maus lists@xxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 7 Jul 2022 06:55:18 -0000
Good Morning Torsten,

Long time no see!

On Thu, 07 Jul 2022 07:36:14 +0200,
Torsten Schaßan wrote:
>
> Dear colleagues,
>
> I need to replace Unicode references (encoded in RTF) with entities via
> XSLT.
>
> My replace command would look like these for example:
>
> replace($value, '\\u7936', 'ἀ')
> replace($value, '\\u183 \\\^b7', '·')
>
> Now I want to avoid having x-times (nested?) replaces for each character,
> but would like to use a variable like this:
>
> replace($value, '\\u(\d{4})', '&#$1;')
> replace($value, '\\u(\d{3}) \\\^[0-9a-z]{2}', '&#$1;')
>
> This, unfortunately, throws an error, as '&#$1;' is no valid entity
> declaration.

That's true. In XSLT you work on the parsed representation of the XML
document, meaning all entity and character references are already
expanded. That's where the error message comes from: the XML parser
fails to expand '&#$1;' because it is not a well-formed character
reference. Working on the parsed representation also means that you
cannot easily create entities. This needs to be done when the XML is
serialized.

I would approach the task with xsl:analyze-string:

1. The @regex matches RTF Unicode references

2. xsl:non-matching-substring outputs the substring as is.

3. xsl:matching-substring casts the codepoint to xs:integer and uses
codepoints-to-string() to substitute it with the Unicode character.
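
A minimal sketch of these three steps, assuming an XSLT 2.0 (or later)
processor. The regex merges the two patterns from your message into
one (three or four digits, optionally followed by the fallback
sequence). The identity template and the match on text() are only my
assumptions about how your stylesheet is organised, so adjust as
needed. Note that curly braces are doubled because @regex is an
attribute value template.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Assumption: copy everything that is not a text node unchanged -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumption: the RTF Unicode references occur in text nodes -->
  <xsl:template match="text()">
    <!-- 1. The regex matches \uNNN or \uNNNN, plus the optional fallback -->
    <xsl:analyze-string select="."
        regex="\\u(\d{{3,4}})( \\\^[0-9a-z]{{2}})?">
      <!-- 3. Cast the captured digits and emit the Unicode character;
              this raises an error if the number is not a valid
              XML character -->
      <xsl:matching-substring>
        <xsl:value-of
            select="codepoints-to-string(xs:integer(regex-group(1)))"/>
      </xsl:matching-substring>
      <!-- 2. Everything else is output as is -->
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

With this, \u7936 in the input should come out as the character
U+1F00, and \u183 followed by its fallback as U+00B7. Whether the
character is written literally or as a numeric reference is then up to
the serializer.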

If you know the entities you want to appear in the XML document you
might try using xsl:character-map.
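
A rough sketch of the character map idea, assuming the two characters
from your examples are the ones you want written as references. The
map name "rtf-entities" is made up for illustration. The map is
applied only at serialization time, so it has no effect if the result
tree is not serialized by the XSLT processor.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical map: one xsl:output-character per character that
       should be written as a reference in the serialized output -->
  <xsl:character-map name="rtf-entities">
    <xsl:output-character character="&#x1F00;" string="&amp;#x1F00;"/>
    <xsl:output-character character="&#xB7;" string="&amp;#xB7;"/>
  </xsl:character-map>

  <!-- The serializer replaces the mapped characters with the given
       strings and does not escape those strings again -->
  <xsl:output method="xml" use-character-maps="rtf-entities"/>

  <!-- Assumption: identity transform standing in for your own templates -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

If you simply want every non-ASCII character written as a numeric
character reference, setting encoding="US-ASCII" on xsl:output may be
enough, since the serializer then has to escape whatever the encoding
cannot represent.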


HTH,
  -- David

>
> Additionally, my parser doesn't allow to use map:keys($rtfEncodingMap).
>
> Is there a workaround or a solution I might have missed?
>
> Best,
> Torsten
> --
> Torsten Schassan - Abteilung Handschriften und Sondersammlungen / Digitale Editionen
> Herzog August Bibliothek, D-38299 Wolfenbuettel, Tel.: +49 5331 808-130 Fax -165
> Handschriftendatenbank: https://diglib.hab.de/?db=mss

--
David Maus M.A.

Www: http://dmaus.name
Twitter: @_dmaus
