Re: [xsl] Diacritics in original document

Subject: Re: [xsl] Diacritics in original document
From: "Imsieke, Gerrit, le-tex gerrit.imsieke@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 30 Aug 2015 07:10:16 -0000
Mark,

the ü is decomposed into a 'u' and U+0308 (combining diaeresis). You
can normalize it to a precomposed ü using normalize-unicode() [1], as in
normalize-unicode('ü', 'NFKC')
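To try the same normalization outside XSLT, here is a minimal sketch using Python's standard unicodedata module. This is only an illustration and not part of the XPath advice itself. It shows that both NFC and NFKC compose 'u' + U+0308 into the single code point U+00FC. NFKC additionally replaces compatibility characters, such as the ligature U+FB01, with their plain equivalents, while NFC leaves them untouched.

```python
import unicodedata

# 'u' followed by U+0308 COMBINING DIAERESIS: two code points that render as one "ü"
decomposed = "u\u0308"

# NFC performs canonical composition only
nfc = unicodedata.normalize("NFC", decomposed)
# NFKC applies compatibility decompositions first, then composes
nfkc = unicodedata.normalize("NFKC", decomposed)

print(len(decomposed), len(nfc), len(nfkc))  # 2 1 1
print(nfc == nfkc == "\u00fc")               # True

# Where the two forms differ: the "fi" ligature U+FB01
ligature = "\ufb01"
print(unicodedata.normalize("NFC", ligature) == ligature)  # True (kept as-is)
print(unicodedata.normalize("NFKC", ligature))             # fi
```

If only combining sequences need fixing, NFC is the more conservative form. In XPath that would be normalize-unicode(., 'NFC'), which is also what the one-argument normalize-unicode() uses by default.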

You can check the result in oXygen's XPath input field:
string-to-codepoints(normalize-unicode('ü', 'NFKC')) → 252, which is
U+00FC, or 'ü'.
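The same check can be sketched in Python, again purely as an illustration. Listing the code points before and after normalization mirrors string-to-codepoints() in oXygen. Encoding both forms shows that each one is perfectly valid UTF-8. The difference lies in which code points the text contains, so changing the declared encoding alone would not turn one form into the other.

```python
import unicodedata

decomposed = "u\u0308"  # 'u' + COMBINING DIAERESIS
composed = unicodedata.normalize("NFKC", decomposed)

# Rough equivalent of XPath string-to-codepoints()
print([ord(c) for c in decomposed])  # [117, 776]
print([ord(c) for c in composed])    # [252]
print(f"U+{ord(composed):04X}")      # U+00FC

# Both are valid UTF-8; they just encode different code point sequences
print(decomposed.encode("utf-8"))    # b'u\xcc\x88'
print(composed.encode("utf-8"))      # b'\xc3\xbc'
```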

Gerrit

[1] http://www.w3.org/TR/xpath-functions/#func-normalize-unicode

On 30.08.2015 08:52, Mark Wilson pubs@xxxxxxxxxxxx wrote:
> 
> I am working with an original xml file that says <?xml version="1.0"
> encoding="UTF-8"?>.
> However, elements whose values contain diacritics appear to be something
> else (see für in the two examples below):
> XML rendition in Oxygen:
> Mittheilungen der tauschvereinigung fuCKr postwerthzeichen zu Elberfeld.
> 
> Which, here in my email, using Western (Windows-1252) are rendered
> correctly as:
> Mittheilungen der tauschvereinigung für postwerthzeichen zu Elberfeld.
> 
> The text output from my transformations has the same problem.
> 
> Do I need to change the encoding in my stylesheets? If so, how? Or is
> there a solution?
> Thanks,
> Mark

-- 
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit.imsieke@xxxxxxxxx, http://www.le-tex.de

Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930

Geschäftsführer / Managing Directors: Gerrit Imsieke, Svea Jelonek,
Thomas Schmidt, Dr. Reinhard Vöckler
