Subject: Re: [xsl] Diacritics in original document
From: "Imsieke, Gerrit, le-tex gerrit.imsieke@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 30 Aug 2015 07:10:16 -0000

Mark,

the ü is decomposed into a 'u' and into U+308 (combining diaeresis). You can normalize it to ü using normalize-unicode() [1], as in

normalize-unicode('ü', 'NFKC')

You can check the result in oXygen’s XPath input field:

string-to-codepoints(normalize-unicode('ü', 'NFKC'))
→ 252, which is U+FC, or 'ü'.

Gerrit

[1] http://www.w3.org/TR/xpath-functions/#func-normalize-unicode

On 30.08.2015 08:52, Mark Wilson pubs@xxxxxxxxxxxx wrote:
>
> I am working with an original xml file that says <?xml version="1.0"
> encoding="UTF-8"?>.
> However, elements whose values contain diacritics appear to be something
> else (see für in the two examples below):
> XML rendition in Oxygen:
> Mittheilungen der tauschvereinigung fuÌˆr postwerthzeichen zu Elberfeld.
>
> Which, here in my email, using Western (Windows-1252) are rendered
> correctly as:
> Mittheilungen der tauschvereinigung für postwerthzeichen zu Elberfeld.
>
> The text output from my transformations have the same problem.
>
> Do I need to change the encoding in my stylesheets? If so, how? Or is
> there a solution?
> Thanks,
> Mark
>
>
>
>

--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit.imsieke@xxxxxxxxx, http://www.le-tex.de

Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930

Geschäftsführer: Gerrit Imsieke, Svea Jelonek, Thomas Schmidt, Dr. Reinhard Vöckler
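
The check described in the reply can also be reproduced outside an XSLT processor. The following is a minimal sketch in Python, not part of the original thread, assuming the source text really contains 'u' followed by U+0308. It uses the standard library's unicodedata.normalize, which applies the same Unicode normalization forms that normalize-unicode() names; the variable names are illustrative only.

```python
import unicodedata

# Assumed input: 'u' followed by U+0308 COMBINING DIAERESIS,
# i.e. the decomposed spelling of "für" described in the reply.
decomposed = "fu\u0308r"

# Compose the base letter and the combining mark into one code point.
composed = unicodedata.normalize("NFKC", decomposed)

# Code points before and after, analogous to string-to-codepoints().
print([ord(c) for c in decomposed])  # [102, 117, 776, 114]
print([ord(c) for c in composed])    # [102, 252, 114]
```

For this particular string, the plain canonical form "NFC" yields the same result as "NFKC". The compatibility form additionally folds characters such as ligatures, which may or may not be wanted in a catalogue of historical titles.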