Re: [xsl] where to look for xsl folk..

Subject: Re: [xsl] where to look for xsl folk..
From: "Graydon graydon@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 3 Jul 2016 21:02:00 -0000
On Sun, Jul 03, 2016 at 04:13:09PM -0000, Terry Badger
terry_badger@xxxxxxxxx scripsit:
> Graydon, the document.xml files I have found and worked with, taken
> from .docx files, always have a prolog with encoding="UTF-8", so I have
> not worried about invalid Unicode characters and can process any text
> in Word using an XSL stylesheet.  Do you have a sample where a .docx
> file has non-Unicode encodings?

Not on hand, and if I did, it wouldn't be my data to share.

I've hit two cases of code point 0x96 (hex) -- the codepage 1252 en
dash -- in an XSLT document (which is admittedly not Word) during paid
work in the last couple of weeks, though.  It does happen.  It won't
cause problems until something checks for UTF-8 encoding specifically,
rather than just checking against the XML character set.  It's entirely
possible to have the whole XSLT toolchain completely happy -- as it was
in that case -- and something downstream that checks the encoding not
happy at all.  I have certainly hit this problem with the XML versions
of Office documents in the past.
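One way to look for this kind of damage, as a minimal Python sketch (the thread contains no code; the function name cp1252_suspects is hypothetical, and it assumes the stray character has already been decoded into text as U+0096). Windows-1252 puts printable characters such as the en dash and curly quotes at bytes 0x80 to 0x9F, where Unicode has C1 control characters, so a C1 control in otherwise ordinary text is a strong hint that cp1252 data was misread somewhere upstream.

```python
# Sketch: flag C1 control characters (U+0080-U+009F) in decoded text and
# show which Windows-1252 character each one most likely stood for.
import re
from typing import List, Optional, Tuple

C1_CONTROL = re.compile("[\u0080-\u009f]")


def cp1252_suspects(text: str) -> List[Tuple[int, int, Optional[str]]]:
    """Return (index, code point, probable cp1252 character) per C1 control.

    The third item is None for 0x81, 0x8D, 0x8F, 0x90 and 0x9D, which
    Windows-1252 (as implemented by Python's cp1252 codec) leaves undefined.
    """
    found = []
    for match in C1_CONTROL.finditer(text):
        code_point = ord(match.group())
        try:
            # Reinterpret the code point as a single cp1252 byte.
            intended: Optional[str] = bytes([code_point]).decode("cp1252")
        except UnicodeDecodeError:
            intended = None
        found.append((match.start(), code_point, intended))
    return found


if __name__ == "__main__":
    print(cp1252_suspects("1990\u00962000"))
```

For the sample string "1990\u00962000" it reports index 4, code point 150 (0x96), and U+2013 EN DASH as the likely intended character. It only detects; whether to substitute the cp1252 character automatically is a judgment call for the data at hand.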

Before the Fifth Edition of XML 1.0, it was possible to trust the parser
to tell you if your document wasn't UTF-8, because XML's character set
was a subset of UTF-8.  With the Fifth Edition, that's no longer the
case.
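To make the parser point concrete for the .docx case, here is a hypothetical Python helper (docx_c1_controls is an illustrative name, not an existing API) built on the standard library's zipfile and xml.etree.ElementTree. Python's parser rejects a raw 0x96 byte in a document declared as UTF-8, because that byte on its own is not valid UTF-8, but it accepts U+0096 once that character is properly UTF-8 encoded: XML 1.0's Char production includes the C1 range, and the specification discourages those characters without forbidding them. So the sketch parses first and then scans the text content separately.

```python
# Sketch: parse word/document.xml from a .docx, then look for C1 control
# characters (U+0080-U+009F) that the XML parser accepted without complaint.
import io
import re
import xml.etree.ElementTree as ET
import zipfile
from typing import BinaryIO, List, Union

C1_CONTROL = re.compile("[\u0080-\u009f]")


def docx_c1_controls(docx: Union[str, BinaryIO]) -> List[int]:
    """Return the code point of every C1 control in the document's text.

    Raises ET.ParseError if the part is not well-formed, which includes
    bytes that are not valid UTF-8 when UTF-8 is declared.
    """
    with zipfile.ZipFile(docx) as archive:
        raw = archive.read("word/document.xml")
    root = ET.fromstring(raw)  # a correctly encoded U+0096 passes this step
    return [
        ord(ch)
        for text in root.itertext()
        for ch in C1_CONTROL.findall(text)
    ]


if __name__ == "__main__":
    # Build a tiny in-memory .docx-like zip; the namespace URI is a placeholder.
    xml = ('<?xml version="1.0" encoding="UTF-8"?>'
           '<w:document xmlns:w="urn:example"><w:t>1990\u00962000</w:t></w:document>')
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("word/document.xml", xml.encode("utf-8"))
    print([hex(cp) for cp in docx_c1_controls(buf)])  # ['0x96']
```

A real .docx has many other parts (headers, footnotes, comments) that would need the same treatment; this only looks at the main document part.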

-- Graydon
