RE: [xsl] Tranformation failed with Saxon for "Illegal HTML character"

Subject: RE: [xsl] Tranformation failed with Saxon for "Illegal HTML character"
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Fri, 28 Jul 2006 22:40:41 +0100
The Euro symbol is not decimal 128 in Unicode; it is U+20AC (decimal 8364). It
is decimal 128 in a Microsoft code page, probably Windows-1252. The Unicode
character 128 is a control character, and it is not a legal HTML character.
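
To make the difference concrete, here is a minimal Java sketch that decodes the single byte 128 under two encodings. It assumes the Microsoft code page in question is Windows-1252 (the original file's encoding is not stated in this thread), and the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EuroByteDemo {
    // Decode one byte under the given charset and return the resulting code point.
    static int decodeByte(byte b, Charset cs) {
        return new String(new byte[] { b }, cs).codePointAt(0);
    }

    public static void main(String[] args) {
        byte b = (byte) 0x80; // decimal 128

        // Assumption: the file was written in Windows-1252.
        int asCp1252 = decodeByte(b, Charset.forName("windows-1252"));
        // What you get if the same byte is read as ISO-8859-1 instead.
        int asLatin1 = decodeByte(b, StandardCharsets.ISO_8859_1);

        System.out.printf("windows-1252: U+%04X%n", asCp1252); // U+20AC, the Euro sign
        System.out.printf("ISO-8859-1:   U+%04X%n", asLatin1); // U+0080, a control character
    }
}
```

The same byte therefore becomes either a perfectly good Euro sign or character 128, depending only on which encoding the reader believes the file is in.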

You need to make sure that the character encoding of the XML file is correctly
declared: if you are using a particular Microsoft codepage, then you need to
say so in the XML declaration.
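
As a rough illustration of why the declaration matters, the following sketch builds a tiny document whose text is the single byte 128 and parses it with the JDK's built-in parser under two different declared encodings. The element name `price`, the class name, and the choice of windows-1252 are assumptions for the example, not details from the original file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DeclaredEncodingDemo {
    // Build <price>[byte 0x80]</price> with the given encoding named in the
    // XML declaration, parse it, and return the code point the parser produced.
    static int parsedCodePoint(String declaredEncoding) {
        try {
            byte[] head = ("<?xml version=\"1.0\" encoding=\"" + declaredEncoding
                    + "\"?><price>").getBytes(StandardCharsets.US_ASCII);
            byte[] tail = "</price>".getBytes(StandardCharsets.US_ASCII);
            byte[] doc = new byte[head.length + 1 + tail.length];
            System.arraycopy(head, 0, doc, 0, head.length);
            doc[head.length] = (byte) 0x80; // the raw byte, decimal 128
            System.arraycopy(tail, 0, doc, head.length + 1, tail.length);

            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document d = builder.parse(new ByteArrayInputStream(doc));
            return d.getDocumentElement().getTextContent().codePointAt(0);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse test document", e);
        }
    }

    public static void main(String[] args) {
        // Correctly declared (assuming the file really is Windows-1252).
        System.out.printf("windows-1252: U+%04X%n", parsedCodePoint("windows-1252"));
        // Misdeclared: the same byte now reaches the stylesheet as character 128.
        System.out.printf("ISO-8859-1:   U+%04X%n", parsedCodePoint("ISO-8859-1"));
    }
}
```

With the misdeclared encoding the parser accepts the document without complaint, so the problem only surfaces later, when the HTML serializer meets character 128.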

There was a significant controversy in W3C about the rule that invalid HTML
characters must be treated as a fatal error by XSLT processors. I argued for
leniency, but the view that prevailed was that the sooner you catch misencoded
files (or files whose encoding is misdeclared), the better it is for the user
in the long run.

Michael Kay
http://www.saxonica.com/


> -----Original Message-----
> From: Gian Luca Paloni [mailto:gianluca.paloni@xxxxxxxxxxxx]
> Sent: 28 July 2006 17:12
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Cc: desantis@xxxxxxxxxxxx
> Subject: [xsl] Tranformation failed with Saxon for "Illegal
> HTML character"
>
>
> Hi all,
>
> I use Saxon 8.7.3 as the engine for XSLT transformations.
>
> Here's some sample code:
>
> import java.io.*;
> import javax.xml.transform.*;
> import javax.xml.transform.stream.*;
>
> public static void exampleFromStream(String sourceID, String xslID)
>         throws TransformerException, TransformerConfigurationException,
>                IOException {
>
>     // Create a transform factory instance.
>     TransformerFactory tfactory = TransformerFactory.newInstance();
>
>     InputStream  xslIS     = new BufferedInputStream(new FileInputStream(xslID));
>     StreamSource xslSource = new StreamSource(xslIS);
>
>     // The following line would be necessary if the stylesheet contained
>     // an xsl:include or xsl:import with a relative URL
>     // xslSource.setSystemId(xslID);
>
>     // Create a transformer for the stylesheet.
>     Transformer  transformer = tfactory.newTransformer(xslSource);
>     InputStream  xmlIS       = new BufferedInputStream(new FileInputStream(sourceID));
>     StreamSource xmlSource   = new StreamSource(xmlIS);
>
>     // The following line would be necessary if the source document contained
>     // a call on the document() function using a relative URL
>     // xmlSource.setSystemId(sourceID);
>
>     // Transform the source XML to a file. Passing the OutputStream directly
>     // (rather than a PrintWriter) lets the serializer apply the encoding
>     // requested by xsl:output instead of the platform default.
>     OutputStream out = new FileOutputStream("c://test.html");
>     try {
>         transformer.transform(xmlSource, new StreamResult(out));
>     } finally {
>         out.close(); // flush and release the output file
>     }
> }
>
> If I apply the transformation to an XML file which includes
> the € (euro symbol, decimal 128), I get an error message saying:
> ERROR AT ELEMENT CONSTRUCTOR <SPAN> ON LINE 69 OF :
>   SERE0014: ILLEGAL HTML CHARACTER: DECIMAL 128 ; SYSTEMID: ;
> LINE#: 69; COLUMN#: -1
> NET.SF.SAXON.TRANS.DYNAMICERROR: ILLEGAL HTML CHARACTER: DECIMAL 128
> 	AT NET.SF.SAXON.EVENT.HTMLEMITTER.WRITEESCAPE(HTMLEMITTER.JAVA:321) ...
>
> Can anyone help me?
> Is there a way to tell the transformer to leave those special
> characters unchanged instead of interpreting them?
> Thanks in advance to all,
>
> Bye
>
> Gian
