RE: [xsl] &gt; replaced by ">", &lt; is not replaced...

Subject: RE: [xsl] &gt; replaced by ">", &lt; is not replaced...
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Fri, 13 Jul 2007 11:43:39 +0100
You're sending the transformation output to a DOMResult, so the
serialization is presumably being done by the DOM implementation, not by an
XSLT processor. So it's being serialized as XML, in which ">" is perfectly
valid in a text node. If you want HTML output, use a StreamResult so that
the XSLT serializer is invoked.
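
For illustration, the StreamResult variant might look roughly like the
sketch below. The class name StreamResultSketch, the helper
transformToString and the inline stylesheet are hypothetical stand-ins, not
the poster's code, and the sketch assumes whatever JAXP processor
TransformerFactory.newInstance() happens to find. Exactly how ">" is written
still depends on that processor's serializer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class StreamResultSketch {

    // Hypothetical stand-in stylesheet: copies the input unchanged and
    // requests the HTML output method explicitly.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html'/>"
        + "<xsl:template match='/'><xsl:copy-of select='.'/></xsl:template>"
        + "</xsl:stylesheet>";

    // Runs the transformation into a StreamResult, so the XSLT processor's
    // own serializer produces the markup (rather than a DOM serializer).
    static String transformToString(String xslt, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrapped only to keep the sketch short.
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<span class='target'>&lt;WTB.L&gt;</span>";
        System.out.println(transformToString(XSLT, xml));
    }
}
```

In the posted method, the equivalent change would be to return the
StringWriter's contents directly instead of re-parsing a DOMResult into an
XmlObject and calling xmlText() on it.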

You haven't actually answered the question about which XSLT processor you
are using. IIRC XmlObject is part of XmlBeans, and XmlBeans uses Saxon, at
least in some configurations... It does help to know what processor you are
using. You can call system-property('xsl:vendor') in your stylesheet to find
out.
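
One possible way to run that check from Java is sketched below. The class
name VendorProbe and the method xsltVendor are hypothetical, and the dummy
input document exists only so that the stylesheet has something to match.
The sketch also prints the TransformerFactory implementation class, which is
another hint as to which processor JAXP has picked up.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class VendorProbe {

    // Evaluates system-property('xsl:vendor') in a throwaway stylesheet
    // and returns the result as plain text.
    static String xsltVendor() {
        String xslt =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='text'/>"
            + "<xsl:template match='/'>"
            + "<xsl:value-of select=\"system-property('xsl:vendor')\"/>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader("<dummy/>")),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrapped only to keep the sketch short.
            throw new IllegalStateException("Vendor probe failed", e);
        }
    }

    public static void main(String[] args) {
        // The factory class name identifies the JAXP implementation in use.
        System.out.println("Factory: "
            + TransformerFactory.newInstance().getClass().getName());
        System.out.println("xsl:vendor: " + xsltVendor());
    }
}
```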

Michael Kay
http://www.saxonica.com/ 

> -----Original Message-----
> From: Jethro Borsje [mailto:jethro@xxxxxxxxxxxx] 
> Sent: 13 July 2007 11:11
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re: [xsl] &gt; replaced by ">", &lt; is not replaced...
> 
> Hi there,
> 
> This is the Java code that is used for the transformation:
> [code]
>     private String convertSelectionToHTML(String p_selection)
>     {
>        // setup error logging
>        XmlOptions validateOptions = new XmlOptions();
>        ArrayList<XmlError> errorList = new ArrayList<XmlError>();
>        validateOptions.setErrorListener(errorList);
> 
>        try
>        {
>           Transformer transformer = getTransformer();
> 
>           logger.debug("------------------------------------");
>           logger.debug("Parsing body[" + p_selection + "]");
>           XmlObject bodyObject = XmlObject.Factory.parse(p_selection, validateOptions);
> 
>           // transform body
>           DOMResult bodyTransformResult = new DOMResult();
>           DOMSource bodyTransformSource = new DOMSource(bodyObject.getDomNode());
>           transformer.transform(bodyTransformSource, bodyTransformResult);
>           bodyObject = XmlObject.Factory.parse(bodyTransformResult.getNode());
> 
>           logger.debug("after transformation: " + bodyObject.toString());
>           logger.debug("------------------------------------");
> 
>           return bodyObject.xmlText();
>        }
>        catch (XmlException e)
>        {
>           logger.error("Unable to parse body: " + p_selection, e);
>           if (!errorList.isEmpty())
>           {
>              for (XmlError error : errorList)
>              {
>                 logger.error("\t-" + error.getMessage() + 
> "\n\t\tLocation of invalid XML: "
>                       + error.getCursorLocation().xmlText() + "\n");
>              }
>           }
>        }
>        catch (TransformerException e)
>        {
>           logger.error("Unable to transform body: " + p_selection, e);
>           if (!errorList.isEmpty())
>           {
>              for (XmlError error : errorList)
>              {
>                 logger.error("\t-" + error.getMessage() + 
> "\n\t\tLocation of invalid XML: "
>                       + error.getCursorLocation().xmlText() + "\n");
>              }
>           }
>        }
>        return null;
>     }
> 
>     private Transformer getTransformer()
>     {
>        Transformer result = null;
>        TransformerFactory transformerFactory = TransformerFactory.newInstance();
>        try
>        {
>           result = transformerFactory.newTransformer(new StreamSource(
>                 this.getClass().getClassLoader().getResourceAsStream("selection-view.xsl")));
>        }
>        catch (TransformerConfigurationException e)
>        {
>           logger.error("Error creating transformer", e);
>        }
>        return result;
>     }
> [/code]
> 
> Michael Kay wrote:
> > Actually, &lt; and &gt; were replaced by "<" and ">" respectively
> > while parsing; the difference is that during serialization, "<" has
> > been converted back to "&lt;", but ">" has not been converted back to
> > "&gt;". This caused me a little confusion in reading your message!
> >
> > What XSLT processor did you use and how did you run it? Are you sure
> > the serialization was done by an XSLT processor? I'm puzzled because
> > there's no evidence that it used the HTML output method, which it
> > should have done.
> > When serializing as XML, there is no need to write ">" as "&gt;", but
> > in HTML, the HTML spec advises that this "should" be done. The XSLT
> > 2.0 serialization specification, surprisingly, seems to have nothing
> > to say on the subject.
> > 
> > Michael Kay
> > http://www.saxonica.com/
> > 
> >> -----Original Message-----
> >> From: Jethro Borsje [mailto:jethro@xxxxxxxxxxxx]
> >> Sent: 13 July 2007 10:07
> >> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> >> Subject: [xsl] &gt; replaced by ">", &lt; is not replaced...
> >>
> >> Hi everybody,
> >>
> >> I am trying to transform an HTML page using XSL. The problem is that
> >> somehow the "&gt;" signs in my input text are changed to ">", while
> >> "&lt;" is not changed. This is the XSL I am using:
> >> [stylesheet]
> >> <?xml version="1.0" encoding="ISO-8859-1"?>
> >> <xsl:stylesheet version="2.0"
> >> 	xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
> >>
> >> 	<xsl:template match="/">
> >> 		<html>
> >> 			<head>
> >> 				<style>
> >> 					body
> >> 					{
> >> 						font-family:'Courier New', Courier, monospace;
> >> 						font-size:11px;
> >> 						color:#333333;
> >> 						font-weight:normal;
> >> 						line-height: 140%;
> >> 						text-align:justify;
> >> 					}
> >> 					span.rule
> >> 					{
> >> 						font-weight:bold;
> >> 					}
> >> 					span.issuer, span.target
> >> 					{
> >> 						font-weight:bold;
> >> 						display:inline;
> >> 					}
> >> 				</style>
> >> 			</head>
> >> 			<body>
> >> 				<xsl:apply-templates />
> >> 			</body>
> >> 		</html>
> >> 	</xsl:template>
> >>
> >> 	<xsl:template match="br">
> >> 		<xsl:element name="br"></xsl:element>
> >> 	</xsl:template>
> >>
> >> 	<!-- Copy all <span> tags together with the attributes. -->
> >> 	<xsl:template match="span">
> >> 		<xsl:element name="span">
> >> 			<xsl:attribute name="id"><xsl:value-of select="@id" /></xsl:attribute>
> >> 			
> >> 			<xsl:if test="@style">
> >> 				<xsl:attribute name="style"><xsl:value-of select="@style" /></xsl:attribute>
> >> 			</xsl:if>
> >> 			
> >> 			<xsl:if test="@class">
> >> 				<xsl:attribute name="class"><xsl:value-of select="@class" /></xsl:attribute>
> >> 			</xsl:if>
> >> 		
> >> 			<xsl:value-of select="." />
> >> 		</xsl:element>
> >> 	</xsl:template>
> >>
> >> </xsl:stylesheet>
> >> [/stylesheet]
> >>
> >> This is the text that is being parsed:
> >> [parsed text]
> >> <html>
> >> 	<body>
> >> 		<span class="target" id="http://www.owl-ontologies.com/Ontology1182253177.owl#WHITBREAD">&lt;WTB.L&gt;</span>
> >> said on Monday it was considering the sal
> >> 	</body>
> >> </html>
> >> [/parsed text]
> >>
> >> This is the text after transformation:
> >> [transformed text]
> >> <html>
> >> 	<head>
> >> 	<style>
> >> 		body
> >> 		{
> >> 			font-family:'Courier New', Courier, monospace;
> >> 			font-size:11px; color:#333333;
> >> 			font-weight:normal;
> >> 			line-height: 140%;
> >> 			text-align:justify;
> >> 		}
> >> 		span.rule
> >> 		{
> >> 			font-weight:bold;
> >> 		}
> >> 		span.issuer, span.target
> >> 		{
> >> 			font-weight:bold;
> >> 			display:inline;
> >> 		}
> >> 	</style>
> >> 	</head>
> >> <body>
> >> 	<span class="target" id="http://www.owl-ontologies.com/Ontology1182253177.owl#WHITBREAD">&lt;WTB.L></span>
> >> said on Monday it was considering the sal </body> </html> 
> >> [/transformed text]
> >>
> >> As you can see, the "&gt;" is replaced by ">", but the "&lt;" is
> >> NOT replaced by "<". I do not understand how this is possible. The
> >> desired result is that neither gets replaced, so both "&gt;" and
> >> "&lt;" should appear in the transformed text.
> >>
> >> --
> >> Best regards,
> >> Jethro Borsje
> >>
> >> http://www.jborsje.nl
