Re: [xsl] &gt; replaced by ">", &lt; is not replaced...

Subject: Re: [xsl] &gt; replaced by ">", &lt; is not replaced...
From: Jethro Borsje <jethro@xxxxxxxxxxxx>
Date: Fri, 13 Jul 2007 12:12:00 +0200
Hi there,

This is the Java code that is used for the transformation:
[code]
   private String convertSelectionToHTML(String p_selection)
   {
      // setup error logging
      XmlOptions validateOptions = new XmlOptions();
      ArrayList<XmlError> errorList = new ArrayList<XmlError>();
      validateOptions.setErrorListener(errorList);

      try
      {
         Transformer transformer = getTransformer();

         logger.debug("------------------------------------");
         logger.debug("Parsing body[" + p_selection + "]");
         XmlObject bodyObject = XmlObject.Factory.parse(p_selection, validateOptions);

         // transform body
         DOMResult bodyTransformResult = new DOMResult();
         DOMSource bodyTransformSource = new DOMSource(bodyObject.getDomNode());
         transformer.transform(bodyTransformSource, bodyTransformResult);
         bodyObject = XmlObject.Factory.parse(bodyTransformResult.getNode());

         logger.debug("after transformation: " + bodyObject.toString());
         logger.debug("------------------------------------");

         return bodyObject.xmlText();
      }
      catch (XmlException e)
      {
         logger.error("Unable to parse body: " + p_selection, e);
         if (!errorList.isEmpty())
         {
            for (XmlError error : errorList)
            {
               logger.error("\t-" + error.getMessage()
                     + "\n\t\tLocation of invalid XML: "
                     + error.getCursorLocation().xmlText() + "\n");
            }
         }
      }
      catch (TransformerException e)
      {
         logger.error("Unable to transform body: " + p_selection, e);
         if (!errorList.isEmpty())
         {
            for (XmlError error : errorList)
            {
               logger.error("\t-" + error.getMessage()
                     + "\n\t\tLocation of invalid XML: "
                     + error.getCursorLocation().xmlText() + "\n");
            }
         }
      }
      return null;
   }

   private Transformer getTransformer()
   {
      Transformer result = null;
      TransformerFactory transformerFactory = TransformerFactory.newInstance();
      try
      {
         result = transformerFactory.newTransformer(new StreamSource(
               this.getClass().getClassLoader().getResourceAsStream("selection-view.xsl")));
      }
      catch (TransformerConfigurationException e)
      {
         logger.error("Error creating transformer", e);
      }
      return result;
   }
[/code]

What I actually want is for all of the HTML to be preserved. The transformation also removes other things, such as &#160;, and I want to keep all of those as well.
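
A possible approach for the &#160; case, offered as a sketch and not taken from the thread: a serializer has to write any character that the output encoding cannot represent as a character reference, so asking for US-ASCII output on a byte stream should bring the reference back. The class name AsciiSerialization and the sample input are hypothetical, and the exact form of the reference (decimal, hexadecimal, or a named entity with the HTML method) depends on the processor.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class AsciiSerialization {
    // Identity transform that serializes to US-ASCII bytes, so the serializer
    // cannot write U+00A0 directly and has to fall back to a character reference.
    static String toAscii(String xml) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.ENCODING, "US-ASCII");
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            identity.transform(new StreamSource(new StringReader(xml)), new StreamResult(bytes));
            return new String(bytes.toByteArray(), StandardCharsets.US_ASCII);
        } catch (Exception e) {
            throw new IllegalStateException("Unable to serialize input", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample input containing a non-breaking space.
        System.out.println(toAscii("<p>a&#160;b</p>"));
    }
}
```

In the method from this message, the equivalent change would be on the serialization side, since the reference is already a plain character by the time the stylesheet sees it.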

--
Best regards,
Jethro Borsje

http://www.jborsje.nl

Michael Kay wrote:
Actually, &lt; and &gt; were replaced by "<" and ">" respectively while
parsing; the difference is that during serialization, "<" has been converted
back to "&lt;", but ">" has not been converted back to "&gt;". This caused
me a little confusion in reading your message!
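
To make the parse/serialize distinction concrete, here is a minimal sketch using only the JDK's DOM and javax.xml.transform APIs. The class name EntityRoundTrip and the one-element input are made up for illustration. After parsing, the text node holds the literal characters "<" and ">"; the serializer must write "<" back as "&lt;", while its treatment of ">" is up to the implementation.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EntityRoundTrip {
    // Parsing resolves &lt; and &gt; into the characters they stand for.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Unable to parse input", e);
        }
    }

    // Serializing with an identity transform re-escapes what well-formedness requires.
    static String serialize(Document doc) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Unable to serialize document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<span>&lt;WTB.L&gt;</span>");
        // In-memory text content: the entity references are gone.
        System.out.println(doc.getDocumentElement().getTextContent());
        // Serialized form: "<" is escaped again; ">" may or may not be.
        System.out.println(serialize(doc));
    }
}
```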

What XSLT processor did you use and how did you run it? Are you sure the
serialization was done by an XSLT processor? I'm puzzled because there's no
evidence that it used the HTML output method, which it should have done.
When serializing as XML, there is no need to write ">" as "&gt;", but in
HTML, the HTML spec advises that this "should" be done. The XSLT 2.0
serialization specification, surprisingly, seems to have nothing to say on
the subject.
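
For illustration, one way to have the XSLT processor itself do the serialization is to transform to a StreamResult and declare the HTML output method in the stylesheet. This is only a sketch under assumptions: it uses the JDK's javax.xml.transform API, and the class name HtmlOutputDemo, the inline identity stylesheet (XSLT 1.0), and the sample input are hypothetical stand-ins for the real stylesheet and page. Whether ">" is written as "&gt;" still depends on the processor.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class HtmlOutputDemo {
    // Hypothetical stylesheet: an identity transform that requests HTML serialization.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html'/>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Transforms straight to a character stream, so the XSLT processor's own
    // serializer (honouring xsl:output) produces the final markup.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Unable to transform input", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(
            "<html><body><span class='target'>&lt;WTB.L&gt;</span></body></html>"));
    }
}
```

A DOMResult, by contrast, only yields a tree: whatever later turns that tree into text decides how characters are escaped, and xsl:output plays no part in it.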

Michael Kay
http://www.saxonica.com/


-----Original Message-----
From: Jethro Borsje [mailto:jethro@xxxxxxxxxxxx]
Sent: 13 July 2007 10:07
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: [xsl] &gt; replaced by ">", &lt; is not replaced...


Hi everybody,

I am trying to transform an HTML page using XSL. The problem is that somehow the "&gt;" signs in my input text are changed to ">", while the "&lt;" signs are not changed. This is the XSL I am using:
[stylesheet]
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="2.0"
	xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

	<xsl:template match="/">
		<html>
			<head>
				<style>
					body
					{
						font-family:'Courier New', Courier, monospace;
						font-size:11px;
						color:#333333;
						font-weight:normal;
						line-height: 140%;
						text-align:justify;
					}
					span.rule
					{
						font-weight:bold;
					}
					span.issuer, span.target
					{
						font-weight:bold;
						display:inline;
					}
				</style>
			</head>
			<body>
				<xsl:apply-templates />
			</body>
		</html>
	</xsl:template>

	<xsl:template match="br">
		<xsl:element name="br"></xsl:element>
	</xsl:template>

	<!-- Copy all <span> tags together with the attributes. -->
	<xsl:template match="span">
		<xsl:element name="span">
			<xsl:attribute name="id"><xsl:value-of select="@id" /></xsl:attribute>

			<xsl:if test="@style">
				<xsl:attribute name="style"><xsl:value-of select="@style" /></xsl:attribute>
			</xsl:if>

			<xsl:if test="@class">
				<xsl:attribute name="class"><xsl:value-of select="@class" /></xsl:attribute>
			</xsl:if>

			<xsl:value-of select="." />
		</xsl:element>
	</xsl:template>

</xsl:stylesheet>
[/stylesheet]

This is the text that is being parsed:
[parsed text]
<html>
<body>
<span class="target" id="http://www.owl-ontologies.com/Ontology1182253177.owl#WHITBREAD">&lt;WTB.L&gt;</span>
said on Monday it was considering the sal
</body>
</html>
[/parsed text]


This is the text after transformation:
[transformed text]
<html>
<head>
<style>
body
{
font-family:'Courier New', Courier, monospace;
font-size:11px; color:#333333;
font-weight:normal;
line-height: 140%;
text-align:justify;
}
span.rule
{
font-weight:bold;
}
span.issuer, span.target
{
font-weight:bold;
display:inline;
}
</style>
</head>
<body>
<span class="target" id="http://www.owl-ontologies.com/Ontology1182253177.owl#WHITBREAD">&lt;WTB.L></span>
said on Monday it was considering the sal
</body>
</html>
[/transformed text]


As you can see, the "&gt;" is replaced by ">", but the "&lt;" is NOT replaced by "<". I do not understand how this is possible. The desired result is that neither of them gets replaced, so both "&gt;" and "&lt;" should appear in the transformed text.

--
Best regards,
Jethro Borsje

http://www.jborsje.nl
