RE: [xsl] XPath Question (related to Java)

Subject: RE: [xsl] XPath Question (related to Java)
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Mon, 25 Jun 2007 23:03:13 +0100

I would certainly tend to do this in XSLT, unless I needed to (and had time
to) make it ultra-efficient, in which case a Java solution might be faster.

I would never attempt to hand-parse XML, but there are cases where combining
several XML documents into one big document "by hand" is perfectly OK,
including a bit of manipulation like stripping off the XML declaration - so
long as you are confident the files all use the same encoding, don't use
internal DTDs, and so on.
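
As a rough illustration of the "by hand" combination, here is a minimal Java
sketch. It assumes every input is already a well-formed document held as a
String in the same encoding. The class name XmlConcatenator, the method
combine and the wrapper element name are invented for this example. It only
strips a leading XML declaration and refuses inputs that contain a DOCTYPE;
it does not validate anything else.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class XmlConcatenator {
    // Matches an XML declaration at the very start of a document,
    // plus any whitespace that follows it.
    private static final Pattern XML_DECL =
            Pattern.compile("\\A<\\?xml\\s[^?]*\\?>\\s*");

    /**
     * Wraps the given documents in one new root element.
     * Assumption: all inputs use the same encoding and none has a DTD.
     */
    static String combine(List<String> documents, String wrapperName) {
        StringBuilder sb = new StringBuilder();
        sb.append('<').append(wrapperName).append('>');
        for (String doc : documents) {
            if (doc.contains("<!DOCTYPE")) {
                // A DTD (internal subset, entities) makes naive pasting unsafe.
                throw new IllegalArgumentException("DOCTYPE not supported");
            }
            // Drop the declaration; it is only legal at the start of a document.
            sb.append(XML_DECL.matcher(doc).replaceFirst(""));
        }
        sb.append("</").append(wrapperName).append('>');
        return sb.toString();
    }
}
```

The result is still just a String, so it is worth running it through a real
parser once before handing it to anything else.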

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Grant Slade [mailto:grant.slade@xxxxxxxxx] 
> Sent: 25 June 2007 00:33
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re: [xsl] XPath Question (related to Java)
> 
> Hi Michael - thanks for the heads up.  Maybe I can ask you 
> and the group a more general question.  What I was trying to 
> do was go through a file of dictionary terms, read in the 
> terms one at a time and then add them to a 3rd party native 
> xml database application that takes a well-formed xml 
> document (but in String format, thus my trying to get the 
> information from it in String format).  I have been trying to 
> be a good student of XML and learn the APIs, but I am 
> wondering if in some cases it is better to just parse it as a 
> string, such as in this case where it needs to retain the 
> tagging.  Or maybe xslt would have been a better option to 
> go with from the beginning?
> 
> On 6/24/07, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> > In the XPath data model, you see nodes rather than markup. That's why
> > there's no "<" present. Instead, the Definition element will have a
> > child that is a <sub> element.
> >
> > Evaluating the expression as a string will give you the string value
> > of the node, which is the concatenation of all the contained text,
> > ignoring the markup.
> >
> > You seem to want to serialize the node as XML, to reinstate the markup.
> > There's no direct way of doing that in the XPath API; you probably
> > have to do an identity transformation from a DOMSource containing the
> > node to a StreamResult. (You'll have to change your call to retrieve a
> > NODESET rather than a STRING.) Alternatively there may be a method
> > such as toXML() on the DOM Node object - I've forgotten.
> >
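
A minimal sketch of the identity-transformation route, assuming the standard
JAXP classes that ship with the JDK. The class name DefinitionSerializer and
the method getDefinitionXml are invented for this example. It asks for a
single NODE rather than a NODESET, which is enough when each term has one
Definition child; a NODESET would need a loop over the result.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;

// Hypothetical helper, not part of any library.
class DefinitionSerializer {
    /** Returns the Definition child of the given node as XML text, or null if absent. */
    static String getDefinitionXml(Node term)
            throws XPathExpressionException, TransformerException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Ask for a node, not a string, so child elements such as <sub> survive.
        Node definition = (Node) xpath.evaluate("Definition", term, XPathConstants.NODE);
        if (definition == null) {
            return null;
        }
        // A Transformer created without a stylesheet copies its input unchanged.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        // The result is a fragment, so leave out the XML declaration.
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(definition), new StreamResult(out));
        return out.toString();
    }
}
```

Called with the Term node from the original question, this should return the
Definition element with its <sub> child still in place.
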
> > Michael Kay
> > http://www.saxonica.com/
> >
> > > -----Original Message-----
> > > From: Grant Slade [mailto:grant.slade@xxxxxxxxx]
> > > Sent: 24 June 2007 19:03
> > > To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> > > Subject: [xsl] XPath Question (related to Java)
> > >
> > > Hi, I have the following xml which gets read from a file as part of
> > > a Node:
> > >             <Definition> An organic compound in which the aldehyde
> > > group (HC=O) is connected to a branched or unbranched open chain of
> > > carbon atoms rather than a ring.
> > > Some aldehydes are created during the reactions of oxidants used as
> > > disinfectants, particularly ozone (O<sub>3</sub>), with natural
> > > organic matter. </Definition>
> > >
> > > When I run it through the following method it ignores the
> > > <sub></sub>:
> > >       public String getDefinitionFromNode(Node node)
> > >                   throws javax.xml.xpath.XPathExpressionException
> > >       {
> > >             XPath xpath = XPathFactory.newInstance().newXPath();
> > >             String definitionExpression = "Definition";
> > >             String definition = (String) xpath.evaluate(
> > >                   definitionExpression, node, XPathConstants.STRING);
> > >             if(definition.contains("<"))
> > >                   System.out.println ("found a <");
> > >             else
> > >             {
> > >                   System.out.println ("did not find a <");
> > >             }
> > >             return definition;
> > >       }
> > >
> > > When the program runs, it outputs the following:
> > >
> > > did not find a <
> > > --------------------------------
> > > <dictionary n=""><TermName>aliphatic 
> > > aldehyde</TermName><Definition>An organic compound in which the 
> > > aldehyde group (HC=O) is connected to a branched or unbranched open
> > > chain of carbon atoms rather than a ring.
> > > Some aldehydes are created during the reactions of oxidants used as
> > > disinfectants, particularly ozone (O3), with natural organic
> > > matter.</Definition></dictionary>
> > >
> > > How do I get it to output the <sub></sub> elements?
> > >
> > > The complete node is:
> > >         <Term>
> > >             <Entry> aliphatic aldehyde </Entry>
> > >             <Definition> An organic compound in which the aldehyde
> > >                 group (HC=O) is connected to a branched or unbranched
> > >                 open chain of carbon atoms rather than a ring. Some
> > >                 aldehydes are created during the reactions of oxidants
> > >                 used as disinfectants, particularly ozone
> > >                 (O<sub>3</sub>), with natural organic matter.
> > >             </Definition>
> > >             <SeeAlso>disinfection by-product</SeeAlso>
> > >             <IMAGE fileName="A-17.gif"/>
> > >         </Term>
