Subject: RE: [xsl] XPath Question (related to Java)
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Mon, 25 Jun 2007 23:03:13 +0100

I would certainly tend to do this in XSLT unless I needed to (and had time
to) make it ultra-efficient in which case a Java solution might be faster.
I would never attempt to hand-parse XML, but there are cases where
combining several XML documents into one big document "by hand" is
perfectly OK, including a bit of manipulation like stripping off the XML
declaration - so long as you are confident the files all use the same
encoding, don't use internal DTDs, and so on.

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Grant Slade [mailto:grant.slade@xxxxxxxxx]
> Sent: 25 June 2007 00:33
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re: [xsl] XPath Question (related to Java)
>
> Hi Michael - thanks for the heads up. Maybe I can ask you and the
> group a more general question. What I was trying to do was go through
> a file of dictionary terms, read in the terms one at a time and then
> add them to a 3rd party native xml database application that takes a
> well-formed xml document (but in String format, thus my trying to get
> the information from it in String format). I have been trying to be a
> good student of XML and learn the APIs, but I am wondering if in some
> cases it is better to just parse it as a string, such as in this case
> where it needs to retain to remain the tagging. Or maybe xslt would
> have been a better option to go with from the beginning?
>
> On 6/24/07, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> > In the XPath data model, you see nodes rather than markup. That's
> > why there's no "<" present. Instead, the Definition element will
> > have a child that is a <sub> element.
> >
> > Evaluating the expression as a string will give you the string
> > value of the node, this is the concatenation of all the contained
> > text, ignoring the markup.
> >
> > You seem to want to serialize the node as XML, to reinstate the
> > markup. There's no direct way of doing that in the XPath API; you
> > probably have to do an identity transformation from a DOMSource
> > containing the node to a StreamResult. (You'll have to change your
> > call to retrieve a NODESET rather than a STRING). Alternatively
> > there may be a method such as toXML() on the DOM Node object - I've
> > forgotten.
> >
> > Michael Kay
> > http://www.saxonica.com/
> >
> > > -----Original Message-----
> > > From: Grant Slade [mailto:grant.slade@xxxxxxxxx]
> > > Sent: 24 June 2007 19:03
> > > To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> > > Subject: [xsl] XPath Question (related to Java)
> > >
> > > Hi, I have the following xml which gets read from a file as part
> > > of a Node:
> > >
> > > <Definition> An organic compound in which the aldehyde
> > > group (HC=O) is connected to a branched or unbranched open chain
> > > of carbon atoms rather than a ring.
> > > Some aldehydes are created during the reactions of oxidants used
> > > as disinfectants, particularly ozone (O<sub>3</sub>), with natural
> > > organic matter.
> > > </Definition>
> > >
> > > When I run it through the following method it ignores the
> > > <sub></sub>:
> > >
> > > public String getDefinitionFromNode(Node node) throws
> > >         javax.xml.xpath.XPathExpressionException
> > > {
> > >     XPath xpath = XPathFactory.newInstance().newXPath();
> > >     String definitionExpression = "Definition";
> > >     String definition = (String)
> > >         xpath.evaluate(definitionExpression, node, XPathConstants.STRING);
> > >     if(definition.contains("<"))
> > >         System.out.println ("found a <");
> > >     else
> > >     {
> > >         System.out.println ("did not find a <");
> > >     }
> > >     return definition;
> > > }
> > >
> > > When the program runs, it outputs the following:
> > >
> > > did not find a <
> > > --------------------------------
> > > <dictionary n=""><TermName>aliphatic
> > > aldehyde</TermName><Definition>An organic compound in which the
> > > aldehyde group (HC=O) is connected to a branched or unbranched
> > > open chain of carbon atoms rather than a ring.
> > > Some aldehydes are created during the reactions of oxidants used
> > > as disinfectants, particularly ozone (O3), with natural organic
> > > matter.</Definition></dictionary>
> > >
> > > How do I get it to output the <sub></sub> elements?
> > >
> > > The complete node is:
> > >
> > > <Term>
> > >     <Entry> aliphatic aldehyde </Entry>
> > >     <Definition> An organic compound in which the aldehyde
> > >         group (HC=O) is connected to a
> > >         branched or unbranched open chain of carbon atoms
> > >         rather than a ring. Some aldehydes
> > >         are created during the reactions of oxidants used as
> > >         disinfectants, particularly
> > >         ozone (O<sub>3</sub>), with natural organic matter.
> > >     </Definition>
> > >     <SeeAlso>disinfection by-product</SeeAlso>
> > >     <IMAGE fileName="A-17.gif"/>
> > > </Term>
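
The identity-transformation approach suggested in the quoted reply (a DOMSource containing the node, written to a StreamResult) might look roughly like the sketch below. It is only an illustration under some assumptions: the class name SerializeDefinition and the helper names serialize and getDefinitionMarkup are made up for this example, the XPath call asks for XPathConstants.NODE (a single node) rather than the NODESET mentioned in the reply, and the XML declaration is suppressed on the assumption that the target database wants a bare fragment.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original thread.
class SerializeDefinition {

    // Serialize a DOM node back to markup using an identity transformation:
    // a Transformer created without a stylesheet copies its input unchanged.
    static String serialize(Node node) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        // Assumption: the consumer wants a fragment without an XML declaration.
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    // Select the Definition child as a node (not as a string) and serialize it,
    // so that child elements such as <sub> survive as markup.
    static String getDefinitionMarkup(Node term) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node definition = (Node) xpath.evaluate("Definition", term, XPathConstants.NODE);
        return definition == null ? null : serialize(definition);
    }

    public static void main(String[] args) throws Exception {
        // Cut-down version of the Term element from the original question.
        String xml = "<Term><Definition>ozone (O<sub>3</sub>)</Definition></Term>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        System.out.println(getDefinitionMarkup(doc.getDocumentElement()));
    }
}
```

The design point is the one made in the reply: evaluating with XPathConstants.STRING yields only the concatenated text of the node, so the markup has to be reinstated by serializing the node itself.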