Subject: [xsl] white space in xml should not be interpreted as text nodes
From: "Markus Hanel" <markus.hanel@xxxxxx>
Date: Thu, 11 Mar 2004 10:06:50 +0100 (MET)
Hello,

I hope this is the right mailing list for this kind of topic. If not, do not hesitate to ignore this posting or direct me to another mailing list.

Here is the problem: my application is a web-server-centered program that uses mod_python and the xml package and has to process XML files. Most of the time these XML files contain ignorable whitespace such as \n, \r and \t between the tags. The problem is that minidom seems to interpret this whitespace as text nodes, and I cannot know in advance how many of these "text nodes" sit between the real data nodes. This seems to disturb the real structure of the DOM tree: child nodes are no longer where I expect them, etc. That makes it hard to write a reliable XML application, since I cannot know how much whitespace the writer/editor of the XML file has put between the tags.

So I tried to get rid of these unwanted text nodes with this piece of code, but that did not help either:

################################################################################
#
################################################################################
def cleanUpNodes( nodes ):
    """Removes all TEXT_NODES in parameter nodes that contain only characters
    that are defined as whitespace in the string library"""
    for node in nodes.childNodes:
        if node.nodeType == Node.TEXT_NODE:
            node.data = string.strip(node.data)
    nodes.normalize()
################################################################################
#
################################################################################

I also tried pulldom, but it reports the whitespace as "CHARACTERS" events and not as "IGNORABLE_WHITESPACE" events. Another thing is that pulldom never seems to generate an "END_DOCUMENT" event?!

The big question is: does anybody know a way around this problem? Am I missing something? How can I get rid of these unwanted whitespace text nodes?
Here is an example that shows what the same code interprets as child nodes when processing the same XML file without and with whitespace between the tags:

<############### XML File with white spaces #################>
<root>
  <child_1>
    <child_11>
      <child_111 path="/qpers_data/" proto="file" />
    </child_11>
  </child_1>
  <child_2 type="admin" status="active" label="root">
    <child_21 path="/qnodes/admin/admin_root.xml" proto="file" />
  </child_2>
</root>

<############################# Code #############################>
#!/usr/bin/python
from xml.dom import minidom
from xml.dom import Node
import string

################################################################################
def cleanUpNodes( nodes ):
    """Removes all TEXT_NODES in parameter nodes that contain only characters
    that are defined as whitespace in the string library"""
    for node in nodes.childNodes:
        if node.nodeType == Node.TEXT_NODE:
            node.data = string.strip(node.data)
    nodes.normalize()

###############################################################################
def dumpTree( xmlFileIn, xmlFileOut ):
    try:
        dom = minidom.parse( xmlFileIn )
        file = open( xmlFileOut, "w" )
    except IOError, (errno, strerror):
        print "I/O error(%s): %s" % (errno, strerror )
        return
    cleanUpNodes( dom.documentElement )
    for node in dom.documentElement.childNodes:
        while ( node ):
            file.write( "\n node ->" + node.nodeName )
            file.write( node.toxml('ISO-8859-1') )
            node = node.firstChild
    file.close()
    return 1

###############################################################################
dumpTree( "index_wos.xml", "without_space.xml" )

<####################### Output with XML with whitespace ####################>
 node ->child_1<child_1>
    <child_11>
      <child_111 path="/qpers_data/" proto="file"/>
    </child_11>
  </child_1>
 node ->#text
 node ->child_2<child_2 label="root" status="active" type="admin">
    <child_21 path="/qnodes/admin/admin_root.xml" proto="file"/>
  </child_2>
 node ->#text

<#################### Output with XML without whitespace ####################>
 node ->child_1<child_1><child_11><child_111 path="/qpers_data/" proto="file"/></child_11></child_1>
 node ->child_11<child_11><child_111 path="/qpers_data/" proto="file"/></child_11>
 node ->child_111<child_111 path="/qpers_data/" proto="file"/>
 node ->child_2<child_2 label="root" status="active" type="admin"><child_21 path="/qnodes/admin/admin_root.xml" proto="file"/></child_2>
 node ->child_21<child_21 path="/qnodes/admin/admin_root.xml" proto="file"/>

regards,
markus

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
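
One possible way around the problem described above, given only as a minimal sketch: the output with whitespace suggests that cleanUpNodes() handles the direct children of the element it is given but leaves the whitespace deeper in the tree, so child_1.firstChild is still a #text node. A recursive pass that removes whitespace-only text nodes at every level avoids this. The sketch assumes present-day Python 3 rather than the Python 2 of the post, and the names strip_whitespace_nodes and first_child_chain are hypothetical helpers, not part of minidom.

```python
from xml.dom import minidom, Node


def strip_whitespace_nodes(node):
    """Recursively remove text nodes that contain only whitespace."""
    # Iterate over a copy, because removeChild() mutates node.childNodes.
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE:
            if not child.data.strip():
                node.removeChild(child)
                child.unlink()
        elif child.nodeType == Node.ELEMENT_NODE:
            strip_whitespace_nodes(child)


def first_child_chain(node):
    """Follow firstChild links, as the while loop in dumpTree() does."""
    names = []
    while node is not None:
        names.append(node.nodeName)
        node = node.firstChild
    return names


# The sample document from the post, with whitespace between the tags.
SAMPLE = """<root>
  <child_1>
    <child_11>
      <child_111 path="/qpers_data/" proto="file" />
    </child_11>
  </child_1>
  <child_2 type="admin" status="active" label="root">
    <child_21 path="/qnodes/admin/admin_root.xml" proto="file" />
  </child_2>
</root>"""

dom = minidom.parseString(SAMPLE)
strip_whitespace_nodes(dom.documentElement)

chains = [first_child_chain(c) for c in dom.documentElement.childNodes]
print(chains)
# [['child_1', 'child_11', 'child_111'], ['child_2', 'child_21']]
```

With the whitespace-only nodes gone at every level, the traversal visits the same element names as in the "without whitespace" output. Note that this also discards whitespace-only text inside mixed content, so it is only appropriate for data-oriented documents like the one shown.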