Re: Re: RE: RE: [xsl] DOM and XML parser

Thanks a lot. This is really invaluable, in-depth information about the XSLT processor and the XML parser. Thanks, ashu

Both. The parser reads the raw file(s) that make up the XML document. It decodes bytes into characters, condenses character references (e.g., the 5 characters "&#33;" become the 1 character "!"), normalizes whitespace in attribute values, uses the DTD to fill in default attribute values and resolve entities, and checks for well-formedness. The parser then passes along the 'important' information about the XML document to the application (the XSLT processor).
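To see those steps from the application's side, here is a minimal sketch using Python's standard-library expat binding (xml.parsers.expat). The thread does not name a parser, so that choice, the parse_root helper, and the sample document are assumptions for illustration. The application never sees "&#33;", "&who;", or the raw newline inside the attribute; it only receives the results.

```python
import xml.parsers.expat


def parse_root(xml_text):
    """Return (name, attributes, text) for a one-element document, as the parser reports them."""
    result = {"text": []}
    parser = xml.parsers.expat.ParserCreate()
    # specified_attributes is False by default, so attributes defaulted
    # from the DTD are reported along with the ones written in the document.

    def start(name, attrs):
        result["name"] = name
        result["attrs"] = attrs

    parser.StartElementHandler = start
    # Character data may arrive in several chunks (e.g. around references),
    # so collect every chunk and join them afterwards.
    parser.CharacterDataHandler = result["text"].append
    parser.Parse(xml_text, True)
    return result["name"], result["attrs"], "".join(result["text"])


# Hypothetical sample: the internal DTD subset declares a default attribute
# value and an entity; the content also uses a character reference.
doc = """<!DOCTYPE greeting [
  <!ATTLIST greeting lang CDATA "en">
  <!ENTITY who "world">
]>
<greeting note="line one
line two">Hello, &who;&#33;</greeting>"""

name, attrs, text = parse_root(doc)
print(name, attrs, repr(text))
```

The handler receives the text 'Hello, world!', a note attribute whose newline has been normalized to a space, and a lang attribute of 'en' that came from the DTD rather than from the document instance.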

The information it passes is pretty much exactly what the processor needs in order to model the XPath/XSLT node tree. For example, the parser says things like "there is an element named 'stylesheet' in namespace 'http://www.w3.org/1999/XSL/Transform', its lexical name is 'xsl:stylesheet', it has an attribute named 'version' with value '1.0', it contains an element named 'template'..." and so on. SAX and DOM parsers do this in very different ways, but the idea is the same.
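A rough sketch of that kind of report, again assuming Python's xml.parsers.expat with namespace processing turned on; the split_name and report helpers are hypothetical. Setting namespace_prefixes asks expat to pass along the prefix as well, which is how the lexical name 'xsl:stylesheet' can be rebuilt next to the namespace URI and local name.

```python
import xml.parsers.expat

SEP = " "  # expat joins namespace URI, local name and prefix with this separator


def split_name(raw):
    """Split expat's 'uri local prefix' string into (uri, local name, lexical name)."""
    parts = raw.split(SEP)
    if len(parts) == 1:          # no namespace
        return None, parts[0], parts[0]
    if len(parts) == 2:          # namespace but no prefix (default namespace)
        return parts[0], parts[1], parts[1]
    uri, local, prefix = parts   # prefixed name
    return uri, local, prefix + ":" + local


def report(xml_text):
    """Return one tuple per element: (uri, local name, lexical name, attributes)."""
    events = []
    parser = xml.parsers.expat.ParserCreate(namespace_separator=SEP)
    parser.namespace_prefixes = True  # also pass the prefix, so the lexical name survives

    def start(name, attrs):
        uri, local, lexical = split_name(name)
        # Attributes keyed by (namespace URI, local name). The xmlns
        # declarations are consumed by the parser, not reported as attributes.
        attributes = {split_name(a)[:2]: v for a, v in attrs.items()}
        events.append((uri, local, lexical, attributes))

    parser.StartElementHandler = start
    parser.Parse(xml_text, True)
    return events


XSL = "http://www.w3.org/1999/XSL/Transform"
doc = ('<xsl:stylesheet version="1.0" xmlns:xsl="' + XSL + '">'
       '<xsl:template match="/"/>'
       '</xsl:stylesheet>')

for event in report(doc):
    print(event)
```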

The parser does not report lexical differences. For example,

<foo a1="one" a2="two">1 &amp; 2 are &lt; 3</foo>

and a mess like
 <foo
      a1 = "one"
                a2 = "two"
     ><![CDATA[1 & 2 are < 3]]></foo>
mean exactly the same thing and are reported identically; the XSLT processor never knows which of the two forms the original used. It just knows that the following logical information items exist and have this relationship to each other:
element type 'foo' in no namespace
  |  \__attribute name 'a1', value character data 'one'
  |  \__attribute name 'a2', value character data 'two'
  |
  |__character data '1 & 2 are < 3'
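One way to check that claim, assuming Python's expat binding (the logical_view helper is hypothetical): collect the element name, attributes, and character data the parser reports for both versions and compare them.

```python
import xml.parsers.expat


def logical_view(xml_text):
    """Return (element name, attributes, character data) as reported by the parser."""
    info = {"text": []}
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        info["element"] = name
        info["attributes"] = attrs

    parser.StartElementHandler = start
    # Escaped text and CDATA sections both arrive as plain character data.
    parser.CharacterDataHandler = info["text"].append
    parser.Parse(xml_text, True)
    return info["element"], info["attributes"], "".join(info["text"])


tidy = '<foo a1="one" a2="two">1 &amp; 2 are &lt; 3</foo>'
messy = """<foo
      a1 = "one"
                a2 = "two"
     ><![CDATA[1 & 2 are < 3]]></foo>"""

print(logical_view(tidy))
print(logical_view(messy))
print(logical_view(tidy) == logical_view(messy))  # True
```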
The processor is required to treat this information as if it were structured according to the XPath/XSLT node tree model, like this:

element node named 'foo' in no namespace
  |  \__namespace node binding prefix 'xml' to name 'http://www.w3.org/XML/1998/namespace'
  |  \__attribute node named 'a1', value character data 'one'
  |  \__attribute node named 'a2', value character data 'two'
  |
  |__text node encapsulating '1 & 2 are < 3'
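As a toy illustration only, assuming a document with no namespace declarations (so the only namespace node is the implicit one for 'xml') and ignoring comments and processing instructions, the parser's events could be turned into an XPath-style node tree roughly like this. The Node class, build_tree, and show are hypothetical; real processors use far more compact structures.

```python
import xml.parsers.expat
from dataclasses import dataclass, field

XML_NS = "http://www.w3.org/XML/1998/namespace"


@dataclass
class Node:
    kind: str                                     # "root", "element", "namespace", "attribute" or "text"
    name: str = ""
    value: str = ""
    children: list = field(default_factory=list)  # child elements and text nodes
    extras: list = field(default_factory=list)    # namespace and attribute nodes of an element


def build_tree(xml_text):
    root = Node("root")
    stack = [root]
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        elem = Node("element", name)
        # Every element has a namespace node for the implicit 'xml' prefix.
        elem.extras.append(Node("namespace", "xml", XML_NS))
        for aname, avalue in attrs.items():
            elem.extras.append(Node("attribute", aname, avalue))
        stack[-1].children.append(elem)
        stack.append(elem)

    def end(name):
        stack.pop()

    def text(data):
        kids = stack[-1].children
        if kids and kids[-1].kind == "text":
            kids[-1].value += data  # XPath never has two adjacent text nodes
        else:
            kids.append(Node("text", value=data))

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = text
    parser.Parse(xml_text, True)
    return root


def show(node, depth=0):
    """Print the tree with one indented line per node."""
    label = f"{node.kind} {node.name!r}" if node.name else node.kind
    print("  " * depth + label + (f" = {node.value!r}" if node.value else ""))
    for extra in node.extras:
        show(extra, depth + 1)
    for child in node.children:
        show(child, depth + 1)


show(build_tree('<foo a1="one" a2="two">1 &amp; 2 are &lt; 3</foo>'))
```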

A DOM parser builds a similar kind of node tree, one that is implied by the interfaces the DOM provides. However, this tree is not entirely compatible with an XPath/XSLT tree (for example, a DOM can contain adjacent text nodes and CDATA section nodes where XPath has a single text node), and it uses more memory than the processor needs. So, AFAIK, most XSLT processors that accept a DOM document as input walk the DOM tree and build their own XPath/XSLT tree from it, after which they can discard the DOM. That extra pass is slow, too, so most XSLT processors prefer to use a SAX parser when possible. A SAX parser is event-based: it zips through the document once, reporting what it finds along the way by calling methods that the application has implemented to handle the reported events.
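A minimal sketch contrasting the two routes, with Python's xml.dom.minidom and xml.sax standing in for whatever DOM and SAX parsers a real processor would use. The (name, attributes, children) tuples and the helper names are a hypothetical stand-in for a processor's own tree. The DOM route builds a complete DOM first and then copies it; the SAX route builds the target tree directly from the events.

```python
import xml.dom
import xml.dom.minidom
import xml.sax


def via_dom(xml_text):
    """Parse into a DOM, then walk it to build a lightweight (name, attrs, children) tree."""
    dom = xml.dom.minidom.parseString(xml_text)

    def convert(node):
        children = []
        for child in node.childNodes:
            if child.nodeType == xml.dom.Node.ELEMENT_NODE:
                children.append(convert(child))
            elif child.nodeType in (xml.dom.Node.TEXT_NODE, xml.dom.Node.CDATA_SECTION_NODE):
                if children and isinstance(children[-1], str):
                    children[-1] += child.data  # merge adjacent text, as XPath requires
                else:
                    children.append(child.data)
        return (node.tagName, dict(node.attributes.items()), children)

    tree = convert(dom.documentElement)
    dom.unlink()  # the DOM is no longer needed once our own tree exists
    return tree


class TreeBuilder(xml.sax.ContentHandler):
    """Build the same lightweight tree directly from SAX events, with no DOM in between."""

    def __init__(self):
        super().__init__()
        self.stack = [("#root", {}, [])]

    def startElement(self, name, attrs):
        elem = (name, dict(attrs.items()), [])
        self.stack[-1][2].append(elem)
        self.stack.append(elem)

    def endElement(self, name):
        self.stack.pop()

    def characters(self, content):
        children = self.stack[-1][2]
        if children and isinstance(children[-1], str):
            children[-1] += content
        else:
            children.append(content)


def via_sax(xml_text):
    handler = TreeBuilder()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.stack[0][2][0]


sample = '<doc lang="en"><p>one</p><p>two &amp; three</p></doc>'
print(via_dom(sample))
print(via_sax(sample))
```

Both functions return the same structure; the difference is that via_dom holds two trees in memory at once until the DOM is unlinked, while via_sax never builds the intermediate one.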

- Mike
____________________________________________________________________________
mike j. brown                   |  xml/xslt: http://skew.org/xml/
denver/boulder, colorado, usa   |  resume: http://skew.org/~mike/resume/

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
