Re: Re: RE: RE: [xsl] DOM and XML parser

Subject: Re: Re: RE: RE: [xsl] DOM and XML parser
From: "ashu t" <aashut@xxxxxxxxxxxxxx>
Date: 19 Aug 2002 08:19:50 -0000
Thanks a lot
This is really invaluable, in-depth information about the XSLT processor and the XML parser.
Thanks
ashu



Both. The parser reads the raw file(s) that comprise the XML document,
decoding bytes into characters, condensing character references (e.g., the
character reference "&#33;" becomes the 1 character "!"), normalizing whitespace in
attribute values, using the DTD to fill in default attribute values and
resolve entities, and checking for well-formedness. The parser passes along
the 'important' information about the XML document to the application (the
XSLT processor).
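As a rough illustration of the work described above, here is a minimal sketch using Python's standard-library ElementTree (which sits on top of the expat parser). The sample document, the entity name 'who', and the helper parse_note are made up for this example; they are not from the original discussion.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the internal DTD subset declares a default
# attribute value and an internal entity.
DOC = """<!DOCTYPE note [
  <!ATTLIST note lang CDATA "en">
  <!ENTITY who "the parser">
]>
<note title="a
b">Hello&#33; from &who;</note>"""


def parse_note(text=DOC):
    """Return what the application sees after the parser has done its job."""
    root = ET.fromstring(text)
    return root.tag, dict(root.attrib), root.text


if __name__ == "__main__":
    tag, attributes, text = parse_note()
    print(tag)         # element name
    print(attributes)  # line break in 'title' normalized, 'lang' defaulted
    print(text)        # character reference and entity already resolved
```

The application never sees "&#33;", "&who;", or the line break inside the attribute value; it only receives the resolved characters.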


The information it passes is pretty much exactly what the processor needs in
order to model the XPath/XSLT node tree. For example, the parser says things
like "there is an element named 'stylesheet' in namespace
'http://www.w3.org/1999/XSL/Transform', its lexical name is 'xsl:stylesheet',
it has an attribute named 'version' with value '1.0', it contains an element
named 'template'..." and so on. SAX and DOM parsers do this in very different
ways, but the idea is the same.
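To make the kind of information reported more concrete, here is a small sketch using Python's xml.dom.minidom as a stand-in for a DOM parser. The tiny stylesheet and the describe helper are assumptions for illustration only. Note that the DOM exposes namespace declarations as attributes, so the sketch filters them out to stay closer to the XPath view.

```python
from xml.dom import minidom

# Hypothetical minimal stylesheet, used only as parser input here.
STYLESHEET = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/"/>
</xsl:stylesheet>"""


def describe(element):
    """Collect the facts a parser reports about one element."""
    attributes = {
        name: value
        for name, value in element.attributes.items()
        # The DOM treats xmlns declarations as attributes; skip them.
        if name != "xmlns" and not name.startswith("xmlns:")
    }
    children = [
        child.localName
        for child in element.childNodes
        if child.nodeType == child.ELEMENT_NODE
    ]
    return {
        "namespace": element.namespaceURI,
        "local_name": element.localName,
        "lexical_name": element.tagName,
        "attributes": attributes,
        "child_elements": children,
    }


if __name__ == "__main__":
    root = minidom.parseString(STYLESHEET).documentElement
    for key, value in describe(root).items():
        print(key, "=", value)
```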


The parser does not report lexical differences. For example,

<foo a1="one" a2="two">1 &amp; 2 are &lt; 3</foo>

and a mess like

 <foo
      a1 = "one"
               a2 = "two"
     ><![CDATA[1 & 2 are < 3]]></foo>

mean exactly the same thing and are reported the same; the XSLT processor will
never know the original looked one way or the other. It just knows that the
following logical information items exist and have this relationship to each
other:


element type 'foo' in no namespace
  |  \__attribute name 'a1', value character data 'one'
  |  \__attribute name 'a2', value character data 'two'
  |
  |__character data '1 & 2 are < 3'
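The equivalence of two such serializations can be checked with a short sketch. It assumes Python's standard-library ElementTree as the parser; the names TIDY, MESSY, and logical_view are invented for this example. The messy variant differs only in whitespace inside the start tag, quote style, and the use of a CDATA section.

```python
import xml.etree.ElementTree as ET

TIDY = '<foo a1="one" a2="two">1 &amp; 2 are &lt; 3</foo>'

# Same logical content: extra whitespace in the tag, single quotes on
# one attribute, and a CDATA section instead of escaped characters.
MESSY = """<foo
      a1 = "one"
               a2 = 'two'
     ><![CDATA[1 & 2 are < 3]]></foo>"""


def logical_view(text):
    """Return the element name, attributes, and character data as parsed."""
    root = ET.fromstring(text)
    return root.tag, dict(root.attrib), root.text


if __name__ == "__main__":
    print(logical_view(TIDY))
    print(logical_view(MESSY))
    print(logical_view(TIDY) == logical_view(MESSY))
```

Nothing in the parsed result records which quoting style, spacing, or escaping mechanism the source file used.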

The processor is required to treat this information as if it were structured
according to the XPath/XSLT node tree model, like this:


element node named 'foo' in no namespace
| \__namespace node binding prefix 'xml' to name 'http://www.w3.org/XML/1998/namespace'
| \__attribute node named 'a1', value character data 'one'
| \__attribute node named 'a2', value character data 'two'
|
|__text node encapsulating '1 & 2 are < 3'


A DOM parser exposes a similar tree of nodes, implicit in the interfaces it
provides. However, this tree is not entirely compatible with an XPath/XSLT
tree, and it requires more memory than it should, so AFAIK most XSLT
processors, if they take a DOM document as input, walk the DOM tree and build
their own XPath/XSLT tree from it, so they can discard the DOM. This extra
pass is slow, too, so most XSLT processors prefer to use a SAX parser when
possible. A SAX parser is event-based and just zips through the document once,
reporting what it finds along the way by calling methods that the application
has implemented to handle the reported events.
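The event-based style can be sketched with Python's standard-library xml.sax module. The EventLogger class and log_events function are hypothetical names for this example; a real XSLT processor would build its node tree in these callbacks rather than just recording them.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records the events the SAX parser reports, in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # The parser may deliver one run of text in several chunks,
        # so merge adjacent chunks into a single text event.
        if self.events and self.events[-1][0] == "text":
            self.events[-1] = ("text", self.events[-1][1] + content)
        else:
            self.events.append(("text", content))


def log_events(xml_bytes):
    """Parse the document once and return the list of reported events."""
    handler = EventLogger()
    xml.sax.parseString(xml_bytes, handler)
    return handler.events


if __name__ == "__main__":
    for event in log_events(b'<foo a1="one">1 &amp; 2<bar/></foo>'):
        print(event)
```

The parser holds no tree of its own here; whatever structure the application wants, it has to assemble from the callbacks as they arrive.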


- Mike
____________________________________________________________________________
mike j. brown | xml/xslt: http://skew.org/xml/
denver/boulder, colorado, usa | resume: http://skew.org/~mike/resume/


XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list


