Re: RE: RE: [xsl] DOM and XML parser

Subject: Re: RE: RE: [xsl] DOM and XML parser
From: Mike Brown <mike@xxxxxxxx>
Date: Sat, 17 Aug 2002 10:11:30 -0600 (MDT)
ashu  t wrote:
> One thing I asked that who creates the tree like structure (even 
> on conceptual level)XML Parser or XSLT Processor?

Both. The parser reads the raw file(s) that make up the XML document,
decoding bytes into characters, resolving character references (e.g., the 5
characters "&#33;" become the 1 character "!"), normalizing whitespace in
attribute values, using the DTD to fill in default attribute values and
resolve entities, and checking for well-formedness. The parser passes along
the 'important' information about the XML document to the application (the
XSLT processor).
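
To make a few of those parser duties concrete, here is a minimal sketch using
the expat-based parser behind Python's standard xml.etree.ElementTree. The
sample documents and the is_well_formed helper are made up for illustration;
nothing here is specific to any XSLT processor.

```python
import xml.etree.ElementTree as ET

# Bytes in, characters out: the parser honours the encoding declaration,
# and the character reference &#33; arrives as the single character "!".
raw = ('<?xml version="1.0" encoding="ISO-8859-1"?>'
       '<greeting>caf\u00e9 &#33;</greeting>').encode("latin-1")
root = ET.fromstring(raw)
print(repr(root.text))

# Attribute-value normalization: the literal tab and newline inside the
# value are each reported as a plain space.
attr_doc = '<item note="a\tb\nc"/>'
note = ET.fromstring(attr_doc).get("note")
print(repr(note))


def is_well_formed(text):
    """Hypothetical helper: True if the parser accepts the document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<a><b/></a>"))   # properly nested
print(is_well_formed("<a><b></a>"))    # <b> is never closed
```

The application never sees the bytes, the reference, or the tab; it only gets
the resulting characters, or an error if the document is not well-formed.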

The information it passes is pretty much exactly what the processor needs in
order to model the XPath/XSLT node tree. For example, the parser says things
like "there is an element named 'stylesheet' in namespace
'http://www.w3.org/1999/XSL/Transform', its lexical name is 'xsl:stylesheet',
it has an attribute named 'version' with value '1.0', it contains an element
named 'template'..." and so on. SAX and DOM parsers do this in very different
ways, but the idea is the same.
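
As a rough illustration of the parser "saying things", the sketch below uses
Python's standard xml.sax module with namespace processing switched on. The
Recorder class and report function are hypothetical names of mine. One caveat:
as far as I know, Python's expat-based SAX driver passes None for the
qualified name, so the lexical name 'xsl:stylesheet' is not shown here, only
the namespace and local name.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

XSL = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/"/>
</xsl:stylesheet>"""


class Recorder(ContentHandler):
    """Hypothetical handler that only records what the parser reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElementNS(self, name, qname, attrs):
        uri, local = name
        # Attribute names are (namespace, local name) pairs as well.
        attributes = {key: attrs.getValue(key) for key in attrs.getNames()}
        self.events.append(("start", uri, local, attributes))

    def endElementNS(self, name, qname):
        self.events.append(("end",) + tuple(name))


def report(document):
    """Parse the bytes once and return the list of reported events."""
    handler = Recorder()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(document))
    return handler.events


events = report(XSL)
for event in events:
    print(event)
```

The first event carries the namespace name, the local name 'stylesheet', and
the 'version' attribute, which is about all a processor needs to start
building its own node for that element.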

The parser does not report lexical differences. For example,

 <foo a1="one" a2="two">1 &amp; 2 are &lt; 3</foo>

and a mess like

 <foo
      a1 = "one
               "  a2 = "&#x74;&#x77;&#x6F;"
     ><![CDATA[1 & 2 are < 3]]></foo>

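mean the same thing and are reported the same way; the XSLT processor will
never know the original looked one way or the other. (Strictly, the whitespace
after 'one' in a1 disappears only if the DTD declares a1 with a tokenized type
such as NMTOKEN; for an undeclared attribute it is normalized to spaces and
kept.) The processor just knows that the following logical information items
exist and have this relationship to each other: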

element type 'foo' in no namespace
  |  \__attribute name 'a1', value character data 'one'
  |  \__attribute name 'a2', value character data 'two'
  |
  |__character data '1 & 2 are < 3'

The processor is required to treat this information as if it were structured
according to the XPath/XSLT node tree model, like this:

element node named 'foo' in no namespace
  |  \__namespace node binding prefix 'xml' to name 'http://www.w3.org/XML/1998/namespace'
  |  \__attribute node named 'a1', value character data 'one'
  |  \__attribute node named 'a2', value character data 'two'
  |
  |__text node encapsulating '1 & 2 are < 3'
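
The claim that the two <foo> documents are reported identically can be checked
with a short sketch, again using Python's xml.etree.ElementTree. The
logical_view function is a hypothetical helper. I have left the line break out
of the a1 value in the messy version, because with no DTD declaring a1 the
parser is expected to keep that whitespace as spaces; the last lines show that
separately.

```python
import xml.etree.ElementTree as ET

TIDY = '<foo a1="one" a2="two">1 &amp; 2 are &lt; 3</foo>'

MESSY = """<foo
      a1 = "one"
         a2 = "&#x74;&#x77;&#x6F;"
     ><![CDATA[1 & 2 are < 3]]></foo>"""


def logical_view(document):
    """Reduce a document to the items the parser actually reports."""
    element = ET.fromstring(document)
    return (element.tag, dict(element.attrib), element.text)


tidy_view = logical_view(TIDY)
messy_view = logical_view(MESSY)
print(tidy_view)
print(tidy_view == messy_view)

# Whitespace inside an undeclared attribute value is normalized, not removed:
# the newline and the three spaces come back as four spaces.
padded = ET.fromstring('<foo a1="one\n   "/>').get("a1")
print(repr(padded))
```

Spacing around '=', line breaks between attributes, character references and
the CDATA section all vanish before the application sees anything.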

A DOM parser exposes a similar tree of nodes through the interfaces it
provides. However, this tree is not entirely compatible with an XPath/XSLT
tree (for example, a DOM can hold CDATA section nodes and adjacent text nodes,
which XPath treats as a single text node), and it tends to use more memory
than the processor needs, so AFAIK most XSLT processors, if they take a DOM
document as input, walk the DOM tree and build their own XPath/XSLT tree from
it, so they can discard the DOM. Building two trees is slow, too, so most XSLT
processors prefer to use a SAX parser when possible. A SAX parser is
event-based and just zips through the document once, reporting what it finds
along the way, by calling methods that the application has implemented to
handle the reported events.
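
Here is a toy sketch of the two routes, using Python's standard xml.dom.minidom
and xml.sax. The Node class, from_dom, from_sax and TreeBuilder are all
hypothetical names, and the model ignores namespaces, comments and processing
instructions; it is not how any particular XSLT processor is implemented.

```python
import xml.sax
from dataclasses import dataclass, field
from xml.dom import minidom


@dataclass
class Node:
    """Hypothetical stand-in for a processor's own element node."""
    name: str
    attributes: dict
    children: list = field(default_factory=list)  # Node objects or strings


def append_text(node, text):
    """Merge adjacent character data into a single text child."""
    if node.children and isinstance(node.children[-1], str):
        node.children[-1] += text
    else:
        node.children.append(text)


def from_dom(dom_element):
    """Route 1: a DOM already exists; walk it and copy what is needed."""
    node = Node(dom_element.tagName, dict(dom_element.attributes.items()))
    for child in dom_element.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            node.children.append(from_dom(child))
        elif child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            append_text(node, child.data)
    return node


class TreeBuilder(xml.sax.ContentHandler):
    """Route 2: build the tree directly from SAX events, in one pass."""

    def __init__(self):
        super().__init__()
        self.root = None
        self.stack = []

    def startElement(self, name, attrs):
        node = Node(name, dict(attrs.items()))
        if self.stack:
            self.stack[-1].children.append(node)
        else:
            self.root = node
        self.stack.append(node)

    def endElement(self, name):
        self.stack.pop()

    def characters(self, content):
        append_text(self.stack[-1], content)


def from_sax(document):
    builder = TreeBuilder()
    xml.sax.parseString(document, builder)
    return builder.root


DOC = b'<foo a1="one" a2="two">1 &amp; 2 are <![CDATA[< 3]]><bar/></foo>'
via_dom = from_dom(minidom.parseString(DOC).documentElement)
via_sax = from_sax(DOC)
print(via_sax)
print(via_dom == via_sax)
```

Both routes end with the same tree, but the DOM route holds two complete trees
in memory until the DOM is discarded, while the SAX route only ever builds one.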

   - Mike
____________________________________________________________________________
  mike j. brown                   |  xml/xslt: http://skew.org/xml/
  denver/boulder, colorado, usa   |  resume: http://skew.org/~mike/resume/
