Re: Paging using XSLT

Ah-ha.  One of my favorite topics.  Strap
yourself in...

Easier said than done.  Databases can not directly
perform 

select * from A 
return rows 50-60

This is a simple example.  Most queries contain joins.
Queries can be expensive.  The query cost can
prohibit executing the same query 
over and over again, scrolling through 
the results for the desired subset.  And anyway
the result set might change between
executions.  You would be surprised
how many sites actually do this though.

You either consume disk or memory, take 
your pick.  If you go the disk route (disk
is much cheaper than memory),
you either use the database or 
the file system.  For example, you could
store XML data sets in files.

One database-based solution is to 
create a temporary table
that holds the query results.  This temporary table
has an additional "row number" column which
is indexed.

There are several problems with this solution...
1. It consumes database resources 
2. If you use a "real" temporary table you 
    have to associate a database connection with
    the session .. don't even try it.  So you
    have no choice but to create real tables
    on the fly.
3. Tables have to be cleaned up after
    a session times out
4. Need to make sure the same session can only
    have one active temporary table at a time,
    otherwise one session can consume an
    excessive amount of disk
5. Database engines were not designed
    for this -- engines like the schema to
    stay constant, and creating and
    dropping tables every few minutes is
    disastrous

Instead of using temporary tables,
create one "query results" table that is
also indexed on session ID.  You still
have to garbage collect the rows
at some point, but managing rows is easier 
than managing entire tables.

A database solution is appealing because
you can leverage indexes.  A database
can be accessed from any machine on
a server farm, so sessions don't have
to be tied to particular servers.

The devil is always in the details.
Where does the row number come from?
Unfortunately, you usually can't use
the select INTO clause to populate
the query results table.  This is because
you can't tell SQL to generate an
incrementing row number based on
the ORDER BY clause.  Some
dialects of SQL might actually support
this.  Without such support you have to
write code that iterates through
the result set, adds an incrementing
row number column, and
inserts the row into the query results table.
This can be very slow for large result
sets.

It seems natural to use XML to represent
temporary result sets, but generally 
this approach doesn't scale.

XML files are fundamentally
slow because the file system can't
index XML documents.  Stores like
eXcelon can index XML but all of 
the undesirable features of temporary
tables described earlier will creep up.

If your data sets are small,
you could get away with managing
temporary XML files in the file system.
You could try to forego files and 
cache documents in memory.
Row-based access is fast when the document
has an ID attribute whose
value indicates the row number (r1, r2, 
r3, etc.).  XML parsers automatically index ID
attributes for fast access.  However, 
this solution doesn't work on server farms.

In conclusion...

The short answer is this is not easy.
The longer answer is that the
best solution seems to be to store
query results in a permanent table that 
is specifically indexed for page-based access.
The best implementation depends 
on the size of the result sets.  Database-based
solutions scale up to very large result
sets and scale out to web farms.

You can even pre-populate the
query results table with your most popular 
queries.  This out-perfoms all possible
implementations and that's an
understatement.  Of course the assumption is
that the underlying data doesn't change frequently,
which may or may not be correct.

I wish I could say that raw XML technologies 
somehow solve this problem.  There are limitations
to all tools.  XML could be a facilitator,
however,  You could build a component that 
manages result sets in a database and returns "pages" as
XML.  NOW you're talking.

Oops, I just leaked my business plan :^)

http://xmleverywhere.com

----- Original Message ----- 
From: "Steve Muench" <Steve.Muench@xxxxxxxxxx>
To: <xsl-list@xxxxxxxxxxxxxxxx>
Sent: Wednesday, November 01, 2000 5:54 AM
Subject: Re: Paging using XSLT


> Much more efficient is to, as you suggest, query-up/pass-in
> only the 15 rows in the current "window" you want to format
> and not to use the in-memory filtering capability of XSLT
> as a replacement for the fast indexes that a database might
> be able to use to quickly return just the rows you want
> to "see" on the current page.





 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
<- Previous	Index	Next ->
Re: Paging using XSLT, Steve Muench	Thread	Re: Paging using XSLT, David M. Karr
Formating Dates, Kuruvilla, Suresh	Date	RE: Paging using XSLT, Kevin Williams
	Month
<-prev [Thread] next->	<-prev [Date] next->
Month Index \| List Home