RE: Paging using XSLT

Subject: RE: Paging using XSLT
From: Joshua Allen <joshuaa@xxxxxxxxxxxxx>
Date: Thu, 2 Nov 2000 17:17:24 -0400 (EST)
But ... databases *can* directly perform

select * from A
return 50 rows where x > the_last_x_in_my_previous_resultset

With SQL Server you can use the "TOP" operator or
"SET ROWCOUNT" to specify how many rows are returned.
I am certain that DB2 has this capability as well, and
I think Oracle goes even beyond this ("bookmarks" or
something?  It has been a long time...)

Of course, your example of using a column indexed specifically
for paging is a great strategy.  But if you already are returning
the data in some non-random order, then you can also use the above
idea to return pages.  Using the above avoids the need to scroll
through the resultset each time, and allows page cutoffs to change
as data gets inserted.  Materialized views can alleviate the
pain of joins, and heavy-read situations are not friendly to joins
anyway.

If you can tolerate a small degree of data staleness, it would
also be possible to pull back the whole *entire* resultset as XML and 
cache the individual "pages" as separate XML files on the filesystem.
(I mean, I have yet to see a system that can update the user's
web-page when the underlying data in the database changes, so you
always have *some* degree of staleness).  Recognizing this and
the relation to caching leads to all sorts of interesting options.
One of the worst possible choices for paging is to scroll through the
*entire* rowset *every* time a user makes a request.  You may have to
do this in some situations, but probably far fewer than most people
realize.

Just some additional info to the original idea of paging via
XSLT -- sometimes people want to do this so that they can
target devices with multiple form factors (i.e. use 30 lines
per page if the browser is Netscape, use 11 lines per page
if it's a Palm VII).  One word of caution in this approach is
that usability is so challenging on the small devices that
usability concerns usually far outweigh the programmability
concerns that lead people to consider this strategy.  You
might make it work, but many people have experienced the only
way to get their systems tolerable on small devices is custom-
build UI for each device (for example, on a web browser you can
show city and state together for input, but on a WML device
you are better to have one screen that lets you choose a letter
range for the first letter of the city, then the next screen with more
detail, etc.; very difficult to XSLT-ize).  So don't be too disappointed
if reality foils the well-laid plans :-)




> -----Original Message-----
> From: Terris [mailto:terris@xxxxxxxxxx]
> Sent: Wednesday, November 01, 2000 8:20 AM
> To: xsl-list@xxxxxxxxxxxxxxxx
> Subject: Re: Paging using XSLT
> 
> 
> Ah-ha.  One of my favorite topics.  Strap
> yourself in...
> 
> Easier said than done.  Databases can not directly
> perform 
> 
> select * from A 
> return rows 50-60
> 
> This is a simple example.  Most queries contain joins.
> Queries can be expensive.  The query cost can
> prohibit executing the same query 
> over and over again, scrolling through 
> the results for the desired subset.  And anyway
> the result set might change between
> executions.  You would be surprised
> how many sites actually do this though.
> 
> You either consume disk or memory, take 
> your pick.  If you go the disk route (disk
> is much cheaper than memory),
> you either use the database or 
> the file system.  For example, you could
> store XML data sets in files.
> 
> One database-based solution is to 
> create a temporary table
> that holds the query results.  This temporary table
> has an additional "row number" column which
> is indexed.
> 
> There are several problems with this solution...
> 1. It consumes database resources 
> 2. If you use a "real" temporary table you 
>     have to associate a database connection with
>     the session .. don't even try it.  So you
>     have no choice but to create real tables
>     on the fly.
> 3. Tables have to be cleaned up after
>     a session times out
> 4. Need to make sure the same session can only
>     have one active temporary table at a time,
>     otherwise one session can consume an
>     excessive amount of disk
> 5. Database engines were not designed
>     for this -- engines like the schema to
>     stay constant, and creating and
>     dropping tables every few minutes is
>     disastrous
> 
> Instead of using temporary tables,
> create one "query results" table that is
> also indexed on session ID.  You still
> have to garbage collect the rows
> at some point, but managing rows is easier 
> than managing entire tables.
> 
> A database solution is appealing because
> you can leverage indexes.  A database
> can be accessed from any machine on
> a server farm, so sessions don't have
> to be tied to particular servers.
> 
> The devil is always in the details.
> Where does the row number come from?
> Unfortunately, you usually can't use
> the select INTO clause to populate
> the query results table.  This is because
> you can't tell SQL to generate an
> incrementing row number based on
> the ORDER BY clause.  Some
> dialects of SQL might actually support
> this.  Without such support you have to
> write code that iterates through
> the result set, adds an incrementing
> row number column, and
> inserts the row into the query results table.
> This can be very slow for large result
> sets.
> 
> It seems natural to use XML to represent
> temporary result sets, but generally 
> this approach doesn't scale.
> 
> XML files are fundamentally
> slow because the file system can't
> index XML documents.  Stores like
> eXcelon can index XML but all of 
> the undesirable features of temporary
> tables described earlier will creep up.
> 
> If your data sets are small,
> you could get away with managing
> temporary XML files in the file system.
> You could try to forego files and 
> cache documents in memory.
> Row-based access is fast when the document
> has an ID attribute whose
> value indicates the row number (r1, r2, 
> r3, etc.).  XML parsers automatically index ID
> attributes for fast access.  However, 
> this solution doesn't work on server farms.
> 
> In conclusion...
> 
> The short answer is this is not easy.
> The longer answer is that the
> best solution seems to be to store
> query results in a permanent table that 
> is specifically indexed for page-based access.
> The best implementation depends 
> on the size of the result sets.  Database-based
> solutions scale up to very large result
> sets and scale out to web farms.
> 
> You can even pre-populate the
> query results table with your most popular 
> queries.  This out-perfoms all possible
> implementations and that's an
> understatement.  Of course the assumption is
> that the underlying data doesn't change frequently,
> which may or may not be correct.
> 
> I wish I could say that raw XML technologies 
> somehow solve this problem.  There are limitations
> to all tools.  XML could be a facilitator,
> however,  You could build a component that 
> manages result sets in a database and returns "pages" as
> XML.  NOW you're talking.
> 
> Oops, I just leaked my business plan :^)
> 
> http://xmleverywhere.com
> 
> ----- Original Message ----- 
> From: "Steve Muench" <Steve.Muench@xxxxxxxxxx>
> To: <xsl-list@xxxxxxxxxxxxxxxx>
> Sent: Wednesday, November 01, 2000 5:54 AM
> Subject: Re: Paging using XSLT
> 
> 
> > Much more efficient is to, as you suggest, query-up/pass-in
> > only the 15 rows in the current "window" you want to format
> > and not to use the in-memory filtering capability of XSLT
> > as a replacement for the fast indexes that a database might
> > be able to use to quickly return just the rows you want
> > to "see" on the current page.
> 
> 
> 
> 
> 
>  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
> 




 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


Current Thread