[xsl] How should I structure a huge XSLT dataset best?

Subject: [xsl] How should I structure a huge XSLT dataset best?
From: Anthony Zawacki <zwacki@xxxxxxxxxx>
Date: Mon, 29 Sep 2003 16:32:22 -0400



Hello,

First, the question, and then I'll provide additional background that may
influence the answers...

A customer will be able to edit data using the program of their
choice and provide the data in a comma-delimited file.  Each entry will
consist of two pieces of information:
3-6 digits, text string.

Where the 3-6 digits represent the initial portion of a telephone number,
and the text string specifies a treatment.  There will be approximately
10,000 entries in this data file.

My application processes messages describing telephone calls.  For each
message, I apply a stylesheet to determine the treatment.  My current
stylesheet takes into account much more information than just the telephone
number, and that processing logic will still be required.  I have two
concerns.

1.  I will be writing a program/script that accepts the CSV file and
converts it into an XML document to be included in the stylesheet.  I
want to make sure that the output of this program is in a form that is as
efficient as possible.
2.  The XSLT will be executed many times and needs to be as efficient as
possible, executing in no more than a few milliseconds.  This is a place
where speed is a higher priority than memory constraints.
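
For concern 1, a minimal sketch of the conversion step, assuming one
"prefix,treatment" pair per CSV row.  The element and attribute names
(treatments, entry, prefix, treatment) and the function name csv_to_xml are
hypothetical, chosen only for illustration; nothing in the requirements
fixes them.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# 3 to 6 ASCII digits, matching the data description above.
PREFIX_RE = re.compile(r"[0-9]{3,6}")


def csv_to_xml(csv_text):
    """Convert 'prefix,treatment' rows into a flat XML lookup document.

    Element and attribute names are illustrative assumptions.
    """
    root = ET.Element("treatments")
    seen = set()
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"expected 2 columns, got {len(row)}: {row!r}")
        prefix, treatment = row[0].strip(), row[1].strip()
        if not PREFIX_RE.fullmatch(prefix):
            raise ValueError(f"bad prefix: {prefix!r}")
        if prefix in seen:
            raise ValueError(f"duplicate prefix: {prefix!r}")
        seen.add(prefix)
        # One empty element per entry; the data lives in attributes.
        ET.SubElement(root, "entry", prefix=prefix, treatment=treatment)
    return ET.tostring(root, encoding="unicode")
```

A flat list like this is the kind of structure that XSLT 1.0's xsl:key can
index by the prefix attribute.  Whether that is fast enough in Xalan-C would
have to be measured.  Rejecting bad or duplicate prefixes at conversion time
keeps that checking out of the per-message transformation.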

I've done a little looking around, and most of the discussions address the
opposite case, where the XML data file is huge and the XSL file is tiny.
That is the reverse of what I will be experiencing.  The XML data
that I will be processing is usually less than 1K.

Now, the background information:
My application is written in C++ using Xalan-C v1.6 on the AIX 5.2 platform.
I have complete control over the XSLTs, but not over incoming/outgoing
messages.
Every stylesheet in my application is compiled at start-up time to maximize
efficiency.
Due to the requirement to insert/remove items from the message, I am using
Xerces 2.3 Deprecated DOM objects with Xalan.

My first reaction, without any planning, is to create an XML document that
is easily indexed to pick out the treatment for each telephone number.  For
example, if 41057 had a treatment code of 5 and the number began with
410571, I would first look up 410, then 4105, then 41057, then 410571.  The
410571 would not be found, so I would fall back to the 41057 answer.  I have
not yet implemented anything; I am in the design phase of how I should
handle this, so I'm not sure what performance impact this would have.  An
obvious improvement would be to go in reverse, starting with all six digits
and stopping when a match is found.  Also, to avoid full scans of the data,
I thought about building a tree that would be indexed by a single digit at
each level, so to find the answer, it would start by indexing by the 4.
This would result in another tree that could be indexed by the 1, and
continue down until there was a failure to match, similar to the first
method, but hopefully more efficient.
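
To make the two lookup ideas above concrete, here is a rough sketch of both
in Python rather than XSLT: a flat table probed from the longest prefix down,
and a tree indexed by one digit per level.  The function names, the
"treatment" marker key, and the sample data are all made up for
illustration, and this says nothing about how either approach would perform
inside Xalan-C.

```python
def longest_prefix_flat(table, number, min_len=3, max_len=6):
    """Probe the longest candidate prefix first; stop at the first hit."""
    for length in range(min(max_len, len(number)), min_len - 1, -1):
        treatment = table.get(number[:length])
        if treatment is not None:
            return treatment
    return None


def build_trie(table):
    """Build nested dicts keyed by one digit per level."""
    root = {}
    for prefix, treatment in table.items():
        node = root
        for digit in prefix:
            node = node.setdefault(digit, {})
        # Multi-character key cannot collide with single-digit child keys.
        node["treatment"] = treatment
    return root


def longest_prefix_trie(trie, number):
    """Walk digit by digit, remembering the deepest treatment seen."""
    node, best = trie, None
    for digit in number:
        node = node.get(digit)
        if node is None:
            break  # failure to match: fall back to the last answer
        best = node.get("treatment", best)
    return best


table = {"410": "3", "41057": "5"}  # made-up sample entries
trie = build_trie(table)
print(longest_prefix_flat(table, "4105717161"))  # prints 5
print(longest_prefix_trie(trie, "4105717161"))   # prints 5
```

With 3-6 digit prefixes the flat version needs at most four keyed probes
per number.  The tree version needs at most six single-digit steps and
never touches entries outside the matching branch.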

The lack of obvious precedent for this type of work makes it much more
difficult.  I'm used to being able to search the list or look at the
XSLT FAQ and see easy solutions, but this type of issue doesn't seem to
have been addressed in the past.  Or am I missing something?

Thanks,
Anthony Zawacki

410-571-7161
zwacki@xxxxxxxxxx
