[xsl] Building and re-using an index gradually as multiple inter-related files get transformed

Subject: [xsl] Building and re-using an index gradually as multiple inter-related files get transformed
From: Fabre Lambeau <fabre.lambeau@xxxxxxxxx>
Date: Mon, 9 May 2011 15:35:40 +0100
Hi!
I'm after advice on how to build an "indexing" solution using XSLT 2.0.

Here is my use case (simplified a bit).
I have a number of XML files to "translate"/"re-map" into a second set
of XML files. For each input file, there will be a single output file
(1-to-1 relationship).
Each document lists a series of objects and their properties. This
"translation" consists of changing the identifier (GUID) of each
object in the source file.
However, some of the documents list objects that reference other
objects (dependencies). While "translating", I therefore need to keep
an index/dictionary of old-vs-new identifiers, so that all
dependencies remain valid in the new set of files and no new
identifier overlaps with an original identifier of any object.

Example (simplified, assume an XML representation)

SOURCE FILES
Fruits.xml
  Name=Apple, ID=1
  Name=Orange, ID=2
People.xml
  Name=Bob, ID=A
  Name=Marie, ID=B
Preferences.xml
  ID=Y, PersonID=A, FruitID=1
  ID=Z, PersonID=B, FruitID=1

TARGET FILES
Fruits.xml
  Name=Apple, ID=R
  Name=Orange, ID=T
People.xml
  Name=Bob, ID=434
  Name=Marie, ID=2526
Preferences.xml
  ID=G67, PersonID=434, FruitID=R
  ID=E43, PersonID=2526, FruitID=R

The real case is obviously far more complex, with dozens of files and
complex dependencies. I do however know the object model, and therefore
which objects have dependencies, and the direction of all dependencies.
I can therefore order the file transformations so that no file is
processed until all the objects it depends on have been translated.
BTW, I have no control over the identifiers themselves (they are
generated by a separate system).

I could obviously process each transformation one at a time, and every
time load the relevant source and target files already processed to
create the mapping index. However, I'm after a way to do this in one
single transformation.
The reason I'm stuck (mentally) is the following:
- Using XSLT 2.0, I could use xsl:result-document to create the
target files. However, I believe I would not be able to load them in
the same transformation again (in order to do a lookup in them as
necessary when treating dependencies).
- A variable, once defined, cannot be modified. I would therefore not
be able to create a global "index" of sorts and keep adding to it as I
would in a procedural language.

What would be the best way to go about this?  A recursive template
that after each step passes the index generated at the previous step
and augments it?  Would I not run into performance problems when
treating hundreds of large source files?
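
For concreteness, the recursive idea could be sketched roughly as
follows. This is only an untested outline under several assumptions
that are not part of my real data: each file holds <object> elements
with an "id" attribute, references live in "personId" and "fruitId"
attributes, the file list is hard-coded in dependency order, and
generate-id() stands in for the new identifiers (which in reality come
from a separate system).

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Assumption: source files, already sorted in dependency order -->
  <xsl:param name="files" as="xs:string*"
      select="('Fruits.xml', 'People.xml', 'Preferences.xml')"/>

  <!-- Entry point: start the recursion with an empty index -->
  <xsl:template name="main">
    <xsl:call-template name="process">
      <xsl:with-param name="remaining" select="$files"/>
      <xsl:with-param name="index" select="()"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Process the first remaining file, then recurse on the rest,
       passing along the index augmented with this file's entries -->
  <xsl:template name="process">
    <xsl:param name="remaining" as="xs:string*"/>
    <xsl:param name="index" as="element(map)*"/>
    <xsl:if test="exists($remaining)">
      <xsl:variable name="doc" select="doc($remaining[1])"/>
      <!-- Old-to-new entries for the objects defined in this file.
           generate-id() is a placeholder for the real new identifier. -->
      <xsl:variable name="new" as="element(map)*">
        <xsl:for-each select="$doc//object">
          <map old="{@id}" new="{generate-id()}"/>
        </xsl:for-each>
      </xsl:variable>
      <xsl:variable name="all" select="($index, $new)"/>
      <!-- Written to a hypothetical out/ folder, so that no URI is
           both read and written in the same transformation -->
      <xsl:result-document href="out/{$remaining[1]}">
        <xsl:apply-templates select="$doc" mode="remap">
          <xsl:with-param name="index" select="$all" tunnel="yes"/>
        </xsl:apply-templates>
      </xsl:result-document>
      <xsl:call-template name="process">
        <xsl:with-param name="remaining"
            select="subsequence($remaining, 2)"/>
        <xsl:with-param name="index" select="$all"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

  <!-- Identity copy; the tunnel parameter passes through untouched -->
  <xsl:template match="@* | node()" mode="remap">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" mode="remap"/>
    </xsl:copy>
  </xsl:template>

  <!-- Swap identifiers and references using the index so far;
       a value with no entry in the index is kept unchanged -->
  <xsl:template match="@id | @personId | @fruitId" mode="remap">
    <xsl:param name="index" as="element(map)*" tunnel="yes"/>
    <xsl:attribute name="{name()}"
        select="($index[@old = current()]/@new, .)[1]"/>
  </xsl:template>

</xsl:stylesheet>
```

The transformation would be started from the named template "main".
Note that the lookup $index[@old = current()] is a linear scan of the
whole index for every identifier, which is exactly where I would
expect the performance problem to appear with hundreds of large files.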

--
Fabre Lambeau
