Re: [xsl] Building and re-using an index gradually as multiple inter-related files get transformed

From: Fabre Lambeau <fabre.lambeau@xxxxxxxxx>
Date: Mon, 9 May 2011 16:37:37 +0100
This was the simplification indeed.
Instead of XML documents, I call a REST webservice (not my own) with
the EXPath HTTP client.
The workflow is:
- I send a GET request to get a list of objects of one type
- I modify the XML response payload to remove identifiers (and modify
some values)
- I send a PUT request with the modified payload
- The webservice responds with a new XML payload containing the
submitted objects with their new identifiers (which are GUIDs
assigned randomly, i.e. they cannot be "guessed" from the properties of
the object).

The mapping index is therefore created by matching the objects in the
first (GET) response to those in the second (PUT) response and
extracting the identifiers from both.
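A minimal sketch of that matching step follows. It is written in Python only to show the logic. The new GUIDs are random, so old and new objects can only be paired through some property that survives the edit and is unique within one payload. The payload shape (`<objects>/<object id="...">/<name>`) and the `match_key` helper are hypothetical, because the real webservice format is not given here. In XSLT 2.0, the same lookup would typically be an `xsl:key` declared on the PUT response and queried with `key()`.

```python
import xml.etree.ElementTree as ET

# Hypothetical payloads (the real webservice format is not given in the thread),
# using the Fruits example: Apple 1 -> R, Orange 2 -> T.
GET_RESPONSE = """<objects>
  <object id="1"><name>Apple</name></object>
  <object id="2"><name>Orange</name></object>
</objects>"""

PUT_RESPONSE = """<objects>
  <object id="R"><name>Apple</name></object>
  <object id="T"><name>Orange</name></object>
</objects>"""


def match_key(obj):
    """Hypothetical matching key: a property that survives the edit and is
    unique within one payload (here, the text of the <name> child)."""
    return obj.findtext("name")


def build_mapping(old_xml, new_xml):
    """Return {old_id: new_id} by pairing objects that share a match key."""
    new_ids = {match_key(o): o.get("id") for o in ET.fromstring(new_xml).iter("object")}
    mapping = {}
    for obj in ET.fromstring(old_xml).iter("object"):
        key = match_key(obj)
        if key not in new_ids:
            raise ValueError(f"no counterpart for object {obj.get('id')!r} ({key!r})")
        mapping[obj.get("id")] = new_ids[key]
    return mapping


print(build_mapping(GET_RESPONSE, PUT_RESPONSE))  # {'1': 'R', '2': 'T'}
```

If the "modify some values" step can touch the property used as the key, the key has to be computed from fields that are left alone.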

Ideally, I would like to avoid using anything but XSLT to solve this,
if possible.

Fabre Lambeau


On 9 May 2011 16:21, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> You haven't said how the new identifiers are generated (where do 434 and
> 2526 come from?).
>
> The functional solution to this is to recognize that there is a function
> f(oldID) -> newID that translates old identifiers to new identifiers. You
> just need to call this function every time you want to do the translation
> (not just the first time), and ensure of course that the function always
> returns the same newID when given the same oldID.
>
> Now, how do you implement this function efficiently? I can't tell you,
> because you haven't told us anything about it.
>
> Michael Kay
> Saxonica
>
>
> On 09/05/2011 15:35, Fabre Lambeau wrote:
>>
>> Hi!
>> I'm after advice on how to build an "indexing" solution using XSLT 2.0.
>>
>> Here is my use case (simplified a bit).
>> I have a number of XML files to "translate"/"re-map" into a second set
>> of XML files. For each input file, there will be a single output file
>> (1-to-1 relationship).
>> Each document lists a series of objects and their properties. This
>> "translation" consists of changing the identifier (GUID) of each
>> object in the source file.
>> However, some of the documents list objects that reference other
>> objects (dependencies). Whilst "translating" therefore, I need to keep
>> an index/dictionary of the old-vs-new identifiers, so that all
>> dependencies remain valid in the new set of files, but that there is
>> no overlap between original and new identifiers for any object.
>>
>> Example (simplified, assume an XML representation)
>>
>> SOURCE FILES
>> Fruits.xml
>>    Name=Apple, ID=1
>>    Name=Orange, ID=2
>> People.xml
>>    Name=Bob, ID=A
>>    Name=Marie, ID=B
>> Preferences.xml
>>    ID=Y, PersonID=A, FruitID=1
>>    ID=Z, PersonID=B, FruitID=1
>>
>> TARGET FILES
>> Fruits.xml
>>    Name=Apple, ID=R
>>    Name=Orange, ID=T
>> People.xml
>>    Name=Bob, ID=434
>>    Name=Marie, ID=2526
>> Preferences.xml
>>    ID=G67, PersonID=434, FruitID=R
>>    ID=E43, PersonID=2526, FruitID=R
>>
>> The real case is obviously far more complex, with dozens of files and
>> complex dependencies. However, I know the object model, and therefore
>> which objects have dependencies and the direction of all dependencies.
>> I can therefore order the file transformations so as to ensure that no
>> file is processed before all the objects it depends on have been
>> translated. BTW, I have no control over the identifiers themselves
>> (they are generated by a separate system).
>>
>> I could obviously process each transformation one at a time, and every
>> time load the relevant source and target files already processed to
>> create the mapping index. However, I'm after a way to do this in one
>> single transformation.
>> The reason I'm stuck (mentally) is the following:
>> - Using XSLT 2.0, I could use xsl:result-document to create the
>> target files. However, I believe I would not be able to load them
>> again in the same transformation (in order to do a lookup in them as
>> necessary when handling dependencies).
>> - A variable, once defined, cannot be modified. I would therefore not
>> be able to create a global "index" of sorts and keep adding to it as I
>> would in a procedural language.
>>
>> What would be the best way to go about this? A recursive template
>> that, after each step, passes on the index generated at the previous
>> step and augments it? Would I not run into performance problems when
>> processing hundreds of large source files?
>>
>> --
>> Fabre Lambeau
>
>
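The following is a language-neutral sketch of the approach discussed in this thread. It is written in Python 3.9+ (needed for `graphlib`) and uses the Fruits/People/Preferences example. It orders the files so that dependencies come first. It then folds over them, threading an immutable index from one step to the next, which is what a recursive template passing the index as a parameter would do. The accumulated index plays the role of the f(oldID) -> newID function Michael Kay describes: once an entry exists, it returns the same newID for the same oldID every time it is consulted. `fake_service` is a hypothetical stand-in for the GET/modify/PUT round trip. Pairing old and new objects by position assumes the service returns objects in the order they were submitted.

```python
from functools import reduce
from graphlib import TopologicalSorter
from itertools import count

# Source objects from the example in this thread (old identifiers).
SOURCE = {
    "Fruits": [{"ID": "1", "Name": "Apple"}, {"ID": "2", "Name": "Orange"}],
    "People": [{"ID": "A", "Name": "Bob"}, {"ID": "B", "Name": "Marie"}],
    "Preferences": [
        {"ID": "Y", "PersonID": "A", "FruitID": "1"},
        {"ID": "Z", "PersonID": "B", "FruitID": "1"},
    ],
}
# Each file maps to the files whose objects it references.
DEPENDS_ON = {"Fruits": set(), "People": set(), "Preferences": {"Fruits", "People"}}
# Fields holding references to objects in other files.
REFERENCE_FIELDS = {"PersonID", "FruitID"}

_fresh_ids = count(1)


def fake_service(objects):
    """Hypothetical stand-in for the PUT round trip: echoes the submitted
    objects back with freshly assigned identifiers."""
    return [{**obj, "ID": f"N{next(_fresh_ids)}"} for obj in objects]


def translate_file(state, name):
    """One fold step: translate one file using the index built so far and
    return the extended index plus the accumulated outputs."""
    index, outputs = state
    # Drop old IDs and rewrite references; a KeyError here would mean a
    # dependency was not translated before this file.
    payload = [
        {k: (index[v] if k in REFERENCE_FIELDS else v) for k, v in obj.items() if k != "ID"}
        for obj in SOURCE[name]
    ]
    created = fake_service(payload)
    # Assumes the service returns objects in the order they were submitted.
    new_entries = {old["ID"]: new["ID"] for old, new in zip(SOURCE[name], created)}
    return {**index, **new_entries}, {**outputs, name: created}


# Dependencies first, then a left fold threading (index, outputs) through.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
index, outputs = reduce(translate_file, order, ({}, {}))

for name in order:
    print(name, outputs[name])
```

Copying the index at each step (`{**index, **new_entries}`) keeps every step free of side effects, as it would be in XSLT. The cost is time proportional to the index size for every file. With hundreds of large files, that copying is the first place performance would suffer.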

--
Fabre Lambeau