Re: [xsl] Building and re-using an index gradually as multiple inter-related files get transformed

Subject: Re: [xsl] Building and re-using an index gradually as multiple inter-related files get transformed
From: Michel Hendriksen <michel.hendriksen@xxxxxxxxx>
Date: Tue, 10 May 2011 11:53:55 +0200
Hi

You could set up a pipeline, though how practical that is depends on how
many files you have to transform: generate the output as normal and build
an index in a variable. Then, for each following step, copy the old index
and add the new objects. Something like:

process file1

var index1
    process file1 mode = index

process file2
    with param index = index1

var index2
    copy index1
    process file2 mode = index

etc.

Not so nice when there are a lot of files, of course.
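
As a rough XSLT 2.0 sketch of the steps above, using the three example
files from the original question. Several things here are assumptions and
not from the thread: the input shape (objects as elements with an id
attribute, references as personId and fruitId attributes), the <map>
element used for index entries, the out/ paths, and generate-id() as a
stand-in for however the real new identifiers are obtained.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Start with an initial named template (e.g. Saxon: -it:main) -->
  <xsl:template name="main">
    <!-- Hypothetical inputs, assumed to look like
         <fruit id="1" name="Apple"/>, <person id="A" name="Bob"/>,
         <preference id="Y" personId="A" fruitId="1"/> -->
    <xsl:variable name="file1" select="doc('Fruits.xml')"/>
    <xsl:variable name="file2" select="doc('People.xml')"/>
    <xsl:variable name="file3" select="doc('Preferences.xml')"/>

    <!-- process file1: it references nothing, so no index is passed -->
    <xsl:result-document href="out/Fruits.xml">
      <xsl:apply-templates select="$file1"/>
    </xsl:result-document>

    <!-- var index1: process file1 in mode "index" -->
    <xsl:variable name="index1" as="element(map)*">
      <xsl:apply-templates select="$file1" mode="index"/>
    </xsl:variable>

    <!-- process file2 with param index = index1 -->
    <xsl:result-document href="out/People.xml">
      <xsl:apply-templates select="$file2">
        <xsl:with-param name="index" select="$index1" tunnel="yes"/>
      </xsl:apply-templates>
    </xsl:result-document>

    <!-- var index2: copy index1, then add the objects of file2 -->
    <xsl:variable name="index2" as="element(map)*">
      <xsl:sequence select="$index1"/>
      <xsl:apply-templates select="$file2" mode="index"/>
    </xsl:variable>

    <!-- process file3 with param index = index2 -->
    <xsl:result-document href="out/Preferences.xml">
      <xsl:apply-templates select="$file3">
        <xsl:with-param name="index" select="$index2" tunnel="yes"/>
      </xsl:apply-templates>
    </xsl:result-document>
  </xsl:template>

  <!-- mode "index": one <map> entry per object that carries an id -->
  <xsl:template match="*[@id]" mode="index">
    <map old="{@id}" new="{generate-id(.)}"/>
    <xsl:apply-templates mode="index"/>
  </xsl:template>

  <!-- keep text nodes out of the index -->
  <xsl:template match="text()" mode="index"/>

  <!-- default mode: identity copy -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- an object's own id: placeholder for the real new identifier;
       generate-id() gives the same value for the same node throughout
       one transformation, so it agrees with the "index" mode above -->
  <xsl:template match="@id">
    <xsl:attribute name="id" select="generate-id(..)"/>
  </xsl:template>

  <!-- references to other objects: translate through the index,
       leaving the value unchanged if no entry is found -->
  <xsl:template match="@personId | @fruitId">
    <xsl:param name="index" as="element(map)*" select="()" tunnel="yes"/>
    <xsl:variable name="hit" select="$index[@old = current()][1]"/>
    <xsl:attribute name="{name()}"
                   select="if ($hit) then $hit/@new else ."/>
  </xsl:template>

</xsl:stylesheet>
```

The index is passed as a tunnel parameter so the identity template does
not have to hand it on explicitly. Each indexN is a new variable built
from the previous one, so nothing is ever modified in place.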

If the new identifiers are only assigned during processing, you might
need to capture the result in a variable, process that variable to build
your index, and also copy it to the output.
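
A rough sketch of that variant for a single file, in XSLT 2.0. The
assumptions here are mine and not from the thread: objects are elements
with an id attribute, the result lists them in the same order as the
source (so old and new can be paired by position), the <map> and <index>
element names and the out/ paths are made up, and generate-id() again
stands in for whatever really assigns the new identifier.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Start with an initial named template (e.g. Saxon: -it:main) -->
  <xsl:template name="main">
    <xsl:variable name="source" select="doc('People.xml')"/>

    <!-- 1. process to a variable instead of straight to the output -->
    <xsl:variable name="result">
      <xsl:apply-templates select="$source"/>
    </xsl:variable>

    <!-- 2. process that variable for the index: pair the n-th source
            object with the n-th result object -->
    <xsl:variable name="old" select="$source//*[@id]"/>
    <xsl:variable name="new" select="$result//*[@id]"/>
    <xsl:variable name="index" as="element(map)*">
      <xsl:for-each select="1 to count($old)">
        <xsl:variable name="i" select="."/>
        <map old="{$old[$i]/@id}" new="{$new[$i]/@id}"/>
      </xsl:for-each>
    </xsl:variable>

    <!-- 3. also copy the variable to the output -->
    <xsl:result-document href="out/People.xml">
      <xsl:copy-of select="$result"/>
    </xsl:result-document>

    <!-- written out here only to make the index visible; in a pipeline
         $index would be handed to the next step instead -->
    <xsl:result-document href="out/index.xml">
      <index>
        <xsl:sequence select="$index"/>
      </index>
    </xsl:result-document>
  </xsl:template>

  <!-- identity copy -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- placeholder for the step that assigns the new identifier -->
  <xsl:template match="@id">
    <xsl:attribute name="id" select="generate-id(..)"/>
  </xsl:template>

</xsl:stylesheet>
```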

Michel



On Mon, May 9, 2011 at 5:37 PM, Fabre Lambeau <fabre.lambeau@xxxxxxxxx>
wrote:
> This was the simplification indeed.
> Instead of XML documents, I call a REST webservice (not my own) with
> the EXPath HTTP client.
> The workflow is:
> - I send a GET request to get a list of objects of one type
> - I modify the XML response payload to remove identifiers (and modify
> some values)
> - I send a PUT request with the modified payload
> - The webservice responds with a new XML payload containing the
> submitted objects with their new identifiers (which are GUIDs
> assigned randomly, i.e. they cannot be "guessed" from the properties of
> the object).
>
> The mapping index is therefore created by matching the first response
> to the second one and extracting the identifiers from both.
>
> Ideally, I would like to avoid using anything but XSLT to solve this,
> if possible.
>
> Fabre Lambeau
>
>
> On 9 May 2011 16:21, Michael Kay <mike@xxxxxxxxxxxx> wrote:
>> You haven't said how the new identifiers are generated (where do 434 and
>> 2526 come from?).
>>
>> The functional solution to this is to recognize that there is a function
>> f(oldID) -> newID that translates old identifiers to new identifiers. You
>> just need to call this function every time you want to do the translation
>> (not just the first time), and ensure of course that the function always
>> returns the same newID when given the same oldID.
>>
>> Now, how do you implement this function efficiently? I can't tell you,
>> because you haven't told us anything about it.
>>
>> Michael Kay
>> Saxonica
>>
>>
>> On 09/05/2011 15:35, Fabre Lambeau wrote:
>>>
>>> Hi!
>>> I'm after advice on how to build an "indexing" solution using XSLT 2.0.
>>>
>>> Here is my use case (simplified a bit).
>>> I have a number of XML files to "translate"/"re-map" into a second set
>>> of XML files. For each input file, there will be a single output file
>>> (1-to-1 relationship).
>>> Each document lists a series of objects and their properties. This
>>> "translation" consists of changing the identifier (GUID) of each
>>> object in the source file.
>>> However, some of the documents list objects that reference other
>>> objects (dependencies). Whilst "translating" therefore, I need to keep
>>> an index/dictionary of the old-vs-new identifiers, so that all
>>> dependencies remain valid in the new set of files, but that there is
>>> no overlap between original and new identifiers for any object.
>>>
>>> Example (simplified, assume an XML representation)
>>>
>>> SOURCE FILES
>>> Fruits.xml
>>>   Name=Apple, ID=1
>>>   Name=Orange, ID=2
>>> People.xml
>>>   Name=Bob, ID=A
>>>   Name=Marie, ID=B
>>> Preferences.xml
>>>   ID=Y, PersonID=A, FruitID=1
>>>   ID=Z, PersonID=B, FruitID=1
>>>
>>> TARGET FILES
>>> Fruits.xml
>>>   Name=Apple, ID=R
>>>   Name=Orange, ID=T
>>> People.xml
>>>   Name=Bob, ID=434
>>>   Name=Marie, ID=2526
>>> Preferences.xml
>>>   ID=G67, PersonID=434, FruitID=R
>>>   ID=E43, PersonID=2526, FruitID=R
>>>
>>> The example is obviously far more complex, with dozens of files and
>>> complex dependencies. I know however the object model, and therefore
>>> what objects have dependencies, and the direction of all dependencies.
>>> I can therefore order the file transformations so as to ensure that no
>>> file is processed until all the objects it depends on have been
>>> translated. BTW, I have no control over the identifiers themselves
>>> (they are generated by a separate system).
>>>
>>> I could obviously process each transformation one at a time, and every
>>> time load the relevant source and target files already processed to
>>> create the mapping index. However, I'm after a way to do this in one
>>> single transformation.
>>> The reason I'm stuck (mentally) is the following:
>>> - Using XSLT 2.0, I could use xsl:result-document to create the
>>> target files. However, I believe I would not be able to load them in
>>> the same transformation again (in order to do a lookup in them as
>>> necessary when treating dependencies).
>>> - A variable, once defined, cannot be modified. I would therefore not
>>> be able to create a global "index" of sorts and keep adding to it as I
>>> would in a procedural language.
>>>
>>> What would be the best way to go about this?  A recursive template
>>> that after each step passes the index generated at the previous step
>>> and augments it?  Would I not run into performance problems when
>>> treating hundreds of large source files?
>>>
>>> --
>>> Fabre Lambeau
>>
>>
>
>
>
> --
> Fabre Lambeau
