Re: [xsl] Running the same transformation on many input files, optimisation possible?

Subject: Re: [xsl] Running the same transformation on many input files, optimisation possible?
From: "Dimitre Novatchev dnovatchev@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 15 Dec 2019 23:00:42 -0000
Thank you, Dr. Kay,

> Indeed, I overlooked the possibility of having the controlling loop
> written in XSLT 3.0 and executing the per-transformation code using
> fn:transform(), which gets over the problems of changing existing XSLT
> code if it relies on global variables.

Does Saxon implement the semantics of

    "cache" : true()

as per the Spec?

Thanks,
Dimitre


On Sun, Dec 15, 2019 at 2:50 PM Michael Kay mike@xxxxxxxxxxxx <
xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> Indeed, I overlooked the possibility of having the controlling loop
> written in XSLT 3.0 and executing the per-transformation code using
> fn:transform(), which gets over the problems of changing existing XSLT code
> if it relies on global variables.
>
> Michael Kay
> Saxonica
>
> On 15 Dec 2019, at 22:35, Dimitre Novatchev dnovatchev@xxxxxxxxx <
> xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> > Note that there's a double overhead here: firstly you're bringing up a
> > new Java VM for each transformation, and secondly you're recompiling
> > the stylesheet for each transformation.
>
> Isn't the meaning of the
>
>     "cache" : true()
>
> key-value pair in the $options argument of `fn:transform()` exactly to
> compile the stylesheet only once and to reuse the compiled result whenever
> the same stylesheet node or stylesheet text is passed again?
>
> From the Spec (https://www.w3.org/TR/xpath-functions-31/#func-transform):
>
> cache (applies to XSLT 1.0, 2.0, 3.0)
>
>    This option has no effect on the result of the transformation but may
>    affect efficiency. The value true indicates an expectation that the
>    same stylesheet is likely to be used for more than one transformation;
>    the value false indicates an expectation that the stylesheet will be
>    used once only.
>
>    - Type: xs:boolean
>    - Default: true()
>
>
>
> Thanks,
> Dimitre
>
>
> On Sun, Dec 15, 2019 at 2:12 PM Michael Kay mike@xxxxxxxxxxxx <
> xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
>> Note that there's a double overhead here: firstly you're bringing up a
>> new Java VM for each transformation, and secondly you're recompiling the
>> stylesheet for each transformation.
>>
>> You can avoid the Java loading overhead by using ant or XProc, but I'm
>> not sure either of them will avoid the overhead of recompiling the
>> stylesheet; though if you use a recent Saxon version, you could achieve
>> that by reloading the stylesheet from a pre-compiled SEF (stylesheet export
>> file).
>>
>> You could write your own Java application to control the process,
>> invoking Saxon via the JAXP or s9api APIs - both allow you to compile a
>> stylesheet once and execute it repeatedly.
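
A minimal sketch of the compile-once, run-many pattern through JAXP, under some assumptions: the class name CompileOnce, the demo stylesheet, and the sample inputs are made up for illustration; only the doc and file parameter names come from the script in the original question. It uses only the JDK's javax.xml.transform classes, so without Saxon on the classpath it runs on the JDK's built-in XSLT 1.0 processor. With a Saxon jar on the classpath, the same factory lookup would normally be expected to pick up Saxon instead, though that depends on how the classpath is set up.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper: compile a stylesheet once, then run it many times. */
final class CompileOnce {

    /** Illustrative stylesheet (not the poster's transform.xsl). */
    static final String DEMO_XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:param name='doc'/>"
      + "<xsl:param name='file'/>"
      + "<xsl:template match='/'>"
      + "<xsl:value-of select=\"concat($doc, '/', $file, ': ', /*)\"/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Compiles the stylesheet text into a reusable Templates object. */
    static Templates compile(String stylesheetText) {
        try {
            // Standard JAXP lookup: returns whichever XSLT processor is configured.
            TransformerFactory factory = TransformerFactory.newInstance();
            return factory.newTemplates(new StreamSource(new StringReader(stylesheetText)));
        } catch (TransformerException e) {
            throw new IllegalStateException("Stylesheet failed to compile", e);
        }
    }

    /** Runs one transformation with the already compiled stylesheet. */
    static String transform(Templates compiled, String sourceXml, String doc, String file) {
        try {
            // A new Transformer per input; the stylesheet itself is not recompiled.
            Transformer transformer = compiled.newTransformer();
            transformer.setParameter("doc", doc);
            transformer.setParameter("file", file);
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(sourceXml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed for " + doc + "/" + file, e);
        }
    }

    public static void main(String[] args) {
        Templates compiled = compile(DEMO_XSLT);   // done once, outside the loop
        // Each row is {doc, file, source XML}; a real run would read these from disk.
        String[][] jobs = {
            {"d1", "a", "<p>hello</p>"},
            {"d2", "b", "<p>world</p>"},
        };
        for (String[] job : jobs) {
            System.out.println(transform(compiled, job[2], job[0], job[1]));
        }
    }
}
```

The JAXP contract makes a Templates object safe to share across threads, while a Transformer is not, which is why the sketch creates a fresh Transformer for each input rather than reusing one.
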
>>
>> You might be able to write the control loop in XSLT, for example by using
>> the collection() function, or functions in the EXPath file module. However,
>> this could require stylesheet changes if your XSLT code binds global
>> variables to values derived from the source document.
>>
>> In very simple cases you can take advantage of the fact that the -s
>> option for the Saxon command line can be a directory, in which case all the
>> input files are transformed to corresponding files in the -o directory.
>>
>> Michael Kay
>> Saxonica
>>
>> On 15 Dec 2019, at 09:03, Trevor Nicholls trevor@xxxxxxxxxxxxxxxxxx <
>> xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> Hi
>>
>> An application I am working on contains a large number of source
>> documents which are all run through the same series of transformations.
>> While initially the build process didn't take long, the cost of repeatedly
>> initialising the XSL processor soon adds up, so I am looking at ways to
>> streamline it.
>>
>> Our processor of choice is Saxon (currently we are using 8.7.3) so I can
>> shift this question to the Saxon list if there are extensions there that
>> are relevant.
>>
>> So the question: given a script that essentially includes the following:
>>
>> cd documents
>> for d in `cat dlist`; do
>>   cd $d
>>   for f in `cat flist`; do
>>     java -jar $SAXONDIR/saxon8.jar  -o  $f.new.xml  $f.xml \
>>       $SCRIPTDIR/transform.xsl  doc=$d  file=$f
>>   done
>> done
>>
>> is there a mechanism which would allow a single Java process to perform
>> the equivalent?
>>
>> Thanks
>> T
>>
>> XSL-List info and archive <http://www.mulberrytech.com/xsl/xsl-list>
>> EasyUnsubscribe <http://lists.mulberrytech.com/unsub/xsl-list/293509> (by
>> email)
>>
>
>
> --
> Cheers,
> Dimitre Novatchev
> ---------------------------------------
> Truly great madness cannot be achieved without significant intelligence.
> ---------------------------------------
> To invent, you need a good imagination and a pile of junk
> -------------------------------------
> Never fight an inanimate object
> -------------------------------------
> To avoid situations in which you might make mistakes may be the
> biggest mistake of all
> ------------------------------------
> Quality means doing it right when no one is looking.
> -------------------------------------
> You've achieved success in your field when you don't know whether what
> you're doing is work or play
> -------------------------------------
> To achieve the impossible dream, try going to sleep.
> -------------------------------------
> Facts do not cease to exist because they are ignored.
> -------------------------------------
> Typing monkeys will write all Shakespeare's works in 200yrs.Will they
> write all patents, too? :)
> -------------------------------------
> Sanity is madness put to good use.
> -------------------------------------
> I finally figured out the only reason to be alive is to enjoy it.
>


-- 
Cheers,
Dimitre Novatchev
