Re: [xsl] Running the same transformation on many input files, optimisation possible?

From: "Rolf Kleef rolf@xxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 16 Dec 2019 14:43:52 -0000

The way I do this is indeed with Ant:

Ant does a single XSLT compilation, then applies it to all input files
where the output file is older than the input file or doesn't exist
(which may provide another optimisation).

I use a build.xml like this to run `ant transform-files`.

<project>
  <target name="transform-files">
    <xslt
      basedir="/workspace/input/"
      includes="*.xml"
      destdir="/workspace/tmp"
      extension=".new.xml"
      style="transform.xslt"
     />
  </target>
</project>

Instead of the basedir and includes attributes, you should be able to
create "filelist" or "fileset" collections of files to be processed
inside the <xslt> tags. There are ways to combine these, to end up with
a single list of input files and benefit from a single XSLT
compilation.

https://ant.apache.org/manual/Types/filelist.html
https://ant.apache.org/manual/Types/fileset.html
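
As a rough sketch of that nested-collection variant (untested here; it
assumes Ant 1.7 or later and simply reuses the paths from the build.xml
above): setting useImplicitFileset="false" tells <xslt> to ignore
basedir/includes and process only the nested collections, and a glob
mapper takes over the job of the extension attribute.

```xml
<project>
  <target name="transform-files">
    <!-- useImplicitFileset="false": only the nested collections are processed -->
    <xslt
      destdir="/workspace/tmp"
      style="transform.xslt"
      useImplicitFileset="false"
     >
      <!-- same inputs as basedir + includes in the build.xml above -->
      <fileset dir="/workspace/input" includes="*.xml"/>
      <!-- maps foo.xml to foo.new.xml, replacing extension=".new.xml" -->
      <mapper type="glob" from="*.xml" to="*.new.xml"/>
    </xslt>
  </target>
</project>
```

Further <fileset> or <filelist> elements can be placed next to the first
one; they should all be handled within the same <xslt> run.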

~~Rolf.

On Sun, 2019-12-15 at 22:12 +0000, Michael Kay mike@xxxxxxxxxxxx wrote:
> Note that there's a double overhead here: firstly you're bringing up
> a new Java VM for each transformation, and secondly you're
> recompiling the stylesheet for each transformation.
> You can avoid the Java loading overhead by using ant or XProc, but
> I'm not sure either of them will avoid the overhead of recompiling
> the stylesheet; though if you use a recent Saxon version, you could
> achieve that by reloading the stylesheet from a pre-compiled SEF
> (stylesheet export file).
> 
> You could write your own Java application to control the process,
> invoking Saxon via the JAXP or s9api APIs - both allow you to compile
> a stylesheet once and execute it repeatedly.
> 
> You might be able to write the control loop in XSLT, for example by
> using the collection() function, or functions in the EXPath file
> module. However, this could require stylesheet changes if your XSLT
> code binds global variables to values derived from the source
> document.
> 
> In very simple cases you can take advantage of the fact that the -s
> option for the Saxon command line can be a directory, in which case
> all the input files are transformed to corresponding files in the -o
> directory.
> 
> Michael Kay
> Saxonica
> 
> > On 15 Dec 2019, at 09:03, Trevor Nicholls trevor@xxxxxxxxxxxxxxxxxx
> >  <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> > Hi
> >  
> > An application I am working on contains a large number of source
> > documents which are all run through the same series of
> > transformations. While initially the build process didn't take long,
> > the cost of repeatedly initialising the XSL processor soon adds up,
> > so I am looking at ways to streamline it.
> >  
> > Our processor of choice is Saxon (currently we are using 8.7.3) so
> > I can shift this question to the Saxon list if there are extensions
> > there that are relevant.
> >  
> > So the question: given a script that essentially includes the
> > following:
> >  
> > cd documents
> > for d in `cat dlist`; do
> >   cd $d
> >   for f in `cat flist`; do
> >     java -jar $SAXONDIR/saxon8.jar  -o  $f.new.xml  $f.xml \
> >       $SCRIPTDIR/transform.xsl  doc=$d  file=$f
> >   done
> > done
> >  
> > is there a mechanism which would allow a single Java process to
> > perform the equivalent?
> >  
> > Thanks
> > T
> >  