Re: [xsl] Extracting a XLSX directly using XSLT 3 and Saxon HE

Subject: Re: [xsl] Extracting a XLSX directly using XSLT 3 and Saxon HE
From: "Kevin Brown kevin.brown@xxxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 6 Jul 2022 21:52:06 -0000
While it is not an "XSLT" solution, we use eXist-db to do this all the time.
We have utilities to import .docx or .xlsx, unzip ... modify the document.xml,
put it back in and re-zip for download.

Kevin Brown

-----Original Message-----
From: Martin Honnen martin.honnen@xxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Wednesday, July 6, 2022 1:10 PM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [xsl] Extracting a XLSX directly using XSLT 3 and Saxon HE


On 06.07.2022 18:39, Gayanthika Udeshani gudeshani@xxxxxxxxxx wrote:
>
>
> I did some research and I couldn't find whether there is any XSLT 3
> feature in the Saxon HE, which allows extracting the xlsx (Excel file)
> directly. I found some solutions using Saxon EE, want to know whether
> I have missed something which we can apply using the HE.
>
On closer look, it seems even Saxon HE can treat an xlsx file as a zip archive
if you pass in the right configuration property (this example worked for me
in Windows PowerShell on the command line, with Saxon HE 11 and 10):

   --zipUriPattern:'.*\.xlsx'

From there you can use e.g.

   uri-collection('excel-sheet1.xlsx')

to get the URIs of (some or all?) contained files, it appears, and then you
can read the XML files with the doc function e.g.

doc('jar:file:/C:/SomePath/SomeDir/excel-sheet1.xlsx!/xl/workbook.xml')
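Putting the two calls together, a minimal sketch of a stylesheet might look
like the following. It assumes the --zipUriPattern option shown above is set,
that uri-collection() returns jar: URIs of the form used in the doc() call,
and that the workbook sits next to the stylesheet; the parameter name
xlsx-uri and the output element names are made up for illustration.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:s="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
    exclude-result-prefixes="#all">

  <!-- Hypothetical parameter: URI of the workbook, resolved against the
       stylesheet's base URI when it is relative -->
  <xsl:param name="xlsx-uri" as="xs:string" select="'excel-sheet1.xlsx'"/>

  <xsl:output method="xml" indent="yes"/>

  <xsl:template name="xsl:initial-template">
    <!-- URIs of the archive entries, as reported by uri-collection() -->
    <xsl:variable name="entries" select="uri-collection($xlsx-uri)"/>
    <!-- Assumption: one of those URIs ends in /xl/workbook.xml -->
    <xsl:variable name="workbook"
        select="doc($entries[ends-with(string(.), '/xl/workbook.xml')])"/>
    <archive>
      <xsl:for-each select="$entries">
        <entry uri="{.}"/>
      </xsl:for-each>
      <!-- Sheet names as declared in the workbook part -->
      <xsl:for-each select="$workbook/s:workbook/s:sheets/s:sheet">
        <sheet name="{@name}"/>
      </xsl:for-each>
    </archive>
  </xsl:template>

</xsl:stylesheet>
```

Run it with -it (no source document) together with the --zipUriPattern
option; the entry list is a quick way to check which parts of the archive
uri-collection() actually exposes.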

But as Mike said, the whole structure is rather complicated. With all the
references across the various files, you either need to know your way around
SpreadsheetML or already have a stylesheet from someone who has learned to
process such a structure to extract/transform the spreadsheet data.
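To give an idea of those cross-file references: text cells in a worksheet
usually hold only an index into xl/sharedStrings.xml. A rough sketch of
resolving them could look like this. It assumes the first sheet is stored as
xl/worksheets/sheet1.xml (the real mapping goes through workbook.xml and its
.rels file), reuses the jar: URI form from the doc() call above, and uses
made-up output element names. Dates, inline strings and number formats are
not handled.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:s="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
    exclude-result-prefixes="#all">

  <!-- Hypothetical parameter: jar: URI of the archive, up to and including "!" -->
  <xsl:param name="archive" as="xs:string"
      select="'jar:file:/C:/SomePath/SomeDir/excel-sheet1.xlsx!'"/>

  <xsl:output method="xml" indent="yes"/>

  <!-- Shared string table: one s:si element per distinct string -->
  <xsl:variable name="strings"
      select="doc($archive || '/xl/sharedStrings.xml')/s:sst/s:si"/>

  <xsl:template name="xsl:initial-template">
    <rows>
      <!-- Assumption: the sheet of interest is stored as sheet1.xml -->
      <xsl:apply-templates
          select="doc($archive || '/xl/worksheets/sheet1.xml')
                  /s:worksheet/s:sheetData/s:row"/>
    </rows>
  </xsl:template>

  <xsl:template match="s:row">
    <row n="{@r}">
      <xsl:apply-templates select="s:c"/>
    </row>
  </xsl:template>

  <!-- t="s": s:v holds a zero-based index into the shared string table -->
  <xsl:template match="s:c[@t = 's']">
    <cell ref="{@r}">
      <xsl:value-of
          select="string-join($strings[xs:integer(current()/s:v) + 1]
                              /(s:t | s:r/s:t))"/>
    </cell>
  </xsl:template>

  <!-- Any other cell: output the stored value as is -->
  <xsl:template match="s:c">
    <cell ref="{@r}">
      <xsl:value-of select="s:v"/>
    </cell>
  </xsl:template>

</xsl:stylesheet>
```

Even this small case touches two parts of the archive, and resolving sheet
names to files or applying styles adds further lookups.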
