Re: [EXTERNAL] Re: [xsl] Extracting a XLSX directly using XSLT 3 and Saxon HE

Subject: Re: [EXTERNAL] Re: [xsl] Extracting a XLSX directly using XSLT 3 and Saxon HE
From: "Gayanthika Udeshani gudeshani@xxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 7 Jul 2022 05:49:56 -0000
Thank you all for your valuable feedback.

> Together with XProc to extract files from the archive that is the .xlsx
> file you should be able to process the stuff with Saxon HE


I'm currently following the above method, and based on your comments I
gather that I am on the right path.

Thank you again!

Cheers

Gayanthika

On Thu, Jul 7, 2022 at 3:22 AM Kevin Brown kevin.brown@xxxxxxxxxxxxxxxx <
xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> While it is not an "XSLT" solution, we use exist-db to do this all the time.
> We have utilities to import .docx or .xlsx, unzip ... modify the
> document.xml, put it back in and re-zip for download.
>
> Kevin Brown
>
> -----Original Message-----
> From: Martin Honnen martin.honnen@xxxxxx <
> xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
> Sent: Wednesday, July 6, 2022 1:10 PM
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re: [xsl] Extracting a XLSX directly using XSLT 3 and Saxon HE
>
>
> On 06.07.2022 18:39, Gayanthika Udeshani gudeshani@xxxxxxxxxx wrote:
> >
> >
> > I did some research and I couldn't find whether there is any XSLT 3
> > feature in the Saxon HE, which allows extracting the xlsx (Excel file)
> > directly. I found some solutions using Saxon EE, want to know whether
> > I have missed something which we can apply using the HE.
> >
> On closer look, it seems even Saxon HE can treat an xlsx file as a zip
> archive if you pass in the right configuration property (example worked for
> me with Windows Powershell on the command line for Saxon HE 11 and 10)
>
>    --zipUriPattern:'.*\.xlsx'
>
>  From there you can use e.g.
>
>    uri-collection('excel-sheet1.xlsx')
>
> to get the URIs of (some or all?) contained files, it appears, and then
> you can read the XML files with the doc function e.g.
>
> doc('jar:file:/C:/SomePath/SomeDir/excel-sheet1.xlsx!/xl/workbook.xml')
>
> But as Mike said, the whole structure is rather complicated: with all the
> references across various files, you either need to know your way around
> SpreadsheetML or perhaps already have a stylesheet by someone who has
> learned to process such a structure to extract/transform the spreadsheet data.
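
To illustrate the cross-file references mentioned above, here is a minimal
sketch (not taken from the thread) that reads one worksheet through jar: URIs
and resolves shared-string cells. It assumes the workbook keeps its strings in
xl/sharedStrings.xml and its first sheet in xl/worksheets/sheet1.xml, which is
common but not guaranteed. The parameter name xlsx-uri and the output element
names (rows, row, cell) are made up for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:s="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
    exclude-result-prefixes="#all">

  <xsl:output indent="yes"/>

  <!-- Hypothetical parameter: absolute file: URI of the workbook.
       The default is the example path used earlier in the thread. -->
  <xsl:param name="xlsx-uri" as="xs:string"
      select="'file:/C:/SomePath/SomeDir/excel-sheet1.xlsx'"/>

  <!-- jar: URI prefix for entries inside the archive, same shape as the
       doc() call quoted above. -->
  <xsl:variable name="root" as="xs:string"
      select="'jar:' || $xlsx-uri || '!/'"/>

  <!-- Assumption: the shared string table exists at this path. -->
  <xsl:variable name="shared" as="element(s:si)*"
      select="doc($root || 'xl/sharedStrings.xml')/s:sst/s:si"/>

  <xsl:template name="xsl:initial-template">
    <rows>
      <!-- Assumption: the first sheet is stored as sheet1.xml. -->
      <xsl:apply-templates
          select="doc($root || 'xl/worksheets/sheet1.xml')
                  /s:worksheet/s:sheetData/s:row"/>
    </rows>
  </xsl:template>

  <xsl:template match="s:row">
    <row n="{@r}">
      <xsl:apply-templates select="s:c"/>
    </row>
  </xsl:template>

  <!-- t="s": the v element is a zero-based index into the shared strings,
       so add 1 for XPath's one-based positions. Plain strings sit in s:t,
       rich text in s:r/s:t. -->
  <xsl:template match="s:c[@t = 's']">
    <cell ref="{@r}">
      <xsl:value-of
          select="string-join(
                    $shared[xs:integer(current()/s:v) + 1]/(s:t | s:r/s:t))"/>
    </cell>
  </xsl:template>

  <!-- Any other cell: copy the stored value unchanged (numbers stay
       numbers, dates stay serial numbers, formulas give their cached value). -->
  <xsl:template match="s:c">
    <cell ref="{@r}">
      <xsl:value-of select="s:v"/>
    </cell>
  </xsl:template>

</xsl:stylesheet>
```

Since the stylesheet has no source document, it would be started from the
initial template (the -it option on the Saxon command line), with xlsx-uri
supplied as a stylesheet parameter. A real workbook should instead look up
sheet names and paths via xl/workbook.xml and its relationships file rather
than hard-coding sheet1.xml; that lookup is left out here to keep the sketch
short.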