Subject: Re: [xsl] Extracting a XLSX directly using XSLT 3 and Saxon HE
From: "Eliot Kimber eliot.kimber@xxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 6 Jul 2022 20:53:12 -0000

The Apache POI library[1] provides robust read and write capabilities for all the MS Office formats, including DOCX and XLSX, so depending on what you want to do, that might be the more effective way to get at the data. POI manages all the XML complexity of the XLSX data.

Cheers,

E.

[1] https://poi.apache.org/

_____________________________________________
Eliot Kimber
Sr Staff Content Engineer
servicenow.com<https://www.servicenow.com>

From: Martin Honnen martin.honnen@xxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wednesday, July 6, 2022 at 3:09 PM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [xsl] Extracting a XLSX directly using XSLT 3 and Saxon HE

On 06.07.2022 18:39, Gayanthika Udeshani gudeshani@xxxxxxxxxx wrote:
> I did some research and I couldn't find whether there is any XSLT 3
> feature in the Saxon HE, which allows extracting the xlsx (Excel file)
> directly. I found some solutions using Saxon EE, want to know whether
> I have missed something which we can apply using the HE.

On closer look, it seems even Saxon HE can treat an xlsx file as a zip archive if you pass in the right configuration property (this example worked for me in Windows PowerShell on the command line with Saxon HE 11 and 10):

   --zipUriPattern:'.*\.xlsx'

From there, it appears you can use e.g. uri-collection('excel-sheet1.xlsx') to get the URIs of (some or all?) contained files, and then read the XML files with the doc function, e.g.

   doc('jar:file:/C:/SomePath/SomeDir/excel-sheet1.xlsx!/xl/workbook.xml')

But as Mike said, the whole structure is rather complicated. With all the references across the various files, you either need to know your way around SpreadsheetML or already have a stylesheet from someone who has learned to process such a structure to extract or transform the spreadsheet data.
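
To illustrate the kind of cross-file reference involved, here is a minimal XSLT 3 sketch, not a complete extractor. It assumes the first worksheet sits at the conventional path xl/worksheets/sheet1.xml (a more robust stylesheet would follow xl/workbook.xml and its relationships file instead). The parameter name xlsx and the output element names rows, row and cell are made up for illustration. Cells of type "s" store a zero-based index into xl/sharedStrings.xml, which the sketch resolves; styles, number formats, dates and formulas are not handled.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:s="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
    exclude-result-prefixes="xs s">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical parameter: jar: URI prefix of the workbook, e.g.
       jar:file:/C:/SomePath/SomeDir/excel-sheet1.xlsx! -->
  <xsl:param name="xlsx" as="xs:string" required="yes"/>

  <!-- Assumption: the first sheet lives at the conventional path. -->
  <xsl:variable name="sheet"
      select="doc($xlsx || '/xl/worksheets/sheet1.xml')"/>

  <!-- The shared string table is optional; use an empty sequence if absent. -->
  <xsl:variable name="strings-uri" select="$xlsx || '/xl/sharedStrings.xml'"/>
  <xsl:variable name="strings"
      select="if (doc-available($strings-uri))
              then doc($strings-uri)/s:sst/s:si
              else ()"/>

  <xsl:template name="xsl:initial-template">
    <rows>
      <xsl:for-each select="$sheet/s:worksheet/s:sheetData/s:row">
        <row n="{@r}">
          <xsl:for-each select="s:c">
            <cell ref="{@r}">
              <!-- t="s": s:v is a zero-based index into the shared strings;
                   t="inlineStr": the text is stored inside the cell itself;
                   anything else: emit the raw stored value. -->
              <xsl:value-of select="
                if (@t = 's')
                then string-join(
                       $strings[xs:integer(current()/s:v) + 1]/(s:t | s:r/s:t))
                else if (@t = 'inlineStr')
                then string-join(s:is/(s:t | s:r/s:t))
                else s:v"/>
            </cell>
          </xsl:for-each>
        </row>
      </xsl:for-each>
    </rows>
  </xsl:template>

</xsl:stylesheet>
```

Since the stylesheet has no source document, it would be run by invoking the initial template (Saxon's -it option on the command line) and supplying the xlsx parameter with the jar: URI prefix of the workbook, up to and including the "!".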