Re: [xsl] generating Office Open XML parts using xslt

Subject: Re: [xsl] generating Office Open XML parts using xslt
From: "Paul Tyson phtyson@xxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 29 Jul 2014 03:09:25 -0000
On Mon, 2014-07-28 at 13:14 +0000, Wendell Piez wapiez@xxxxxxxxxxxxxxx
wrote:
> Hi,
>
> This is fantastic ... and brings up the related question -- how about
> going the other way, reading data out of XLSX format?
>

There's an Exproc extension step, pxp:unzip, that would give you the
package contents of an office package file (.xlsx, .docx, etc.), and
from there you could read the parts, resolve the relationships you need,
and get to the content, style, and behavioral features you want. That is
not in my line of sight right now, but I think it could be done fairly
easily.
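A minimal sketch of that first step, assuming XML Calabash's implementation of pxp:unzip with its href and file options. The option names xlsx-file and part are hypothetical. The default part, xl/workbook.xml, is only where Excel normally puts the workbook; a robust reader would look it up through the officeDocument relationship in _rels/.rels. If I read the Calabash documentation correctly, leaving out file returns a c:zipfile listing of the package entries instead, which is a convenient way to see what is inside.

```xml
<declare-step name="read-xlsx"
              xmlns="http://www.w3.org/ns/xproc"
              xmlns:pxp="http://exproc.org/proposed/steps"
              version="1.0">

  <!-- Hypothetical options: the package to read and the part to extract -->
  <option name="xlsx-file" required="true"/>
  <option name="part" select="'xl/workbook.xml'"/>

  <output port="result"/>
  <serialization port="result" indent="true"/>

  <import href="http://xmlcalabash.com/extension/steps/library-1.0.xpl"/>

  <!-- Extract one XML part from the zip package; its document
       becomes the pipeline's result -->
  <pxp:unzip>
    <with-option name="href" select="$xlsx-file"/>
    <with-option name="file" select="$part"/>
  </pxp:unzip>

</declare-step>
```

Run it the same way as the pipeline later in this thread, e.g. java -jar calabash.jar read-xlsx.xpl xlsx-file=file:///path/to/book.xlsx (the pipeline file name is a placeholder).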

Or, as Jirka mentioned, you could use the jar: URL scheme to read
individual parts straight out of the file without unpacking it.
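For illustration, a small XSLT 2.0 sketch of the jar: approach. It assumes a Java-based processor such as Saxon, where a URI like jar:file:/path/to/book.xlsx!/xl/workbook.xml is handed to Java's own URL handling. It also assumes Excel's usual xl/workbook.xml location. The parameter name xlsx and the no-namespace output elements are made up for the example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ss="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
    exclude-result-prefixes="xs ss r">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical parameter: absolute URI of the package,
       e.g. file:/path/to/book.xlsx -->
  <xsl:param name="xlsx" as="xs:string" required="yes"/>

  <!-- Start with an initial template; no source document is needed -->
  <xsl:template name="main">
    <!-- jar:PACKAGE-URI!/ENTRY-PATH addresses one entry inside the zip -->
    <xsl:variable name="workbook"
        select="doc(concat('jar:', $xlsx, '!/xl/workbook.xml'))"/>
    <sheets>
      <xsl:for-each select="$workbook/ss:workbook/ss:sheets/ss:sheet">
        <!-- rel is the relationship Id that leads to the worksheet part -->
        <sheet name="{@name}" sheetId="{@sheetId}" rel="{@r:id}"/>
      </xsl:for-each>
    </sheets>
  </xsl:template>

</xsl:stylesheet>
```

With Saxon 9 this might be run as java -jar saxon9he.jar -it:main -xsl:list-sheets.xsl xlsx=file:/path/to/book.xlsx (the jar and stylesheet names are placeholders).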

The main point is that an xslt+xproc toolchain can be used to read and
write Office Open XML documents. To develop such a toolchain, an
understanding of the OOXML schemas, package structure, and relationship
semantics is needed (sketchily documented in ECMA-376). Beyond that,
some standard mappings from OOXML features to xsl-fo and html5 would be
useful. Xslt libraries could be developed around these mappings.
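As one small example of the relationship handling such a library would need, here is a hypothetical pair of XSLT 2.0 functions (the f: namespace and the function names are invented). They follow the package conventions that the relationships of a part dir/name.xml live in dir/_rels/name.xml.rels, and that a relative Target is resolved against the source part's folder. The $package argument is assumed to be a base URI for fetching parts, such as a jar: prefix ending in !/ or the file: URI of an unpacked copy. "../" segments and external targets are not handled.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:pr="http://schemas.openxmlformats.org/package/2006/relationships"
    xmlns:f="urn:example:ooxml-functions"
    exclude-result-prefixes="xs pr f">

  <!-- Relationships part of a source part:
       'xl/workbook.xml' -> 'xl/_rels/workbook.xml.rels',
       'workbook.xml'    -> '_rels/workbook.xml.rels' -->
  <xsl:function name="f:rels-part" as="xs:string">
    <xsl:param name="part" as="xs:string"/>
    <xsl:sequence select="replace($part, '^(.*/)?([^/]+)$', '$1_rels/$2.rels')"/>
  </xsl:function>

  <!-- Package path of the target of relationship $id of part $part,
       or the empty sequence if the rels part or the Id is missing.
       $package is a base URI for fetching parts and must end in '/' -->
  <xsl:function name="f:resolve-rel" as="xs:string?">
    <xsl:param name="package" as="xs:string"/>
    <xsl:param name="part" as="xs:string"/>
    <xsl:param name="id" as="xs:string"/>
    <xsl:variable name="rels-uri" select="concat($package, f:rels-part($part))"/>
    <xsl:variable name="target" select="
        if (doc-available($rels-uri))
        then (doc($rels-uri)/pr:Relationships/pr:Relationship[@Id = $id]/@Target)[1]
        else ()"/>
    <!-- Absolute targets start at the package root; relative ones
         are resolved against the folder of the source part -->
    <xsl:sequence select="
        if (empty($target)) then ()
        else if (starts-with($target, '/')) then substring($target, 2)
        else concat(replace($part, '[^/]+$', ''), $target)"/>
  </xsl:function>

</xsl:stylesheet>
```

For the minimal package built by ooxml.xpl later in this thread, f:resolve-rel($pkg, 'workbook.xml', 'rId1') should give sheet1.xml; for a workbook saved by Excel the same call on xl/workbook.xml typically gives a path under xl/worksheets/.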

Regards,
--Paul

> Betty Harvey did some excellent work on this a couple of years ago, as
> documented at
> http://www.ibm.com/developerworks/xml/library/x-exceltooasis/index.html?ca=dat
> -- but I don't know about anything more along these lines. What would
> be fantastic would be a generic utility (XProc would be fine) that
> would expose the data in a spreadsheet as an XML document, more
> compact and legible than its native form -- naturally, for the use of
> XSLT. Whether such a utility could be truly generic ... I'm not sure I
> could say (yet :-).
>
> Any ideas?
>
> Cheers, Wendell
>
>
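One possible shape for that compact view, as an XSLT 2.0 sketch. It takes a worksheet part (e.g. xl/worksheets/sheet1.xml) as the source document, plus an optional hypothetical parameter pointing at the workbook's shared-strings part. It emits one row/cell element per populated cell, with shared and inline strings resolved to text. The output vocabulary is invented. Dates, which SpreadsheetML usually stores as serial numbers distinguished only by a number format in styles.xml, come out as raw numbers.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ss="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
    exclude-result-prefixes="xs ss">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical parameter: URI of the sharedStrings part, if any -->
  <xsl:param name="shared-strings-uri" as="xs:string?" select="()"/>

  <!-- Shared string items, in index order -->
  <xsl:variable name="sst" as="element(ss:si)*"
      select="if (exists($shared-strings-uri))
              then doc($shared-strings-uri)/ss:sst/ss:si else ()"/>

  <xsl:template match="/ss:worksheet">
    <sheet>
      <xsl:apply-templates select="ss:sheetData/ss:row"/>
    </sheet>
  </xsl:template>

  <xsl:template match="ss:row">
    <row n="{@r}">
      <xsl:apply-templates select="ss:c"/>
    </row>
  </xsl:template>

  <xsl:template match="ss:c">
    <cell ref="{@r}">
      <xsl:choose>
        <!-- Shared string: ss:v holds a zero-based index into the sst -->
        <xsl:when test="@t = 's'">
          <xsl:variable name="si" select="$sst[xs:integer(current()/ss:v) + 1]"/>
          <xsl:value-of select="string-join($si/(ss:t | ss:r/ss:t), '')"/>
        </xsl:when>
        <!-- Inline string: the text is stored in the cell itself -->
        <xsl:when test="@t = 'inlineStr'">
          <xsl:value-of select="string-join(ss:is/(ss:t | ss:r/ss:t), '')"/>
        </xsl:when>
        <!-- Numbers, booleans, formula strings, errors: raw value -->
        <xsl:otherwise>
          <xsl:value-of select="ss:v"/>
        </xsl:otherwise>
      </xsl:choose>
    </cell>
  </xsl:template>

</xsl:stylesheet>
```

Whether this can be made truly generic is still an open question; merged cells, formulas and styling would all need decisions about how much to expose.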
> On Sat, Jul 26, 2014 at 12:04 AM, Paul Tyson phtyson@xxxxxxxxxxxxx
> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> > I appreciate all the kind interest in this topic. I will set the
> > difficulty warnings aside for now because I believe the potential
> > benefits of this approach are worth some effort.
> >
> > Thanks to Pavel Ptacek's xsl-excel-engine [1], recommended by Vincent
> > Lizzi, I was able to get a leg up on this effort.
> >
> > See below for an Xproc pipeline definition that creates a minimal blank
> > spreadsheet (.xlsx) that can be opened without incident in Excel 2007.
> > It uses the Exproc [2] extension step pxp:zip (as implemented in
> > calabash [3]) to produce the xlsx package.
> >
> > Apologies to those who don't speak xproc, and further apologies since
> > there is not a line of xslt in it. But imagine that any of the
> > input[@port='source']/inline elements in the <pxp:zip> step can be
> > replaced with the results of whatever transformations you like, and this
> > becomes a very powerful tool. In particular, it should be
> > straightforward to transform an fo:table to a workbook sheet.
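To make that concrete, a hedged XSLT 2.0 sketch of the fo:table direction. It converts the first fo:table in an XSL-FO document into a SpreadsheetML worksheet, writing every cell as an inline string (t="inlineStr"), which Excel reads without a shared-strings part. Spanning cells (number-columns-spanned, number-rows-spanned), cells not wrapped in fo:table-row, and numeric typing are deliberately left out, and the f: namespace is made up.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:f="urn:example:ooxml-functions"
    xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
    exclude-result-prefixes="xs fo f">

  <xsl:output method="xml" indent="yes"/>

  <!-- Column number to letters: 1 -> A, 26 -> Z, 27 -> AA, 703 -> AAA -->
  <xsl:function name="f:col" as="xs:string">
    <xsl:param name="n" as="xs:integer"/>
    <xsl:sequence select="
      if ($n le 26) then codepoints-to-string(64 + $n)
      else concat(f:col(($n - 1) idiv 26),
                  codepoints-to-string(65 + ($n - 1) mod 26))"/>
  </xsl:function>

  <!-- Only the first fo:table in the input is converted -->
  <xsl:template match="/">
    <xsl:variable name="table" select="(//fo:table)[1]"/>
    <worksheet>
      <sheetData>
        <!-- Header rows, then body rows, then footer rows -->
        <xsl:for-each select="$table/fo:table-header/fo:table-row,
                              $table/fo:table-body/fo:table-row,
                              $table/fo:table-footer/fo:table-row">
          <xsl:variable name="r" select="position()"/>
          <row r="{$r}">
            <xsl:for-each select="fo:table-cell">
              <!-- Cell reference such as B3; text is whitespace-normalized -->
              <c r="{f:col(position())}{$r}" t="inlineStr">
                <is><t><xsl:value-of select="normalize-space(.)"/></t></is>
              </c>
            </xsl:for-each>
          </row>
        </xsl:for-each>
      </sheetData>
    </worksheet>
  </xsl:template>

</xsl:stylesheet>
```

In ooxml.xpl the result could take the place of the inline sheet1 document, for instance via a p:xslt step whose output-base-uri option is http://example.org/sheet1, assuming the zip step matches manifest entries on the document's base URI as the inline xml:base attributes suggest.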
> >
> > Regards,
> > --Paul
> >
> > [1] https://github.com/foglcz/xsl-excel-engine
> > [2] http://exproc.org/
> > [3] http://xmlcalabash.com/
> >
> > Run with calabash like:
> >
> >   java -jar calabash.jar ooxml.xpl output-file=file:///path/to/output.xlsx
> >
> > ============== ooxml.xpl ===============
> > <declare-step name="ooxml-proc"
> >               xmlns="http://www.w3.org/ns/xproc"
> >               xmlns:pxp="http://exproc.org/proposed/steps"
> >               version="1.0">
> >
> >   <option name="output-file"/>
> >
> >   <output port="result" primary="true"/>
> >   <serialization port="result" indent="true"/>
> >
> >   <import href="http://xmlcalabash.com/extension/steps/library-1.0.xpl"/>
> >
> >   <pxp:zip command="create">
> >     <with-option name="href"
> >                  select="$output-file"/>
> >     <input port="source">
> >       <inline xml:base="http://example.org/sheet1">
> >         <worksheet
> >             xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
> >           <sheetData />
> >         </worksheet>
> >       </inline>
> >       <inline xml:base="http://example.org/workbook">
> >         <workbook
> >             xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
> >             xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
> >           <sheets>
> >             <sheet name="Sheet1" sheetId="1" r:id="rId1"/>
> >           </sheets>
> >         </workbook>
> >       </inline>
> >       <inline xml:base="http://example.org/content-types">
> >         <Types
> >             xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
> >           <Default Extension="rels"
> >               ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
> >           <Default Extension="xml" ContentType="application/xml"/>
> >           <Override
> >               PartName="/workbook.xml"
> >               ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/>
> >           <Override
> >               PartName="/sheet1.xml"
> >               ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/>
> >         </Types>
> >       </inline>
> >       <inline xml:base="http://example.org/package-rels">
> >         <Relationships
> >             xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
> >           <Relationship
> >               Id="rId1"
> >               Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
> >               Target="workbook.xml"/>
> >         </Relationships>
> >       </inline>
> >       <inline xml:base="http://example.org/workbook-rels">
> >         <Relationships
> >             xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
> >           <Relationship
> >               Id="rId1"
> >               Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet"
> >               Target="sheet1.xml"/>
> >         </Relationships>
> >       </inline>
> >     </input>
> >     <input port="manifest">
> >       <inline>
> >         <zip-manifest xmlns="http://www.w3.org/ns/xproc-step">
> >           <entry name="sheet1.xml" href="http://example.org/sheet1"/>
> >           <entry name="workbook.xml" href="http://example.org/workbook"/>
> >           <entry name="[Content_Types].xml"
> >                  href="http://example.org/content-types"/>
> >           <entry name="_rels/.rels"
> >                  href="http://example.org/package-rels"/>
> >           <entry name="_rels/workbook.xml.rels"
> >                  href="http://example.org/workbook-rels"/>
> >         </zip-manifest>
> >       </inline>
> >     </input>
> >   </pxp:zip>
> >
> > </declare-step>
> > ================== end of file ===============
> >
> >
> >
> > On Fri, 2014-07-25 at 13:51 +0000, Eliot Kimber ekimber@xxxxxxxxxxxx
> > wrote:
> >> If you can use Java, the Apache POI library makes reading and writing MS
> >> Office formats about as easy as it can be. I've successfully used the
> >> libraries to read and write non-trivial Excel spreadsheets and also to
> >> generate PowerPoint slides (although the current support for PowerPoint
> >> is less complete than for Excel). It would be a significant development
> >> effort to build equivalent infrastructure in XSLT and I would be loath
> >> to take it on.
> >>
> >> For PowerPoint generation I used the technique of using an intermediate
> >> XML format that abstracts slides and then Java code that transforms that
> >> XML into PowerPoint using the POI library. That makes it possible to use
> >> XSLT to generate the slide content but avoids having to deal with the
> >> Office Open complexity in XSLT. The code is in the Slidinator project on
> >> GitHub (https://github.com/drmacro/slidinator).
> >>
> >> Cheers,
> >>
> >> E.
> >> Eliot Kimber, Owner
> >> Contrext, LLC
> >> http://contrext.com
> >>
> >>
> >>
> >>
> >> On 7/25/14, 7:28 AM, "Nicolas BUONOMO nicolas.buonomo@xxxxxxxxxxx"
> >> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> >>
> >> >Hi,
> >> >
> >> >You can also generate OpenDocument spreadsheets (.ods) and then convert
> >> >them to .xlsx documents with LibreOffice, for example.
> >> >I think that generating OpenDocument Format is easier than generating
> >> >Office Open XML. I have done it in some simple cases.
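Along those lines, a minimal XSLT 2.0 sketch that writes a single-file ("flat") OpenDocument spreadsheet, which avoids the zip packaging altogether. The input vocabulary rows/row/cell is hypothetical; cells whose trimmed text is castable to xs:double become office:value-type="float", and everything else a string. The result, saved as e.g. book.fods, could then be converted with something like soffice --headless --convert-to xlsx book.fods.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    exclude-result-prefixes="xs">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical input: <rows><row><cell>...</cell></row></rows> -->
  <xsl:template match="/rows">
    <!-- Flat ODF: the whole document in one file, mimetype as attribute -->
    <office:document office:version="1.2"
        office:mimetype="application/vnd.oasis.opendocument.spreadsheet">
      <office:body>
        <office:spreadsheet>
          <table:table table:name="Sheet1">
            <xsl:apply-templates select="row"/>
          </table:table>
        </office:spreadsheet>
      </office:body>
    </office:document>
  </xsl:template>

  <xsl:template match="row">
    <table:table-row>
      <xsl:apply-templates select="cell"/>
    </table:table-row>
  </xsl:template>

  <xsl:template match="cell">
    <table:table-cell>
      <xsl:choose>
        <!-- Numeric cells carry their value in office:value -->
        <xsl:when test="normalize-space(.) castable as xs:double">
          <xsl:attribute name="office:value-type">float</xsl:attribute>
          <xsl:attribute name="office:value" select="normalize-space(.)"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:attribute name="office:value-type">string</xsl:attribute>
        </xsl:otherwise>
      </xsl:choose>
      <!-- Displayed text of the cell -->
      <text:p><xsl:value-of select="."/></text:p>
    </table:table-cell>
  </xsl:template>

</xsl:stylesheet>
```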
> >> >
> >> >Nicolas
> >> >
> >> >On 25/07/2014 03:02, Paul Tyson phtyson@xxxxxxxxxxxxx wrote:
> >> >> Does anyone know of success stories in generating Office Open XML
> >> >> artifacts using XSLT?
> >> >>
> >> >> (This is the "open" format used by Microsoft Office since 2007,
> >> >> standardized in ISO/IEC 29500 and ECMA-376.)
> >> >>
> >> >> I am looking for pointers to solid documentation of namespaces and
> >> >> package structure, particularly for SpreadsheetML (Microsoft's .xlsx
> >> >> format) as used in Excel 2007.
> >> >>
> >> >> Thanks in advance,
> >> >> --Paul
> >> >>
> >> >>
> >> >
> >> >--
> >> >Nicolas BUONOMO
> >> >CNAF - CNEDI Rhône Alpes
> >> >DSI - Direction Fonctionnelle et Processus
> >> >Relation avec les Collectivités Territoriales
> >> >Tel: 0478636669 / 0677877811
> >> >Email: nicolas.buonomo@xxxxxxxxxxx
