Re: [xsl] generating Office Open XML parts using xslt

Subject: Re: [xsl] generating Office Open XML parts using xslt
From: "Wendell Piez wapiez@xxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 30 Jul 2014 13:43:22 -0000
Jirka and Paul, thanks: this is useful stuff.

I'll probably try the XProc-based approach using the unzip extension
... my guess is that it won't be difficult to make it work, while it
may be more work to make it work dependably in all reasonable cases
... the usual story.
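
For what it's worth, here is a minimal, untested sketch of the first
step, assuming the pxp:unzip step as implemented in Calabash. The step
name "read-xlsx" and the option name "input-file" are made up for
illustration. It pulls out the package-level relationships part, which
sits at a fixed path in every package and is where the resolving starts:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:pxp="http://exproc.org/proposed/steps"
                name="read-xlsx" version="1.0">

  <!-- Hypothetical option: URI of the .xlsx (or .docx, etc.) package -->
  <p:option name="input-file" required="true"/>

  <p:output port="result" primary="true"/>
  <p:serialization port="result" indent="true"/>

  <!-- Calabash's declarations for the pxp:* extension steps -->
  <p:import href="http://xmlcalabash.com/extension/steps/library-1.0.xpl"/>

  <!-- With the "file" option, pxp:unzip returns that one entry, parsed
       as XML; without it, the step returns a listing of the package. -->
  <pxp:unzip file="_rels/.rels">
    <p:with-option name="href" select="$input-file"/>
  </pxp:unzip>

</p:declare-step>
```

Run like Paul's pipeline, e.g.
java -jar calabash.jar read-xlsx.xpl input-file=file:///path/to/input.xlsx
From the Relationship elements that come back, the Target attributes
should lead to the workbook part, and its own .rels part to the sheets.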

In any case it's great to see the tools are getting strong enough to
give us confidence with this stuff.

Cheers, Wendell



On Mon, Jul 28, 2014 at 11:10 PM, Paul Tyson phtyson@xxxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> On Mon, 2014-07-28 at 13:14 +0000, Wendell Piez wapiez@xxxxxxxxxxxxxxx
> wrote:
>> Hi,
>>
>> This is fantastic ... and brings up the related question -- how about
>> going the other way, reading data out of XLSX format?
>>
>
> There's an Exproc extension step, pxp:unzip, that would give you the
> package contents of an office package file (.xlsx, .docx, etc.), and
> from there you could read the parts, resolve the relationships you need,
> and get to the content, style, and behavioral features you want. That is
> not in my line of sight right now, but I think it could be done fairly
> easily.
>
> Or, as Jirka mentioned, you could use the jar: URL scheme to unpack the
> file.
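
On the jar: route, a minimal sketch of how that might look in XSLT 2.0
on a Java-based processor such as Saxon. This is not taken from Jirka's
post; the "package" parameter and the "main" template name are
assumptions for illustration. It reads the package-level relationships
part and lists the targets it points to:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rel="http://schemas.openxmlformats.org/package/2006/relationships"
    exclude-result-prefixes="rel">

  <!-- Hypothetical parameter: absolute file: URI of the package -->
  <xsl:param name="package" select="'file:///path/to/input.xlsx'"/>

  <xsl:output indent="yes"/>

  <!-- Meant to be invoked as the initial template (e.g. Saxon's -it:main) -->
  <xsl:template name="main">
    <!-- jar:{zip-uri}!/{entry} addresses a single entry inside the zip -->
    <xsl:variable name="rels"
        select="document(concat('jar:', $package, '!/_rels/.rels'))"/>
    <parts>
      <xsl:for-each select="$rels/rel:Relationships/rel:Relationship">
        <part type="{@Type}" target="{@Target}"/>
      </xsl:for-each>
    </parts>
  </xsl:template>

</xsl:stylesheet>
```

Whether the jar: scheme resolves depends on the processor's URI
handling, so this is tied to the Java platform.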
>
> The main point is that an xslt+xproc toolchain can be used to read and
> write Office Open XML documents. To develop such a toolchain, an
> understanding of the OOXML schemas, package structure, and relationship
> semantics is needed (sketchily documented in ECMA-376). Beyond that,
> some standard mappings from OOXML features to xsl-fo and html5 would be
> useful. Xslt libraries could be developed around these mappings.
>
> Regards,
> --Paul
>
>> Betty Harvey did some excellent work on this a couple of years ago, as
>> documented at
>> http://www.ibm.com/developerworks/xml/library/x-exceltooasis/index.html?ca=dat
>> -- but I don't know about anything more along these lines. What would
>> be fantastic would be a generic utility (XProc would be fine) that
>> would expose the data in a spreadsheet as an XML document, more
>> compact and legible than its native form -- naturally, for the use of
>> XSLT. Whether such a utility could be truly generic ... I'm not sure I
>> could say (yet :-).
>>
>> Any ideas?
>>
>> Cheers, Wendell
>>
>>
>> On Sat, Jul 26, 2014 at 12:04 AM, Paul Tyson phtyson@xxxxxxxxxxxxx
>> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>> > I appreciate all the kind interest in this topic. I will set the
>> > difficulty warnings aside for now because I believe the potential
>> > benefits of this approach are worth some effort.
>> >
>> > Thanks to Pavel Ptacek's xsl-excel-engine [1], recommended by Vincent
>> > Lizzi, I was able to get a leg up on this effort.
>> >
>> > See below for an Xproc pipeline definition that creates a minimal blank
>> > spreadsheet (.xlsx) that can be opened without incident in Excel 2007.
>> > It uses the Exproc [2] extension step pxp:zip (as implemented in
>> > calabash [3]) to produce the xlsx package.
>> >
>> > Apologies to those who don't speak xproc, and further apologies since
>> > there is not a line of xslt in it. But imagine that any of the
>> > input[@port='source']/inline elements in the <pxp:zip> step can be
>> > replaced with the results of whatever transformations you like, and this
>> > becomes a very powerful tool. In particular, it should be
>> > straightforward to transform an fo:table to a workbook sheet.
>> >
>> > Regards,
>> > --Paul
>> >
>> > [1] https://github.com/foglcz/xsl-excel-engine
>> > [2] http://exproc.org/
>> > [3] http://xmlcalabash.com/
>> >
>> > Run with calabash like:
>> >
>> >> java -jar calabash.jar ooxml.xpl output-file=file://path/to/output.xlsx
>> >
>> > ============== ooxml.xpl ===============
>> > <declare-step name="ooxml-proc"
>> >               xmlns="http://www.w3.org/ns/xproc"
>> >               xmlns:pxp="http://exproc.org/proposed/steps"
>> >               version="1.0">
>> >
>> >   <option name="output-file"/>
>> >
>> >   <output port="result" primary="true"/>
>> >   <serialization port="result" indent="true"/>
>> >
>> >   <import
>> > href="http://xmlcalabash.com/extension/steps/library-1.0.xpl"/>
>> >
>> >   <pxp:zip command="create">
>> >     <with-option name="href"
>> >                  select="$output-file"/>
>> >     <input port="source">
>> >       <inline xml:base="http://example.org/sheet1">
>> >         <worksheet
>> >             xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
>> >           <sheetData />
>> >         </worksheet>
>> >       </inline>
>> >       <inline xml:base="http://example.org/workbook">
>> >         <workbook
>> >             xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
>> >             xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
>> >           <sheets>
>> >             <sheet name="Sheet1" sheetId="1" r:id="rId1"/>
>> >           </sheets>
>> >         </workbook>
>> >       </inline>
>> >       <inline xml:base="http://example.org/content-types">
>> >         <Types
>> > xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
>> >           <Default Extension="rels"
>> > ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
>> >           <Default Extension="xml" ContentType="application/xml"/>
>> >           <Override
>> >               PartName="/workbook.xml"
>> >               ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/>
>> >           <Override
>> >               PartName="/sheet1.xml"
>> >               ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/>
>> >         </Types>
>> >       </inline>
>> >       <inline xml:base="http://example.org/package-rels">
>> >         <Relationships
>> > xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
>> >           <Relationship
>> >               Id="rId1"
>> >               Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
>> >               Target="workbook.xml"/>
>> >         </Relationships>
>> >       </inline>
>> >       <inline xml:base="http://example.org/workbook-rels">
>> >         <Relationships
>> > xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
>> >           <Relationship
>> >               Id="rId1"
>> >               Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet"
>> >               Target="sheet1.xml"/>
>> >         </Relationships>
>> >       </inline>
>> >     </input>
>> >     <input port="manifest">
>> >       <inline>
>> >         <zip-manifest xmlns="http://www.w3.org/ns/xproc-step">
>> >           <entry name="sheet1.xml" href="http://example.org/sheet1"/>
>> >           <entry name="workbook.xml"
>> >                  href="http://example.org/workbook"/>
>> >           <entry name="[Content_Types].xml"
>> >                  href="http://example.org/content-types"/>
>> >           <entry name="_rels/.rels"
>> >                  href="http://example.org/package-rels"/>
>> >           <entry name="_rels/workbook.xml.rels"
>> >                  href="http://example.org/workbook-rels"/>
>> >         </zip-manifest>
>> >       </inline>
>> >     </input>
>> >   </pxp:zip>
>> >
>> > </declare-step>
>> > ================== end of file ===============
>> >
>> >
>> >
>> > On Fri, 2014-07-25 at 13:51 +0000, Eliot Kimber ekimber@xxxxxxxxxxxx
>> > wrote:
>> >> If you can use Java, the Apache POI library makes reading and writing MS
>> >> Office formats about as easy as it can be. I've successfully used the
>> >> libraries to read and write non-trivial Excel spreadsheets and also to
>> >> generate PowerPoint slides (although the current support for PowerPoint
>> >> is less complete than for Excel). It would be a significant development
>> >> effort to build equivalent infrastructure in XSLT and I would be loath
>> >> to take it on.
>> >>
>> >> For PowerPoint generation I used the technique of using an intermediate
>> >> XML format that abstracts slides and then Java code that transforms
>> >> that XML into PowerPoint using the POI library. That makes it possible
>> >> to use XSLT to generate the slide content but avoids having to deal
>> >> with the Office Open complexity in XSLT. The code is the Slidinator
>> >> project on GitHub (https://github.com/drmacro/slidinator).
>> >>
>> >> Cheers,
>> >>
>> >> E.
>> >> Eliot Kimber, Owner
>> >> Contrext, LLC
>> >> http://contrext.com
>> >>
>> >>
>> >>
>> >>
>> >> On 7/25/14, 7:28 AM, "Nicolas BUONOMO nicolas.buonomo@xxxxxxxxxxx"
>> >> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>> >>
>> >> >Hi,
>> >> >
>> >> >You can also generate OpenDocument spreadsheets (.ods) and then convert
>> >> >them to .xlsx documents with LibreOffice, for example.
>> >> >I think that generating OpenDocument Format is easier than generating
>> >> >Office Open XML. I have done it in some simple cases.
>> >> >
>> >> >Nicolas
>> >> >
>> >> >On 25/07/2014 03:02, Paul Tyson phtyson@xxxxxxxxxxxxx wrote:
>> >> >> Does anyone know of success stories in generating Office Open XML
>> >> >> artifacts using XSLT?
>> >> >>
>> >> >> (This is the "open" format used by Microsoft Office since 2007,
>> >> >> standardized in ISO/IEC 29500 and ECMA-376.)
>> >> >>
>> >> >> I am looking for pointers to solid documentation of namespaces and
>> >> >> package structure, particularly for SpreadsheetML (Microsoft's .xlsx
>> >> >> format) as used in Excel 2007.
>> >> >>
>> >> >> Thanks in advance,
>> >> >> --Paul
>> >> >>
>> >> >>
>> >> >
>> >> >--
>> >> >Nicolas BUONOMO
>> >> >CNAF - CNEDI Rhône Alpes
>> >> >DSI - Direction Fonctionnelle et Processus
>> >> >Relation avec les Collectivités Territoriales
>> >> >Tél : 0478636669 / 0677877811
>> >> >Mél : nicolas.buonomo@xxxxxxxxxxx
>> >> >
>> >> >
>> >>
>> >
>>
>>
>>
>



--
Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^
