RE: [xsl] Grouping Word 2007 content by customXml nodes

From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Mon, 15 Jan 2007 11:55:30 -0000

In this expression:

<xsl:value-of select="//w:p/w:r/w:t"/>

"//" selects from the root of the document. You want to select relative to
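what's selected by xsl:for-each-group, that is current-group(). The items in
the group are the w:p elements themselves (the children of w:body), so the
path continues directly with w:r. So try:

<xsl:value-of select="current-group()/w:r/w:t"/>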
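
As a minimal sketch of how this could fit into the whole stylesheet (untested
against the real document; it assumes the body contains only w:p children as in
the sample, and that the " and " in the desired output is a literal separator):
since the grouped items are the w:p elements themselves, the path steps from
current-group() straight to w:r, which also skips the title, whose run sits
inside w:customXml.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">

  <xsl:output method="xml" indent="yes" encoding="utf-8"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/">
    <root>
      <!-- A new group starts at each paragraph carrying the pageTitle marker. -->
      <xsl:for-each-group select="w:document/w:body/w:p"
          group-starting-with="w:p[w:customXml/@w:element = 'pageTitle']">
        <pageData>
          <!-- The context item is the first paragraph of the group, i.e. the title. -->
          <pageTitle>
            <xsl:value-of select="."/>
          </pageTitle>
          <!-- Only runs that are direct children of the grouped paragraphs;
               the separator value is an assumption taken from the desired output. -->
          <pageContent>
            <xsl:value-of select="current-group()/w:r/w:t" separator=" and "/>
          </pageContent>
        </pageData>
      </xsl:for-each-group>
    </root>
  </xsl:template>
</xsl:stylesheet>
```

If the document has paragraphs before the first pageTitle marker, they form a
leading group with no title, which you may want to filter out.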

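In practice you probably want to do further processing of this content.
Since current-group() already holds the w:p elements, you can apply templates
to them directly, skipping the title paragraph that starts the group,
something like

<xsl:apply-templates select="current-group()[position() gt 1]"/>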
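
A sketch of the template-based variant, under the same assumptions as the
sample document. The <p> output element is only a placeholder for whatever HTML
conversion you end up writing, and the predicate filters out the title
paragraph because the group items are the w:p elements themselves:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">

  <xsl:output method="xml" indent="yes" encoding="utf-8"/>
  <xsl:strip-space elements="*"/>

  <!-- The built-in templates walk from the root down to w:body. -->
  <xsl:template match="w:body">
    <root>
      <xsl:for-each-group select="w:p"
          group-starting-with="w:p[w:customXml/@w:element = 'pageTitle']">
        <pageData>
          <pageTitle>
            <xsl:value-of select="."/>
          </pageTitle>
          <pageContent>
            <!-- Process every paragraph of the group except the title paragraph. -->
            <xsl:apply-templates
                select="current-group()[not(w:customXml/@w:element = 'pageTitle')]"/>
          </pageContent>
        </pageData>
      </xsl:for-each-group>
    </root>
  </xsl:template>

  <!-- Placeholder conversion: one HTML-style <p> per Word paragraph. -->
  <xsl:template match="w:p">
    <p>
      <xsl:value-of select="w:r/w:t" separator=""/>
    </p>
  </xsl:template>
</xsl:stylesheet>
```

Filtering on the marker rather than on position also copes with a leading group
that has no title paragraph.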

Michael Kay
http://www.saxonica.com/



> -----Original Message-----
> From: Frank Hopper [mailto:frank.hopper@xxxxxx] 
> Sent: 15 January 2007 11:05
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] Grouping Word 2007 content by customXml nodes
> 
> I am new to XSLT and am working with ASP.NET 2.0, trying to bulk
> upload content from Word 2007 docx files to a SQL Server 2005
> Express Edition database in order to publish the content
> through my content management system. So far I think I will
> need to use XSLT 2.0 and the Saxon 8.7 processor for .NET
> (since the .NET XslCompiledTransform processor only supports
> XSLT 1.0).
> 
> I would like to split the Word 2007 documents into several
> parts via XSLT so I can publish a long Word 2007 document as
> several web pages on the internet. I added my own customXml
> to the Word 2007 document to insert information like page
> title, URL, meta description, meta keywords and so on (the
> WORD2007SAMPLE_DOCUMENT.XML file below only shows the page
> title customXml to keep the sample short). Every <w:customXml
> w:element="pageTitle"> indicates the start of a new web page.
> The content in between will be converted to HTML.
> 
> The DESIRED_OUTPUT.XML shows the xml file I would like to get 
> as a result.  This file will be loaded into the corresponding 
> tables and columns of my SQL Server 2005 Express Edition database.
> 
> The RECEIVED_OUTPUT.XML shows the output I get so far. It 
> shows that the content is not grouped correctly into separate 
> web pages.
> 
> The MY_NOT_WORKING_TRANSFORM.XSL shows how I tried to 
> transform the WORD2007SAMPLE_DOCUMENT.XML into 
> DESIRED_OUTPUT.XML without success. The conversion of the 
> content to HTML is not included to keep the sample short.
> 
> MY PROBLEM:
> When I group by <w:customXml w:element="pageTitle"> using
> for-each-group, I can't get to the value of the
> <w:t>Content ?</w:t> nodes without destroying my grouping
> effort. I suppose this is because the content is not at the
> same level as, or a lower level than, my
> <w:customXml w:element="pageTitle">.
> 
> Thanks for your help.
> 
> ----------------------------
> WORD2007SAMPLE_DOCUMENT.XML
> ----------------------------
> <?xml version="1.0"?>
> <w:document
>    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
>    <w:body>
>      <w:p>
>        <w:customXml w:element="pageTitle">
>          <w:r>
>            <w:t>1. Web Page Title</w:t>
>          </w:r>
>        </w:customXml>
>      </w:p>
>      <w:p>
>        <w:r>
>          <w:t>Content A</w:t>
>        </w:r>
>      </w:p>
>      <w:p>
>        <w:r>
>          <w:t>Content B</w:t>
>        </w:r>
>      </w:p>
>      <w:p>
>        <w:customXml w:element="pageTitle">
>          <w:r>
>            <w:t>2. Web Page Title</w:t>
>          </w:r>
>        </w:customXml>
>      </w:p>
>      <w:p>
>        <w:r>
>          <w:t>Content C</w:t>
>        </w:r>
>      </w:p>
>      <w:p>
>        <w:r>
>          <w:t>Content D</w:t>
>        </w:r>
>      </w:p>
>    </w:body>
> </w:document>
> 
> ----------------------------
> DESIRED_OUTPUT.XML
> ----------------------------
> <?xml version="1.0" encoding="utf-8"?>
> <root
>    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
>    <pageData>
>      <pageTitle>1. Web Page Title</pageTitle>
>      <pageContent>
>        Content A and Content B
>      </pageContent>
>    </pageData>
>    <pageData>
>      <pageTitle>2. Web Page Title</pageTitle>
>      <pageContent>
>        Content C and Content D
>       </pageContent>
>    </pageData>
> </root>
> 
> ----------------------------
> RECEIVED_OUTPUT.XML
> ----------------------------
> <?xml version="1.0" encoding="utf-8"?>
> <root
>    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
>    <pageData>
>      <pageTitle>1. Web Page Title</pageTitle>
>      <pageContent>
>        Content A and Content B Content C and Content D
>      </pageContent>
>    </pageData>
>    <pageData>
>      <pageTitle>2. Web Page Title</pageTitle>
>      <pageContent>
>        Content A and Content B Content C and Content D
>      </pageContent>
>    </pageData>
> </root>
> 
> ----------------------------
> MY_NOT_WORKING_TRANSFORM.XSL
> ----------------------------
> <xsl:stylesheet version="2.0"
>    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
> 
>    <xsl:output method="xml" indent="yes" encoding="utf-8" />
>    <xsl:strip-space elements="*"/>
> 
>    <xsl:template match="/">
>      <xsl:apply-templates select="//w:body"/>
>    </xsl:template>
> 
>    <xsl:template match="w:body">
>      <root>
>        <xsl:for-each-group select="*"
>         group-starting-with="w:p[w:customXml/@w:element = 'pageTitle']">
>          <pageData>
>            <pageTitle>
>              <xsl:value-of select="."/>
>            </pageTitle>
>            <pageContent>
>              <xsl:value-of select="//w:p/w:r/w:t"/>
>            </pageContent>
>          </pageData>
>        </xsl:for-each-group>
>      </root>
>    </xsl:template>
> </xsl:stylesheet>
