Re: [xsl] From WordprocessingML inline styles to nested inline elements

Hello all,

several months ago I asked for help with the following task:

Reading WordprocessingML (from Word 2003), I obtain text runs whose inline styles are attached as leaf nodes, like this:
<w:r>
  <w:rPr>
    <w:i/>
    <w:b/>
  </w:rPr>
  <w:t>This is text in bold and italic.</w:t>
</w:r>
In my output, however, the inline styles should nest, and moreover, nest in a particular order:

<run><b><i>This is text in bold and italic.</i></b></run>

With the valuable help from the list, especially from David and Wendell, I managed to craft an XSLT 2.0 stylesheet module that does the job quite well; I have attached a demo version of it below (I can post the full, richly commented version if someone is interested), together with a sample input file.

Now I need to enhance it a little, to cater for cases where a single inline style may have several different "indicators", which currently interfere with each other. An example is the superscript style, which is indicated by the presence of (at least) one of these three child element sequences within w:rPr:

A) <w:vertAlign w:val="superscript"/>

B) <w:position w:val="6"/>

C) <w:position w:val="6"/><w:vertAlign w:val="superscript"/>

While A) works well, B) and C) receive multiple containers; see the result of applying the sample XSL to the sample XML. I know I need to stop considering each child of w:rPr separately and instead look for sequences (or rather something like unordered node sets?) inside it.

So I am facing two questions:

1) Which is the best way to replace the one-by-one comparison in

<xsl:when test="some $style_repr in w:rPr/*
            satisfies
              deep-equal($style_repr, $current_style_wordml_repr)">

with an algorithm that is capable of comparing node sets?
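
For illustration, here is a minimal sketch of the kind of set-wise comparison I have in mind. It is untested against the full module. The function name lookup:contains-all and the xs prefix are names I made up for this sketch; they are not part of the demo stylesheet. The function returns true only if every element of a style representation has a deep-equal counterpart among the run properties, regardless of order:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:lookup="http://xmlns.srz.de/yforkl/xslt/lookup">

  <!-- Hypothetical helper (not in the demo module):
       $props = the children of w:rPr for the current run,
       $repr  = all children of one lookup:wordml_phys_style_repr.
       True if $repr is non-empty and each of its elements is
       deep-equal to some element of $props; order is ignored. -->
  <xsl:function name="lookup:contains-all" as="xs:boolean">
    <xsl:param name="props" as="element()*"/>
    <xsl:param name="repr" as="element()*"/>
    <xsl:sequence select="exists($repr) and
      (every $r in $repr satisfies
        (some $p in $props satisfies deep-equal($p, $r)))"/>
  </xsl:function>

</xsl:stylesheet>

The test in add_style would then become something like test="lookup:contains-all(w:rPr/*, $current_style_wordml/*)", comparing against all children of the table entry instead of only */[1]. On its own this does not stop cases A and B from also matching a run that already matched case C.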

2) I suppose I will have to be able to delete the child elements of w:r that were already matched, to prevent cases A and B above from matching when case C has already matched. (Currently I do without this, because I had assumed the matching patterns to be mutually exclusive.) I think it would be easiest to turn the w:r instance, which is now accessed as the context node, into a parameter so that it can be modified. Is there a more elegant way?
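
To make the parameter idea concrete, here is a rough, untested sketch of how add_style might look if only the not-yet-matched properties were passed along instead of the whole w:r. The parameter name remaining_props and the variable names are made up for this sketch. It also assumes that the most specific table entries (for example the two-element "sup" entry) come before the less specific ones in the lookup table:

<xsl:template name="add_style">
  <xsl:param name="available_styles_sequence"/>
  <!-- Hypothetical parameter: properties not yet consumed by a match;
       defaults to all children of w:rPr on the first call -->
  <xsl:param name="remaining_props" select="w:rPr/*"/>
  <xsl:choose>
    <xsl:when test="empty($available_styles_sequence)">
      <xsl:apply-templates select="w:t"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:variable name="current" select="$available_styles_sequence[1]"/>
      <xsl:variable name="rest" select="remove($available_styles_sequence, 1)"/>
      <xsl:choose>
        <!-- every element of the table entry must occur among the
             remaining properties, in any order -->
        <xsl:when test="exists($current/*) and
                        (every $r in $current/* satisfies
                          (some $p in $remaining_props
                           satisfies deep-equal($p, $r)))">
          <xsl:element name="{$current/@r_equiv}">
            <xsl:call-template name="add_style">
              <xsl:with-param name="available_styles_sequence" select="$rest"/>
              <!-- drop the properties that this entry has just matched -->
              <xsl:with-param name="remaining_props"
                select="$remaining_props[not(some $r in $current/*
                                             satisfies deep-equal(., $r))]"/>
            </xsl:call-template>
          </xsl:element>
        </xsl:when>
        <xsl:otherwise>
          <xsl:call-template name="add_style">
            <xsl:with-param name="available_styles_sequence" select="$rest"/>
            <xsl:with-param name="remaining_props" select="$remaining_props"/>
          </xsl:call-template>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

The context node stays w:r throughout, so nothing in the input is actually modified; only the sequence of properties still available for matching shrinks with each match.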

Yves

===== wordml_phys_run_styles.xsl =====

<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
  xmlns:lookup="http://xmlns.srz.de/yforkl/xslt/lookup"
  exclude-result-prefixes="lookup w"
  version="2.0">

  <lookup:wordml_styles_table>
    <lookup:wordml_phys_style_repr r_equiv="b">
      <w:b/>
    </lookup:wordml_phys_style_repr>
    <lookup:wordml_phys_style_repr r_equiv="i">
      <w:i/>
    </lookup:wordml_phys_style_repr>
<!-- Uncomment this to try a naive approach to recognize superscript by two
     child elements at once -->
<!--
    <lookup:wordml_phys_style_repr r_equiv="sup">
      <w:position w:val="6"/>
      <w:vertAlign w:val="superscript"/>
    </lookup:wordml_phys_style_repr>
-->
    <lookup:wordml_phys_style_repr r_equiv="sup">
      <w:vertAlign w:val="superscript"/>
    </lookup:wordml_phys_style_repr>
    <lookup:wordml_phys_style_repr r_equiv="sup">
      <w:position w:val="6"/>
    </lookup:wordml_phys_style_repr>
  </lookup:wordml_styles_table>

  <xsl:template match="w:p">
    <sample>
      <xsl:apply-templates/>
    </sample>
  </xsl:template>

  <xsl:template match="w:r">
    <xsl:call-template name="convert_phys_run_styles"/>
  </xsl:template>

  <xsl:template name="convert_phys_run_styles">
    <xsl:call-template name="add_style">
      <xsl:with-param name="available_styles_sequence"
        select="document('')/
                  xsl:stylesheet/
                    lookup:wordml_styles_table/
                      lookup:wordml_phys_style_repr"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="add_style">
    <xsl:param name="available_styles_sequence"/>
    <xsl:choose>
      <xsl:when test="empty($available_styles_sequence)">
        <xsl:apply-templates select="w:t"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:variable name="current_style_wordml"
          select="$available_styles_sequence[1]"/>
        <xsl:variable name="current_style_wordml_repr"
          select="$current_style_wordml/*[1]"/>
        <xsl:choose>
          <xsl:when test="some $style_repr in w:rPr/*
                          satisfies
                            deep-equal($style_repr, $current_style_wordml_repr)">
            <xsl:element name="{$current_style_wordml/@r_equiv}">
              <xsl:call-template name="add_style">
                <xsl:with-param name="available_styles_sequence"
                  select="remove($available_styles_sequence, 1)"/>
              </xsl:call-template>
            </xsl:element>
          </xsl:when>
          <xsl:otherwise>
            <xsl:call-template name="add_style">
              <xsl:with-param name="available_styles_sequence"
                select="remove($available_styles_sequence, 1)"/>
            </xsl:call-template>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>

===== wordml_phys_run_styles.xml =====

<?xml version="1.0" encoding="ISO-8859-1"?>
<w:p xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:r>
    <w:rPr>
      <w:i/>
      <w:vertAlign w:val="superscript"/>
    </w:rPr>
    <w:t>This italic + superscript is always fine</w:t>
  </w:r>
  <w:r>
    <w:rPr/>
    <w:t> but </w:t>
  </w:r>
  <w:r>
    <w:rPr>
      <w:i/>
      <w:position w:val="6"/>
    </w:rPr>
    <w:t>that italic + superscript has maybe one "sup" container too much</w:t>
  </w:r>
  <w:r>
    <w:rPr/>
    <w:t> while </w:t>
  </w:r>
  <w:r>
    <w:rPr>
      <w:i/>
      <w:position w:val="6"/>
      <w:vertAlign w:val="superscript"/>
    </w:rPr>
    <w:t>this italic + superscript has either a double or triple "sup" container!</w:t>
  </w:r>
</w:p>
