[xsl] Newbie: How to flatten nested emphasis elements? - Interesting I think :)

Subject: [xsl] Newbie: How to flatten nested emphasis elements? - Interesting I think :)
From: Bindu Wavell <mulberry.19.72@xxxxxxxxxxxx>
Date: Sat, 20 Nov 2004 17:15:06 -0700
I'm working with a DTD that uses the <emphasis> tag to mark text up as bold, italic, 
boxed, etc. I'm trying to build an XSLT stylesheet to transform this so that it can be
consumed by Adobe InDesign, which unfortunately does not appear to handle nested inline styles. 

Could someone suggest how I could do this?

If it helps, here is a sample (contrived) input:

1:  <document>
2:  <body>
3:  <title>Jobs prayer</title>
4:  <para>Our <emphasis emphasis-style="underline">program</emphasis> who art in 
    <emphasis emphasis-style="bold">memory</emphasis>, 
    <emphasis emphasis-style="italic">hello</emphasis> be 
    <emphasis emphasis-style="bold"><emphasis emphasis-style="italic">thy
    </emphasis></emphasis> name,</para>
5:  <para>
    <emphasis emphasis-style="bold"><emphasis emphasis-style="italic">thy</emphasis>
    <emphasis emphasis-style="boxed">opperating system</emphasis> come,</emphasis></para>
6:  <para><ul><li><emphasis emphasis-style="bold"><emphasis emphasis-style="italic">thy</emphasis>
    commands be done,</emphasis></li><li><emphasis emphasis-style="bold">
    <emphasis emphasis-style="italic">thy</emphasis> commands be done,</emphasis></li></ul></para>
7:  <para><emphasis emphasis-style="italic">at the <emphasis emphasis-style="bold">printer
    <emphasis emphasis-style="italic">as it is</emphasis> on the </emphasis>
    <!-- isn&apos;t this poetic -->screen (<year>1972</year>)</emphasis></para>
8:  <para><emphasis emphasis-style="bold"><year certified="true">1922</year></emphasis></para>
9:  </body>
10: </document>

and a sample (contrived) output:
 
1:  <document>
2:  <body>
3:  <title>Jobs prayer</title>
4:  <p>Our <u>program</u> who art in <b>memory</b>, <i>hello</i> be <bi>thy</bi> name,</p>
5:  <p><bi>thy</bi><bx> opperating system</bx><b> come,</b></p>
6:  <p><ul><li><bi>thy</bi><b> commands be done,</b></li><li><bi>thy</bi><b> commands be 
    done,</b></li></ul></p>
7:  <p><i>at the </i><bi>printer as it is on the </bi><i><!-- isn&apos;t this poetic -->screen
    (</i><iyear>1972</iyear><i>)</i></p>
8:  <p><byear certified="true">1922</byear></p>
9:  </body>
10: </document>

I'm not at all set on using the short forms <p>, <b>, <u>, <i>, <bi>, <bx>, etc. In fact, it
would be nice, and possibly easier, to build the element names from the emphasis-style
attribute (possibly aggregated): <para>, <bold>, <underlined>, <italic>, <bold-italic>,
<bold-boxed>, etc.
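
To illustrate what I mean by aggregating the attribute values, here is a minimal sketch in Python (not XSLT, just to pin down the naming rule). The function name style_element_name is made up for this example. It assumes the element name is simply the sorted, de-duplicated style values joined with hyphens, so nesting order does not matter. Note that it would produce <underline> rather than <underlined>, since it uses the attribute value as-is.

```python
def style_element_name(styles):
    """Build one element name from a collection of emphasis-style values.

    Duplicates are dropped and the values are sorted, so the result does
    not depend on the order in which the styles were opened.
    (Hypothetical helper, only for illustration.)
    """
    return "-".join(sorted(set(styles)))


# bold inside italic and italic inside bold give the same name
print(style_element_name(["italic", "bold"]))
print(style_element_name(["bold", "italic"]))
# a style repeated at a deeper nesting level is only counted once
print(style_element_name(["italic", "bold", "italic"]))
```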

Some interesting things: 
o It appears to be important to keep track of the history of open emphasis styles. 
  o When a nested emphasis tag is closed, the styles that remain open need to be re-opened.
  o There is no need to create duplicate tags if we open/close multiple styles/elements in 
    succession. For example: <bi> on line 6 and <byear> on line 8.
o It is desirable to keep comments.
o <bold-italic> is the same as <italic-bold>, so they should be represented by a single element.
o It is necessary to keep embedded elements (<year>, for example).
o Embedded elements need to be flattened: <iyear>1972</iyear> instead of <year> inside
  an <i> tag, as InDesign appears to suck at handling nested elements in a text stream... 
o Embedded elements need to keep their attributes.
o I'd like to keep ENTITIES from getting transformed. I think this is implementation 
  specific and not supported by the XSLT spec, but just in case, I included this example.
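
The behaviour described in the list above can be sketched roughly as follows. This is Python rather than XSLT, and only a sketch under assumptions: the names walk and flatten are invented for the example, it handles the inline content of a single <para>, it does not preserve comments, and block-level children such as <ul>/<li> would need their own handling. It returns (style name, tag, attributes, text) tuples instead of serialized elements, so the final naming scheme (<iyear> versus something longer) is left open.

```python
import xml.etree.ElementTree as ET


def walk(node, styles=frozenset()):
    """Yield (style_name, tag, attrs, text) runs for node's inline content.

    `styles` is the set of emphasis styles open at this point.
    """
    name = "-".join(sorted(styles))
    if node.text:
        yield (name, None, {}, node.text)
    for child in node:
        if child.tag == "emphasis":
            # Nested emphasis: add its style to the open set and recurse.
            style = child.get("emphasis-style")
            inner = styles | {style} if style else styles
            yield from walk(child, inner)
        else:
            # Embedded element such as <year>: keep its tag and attributes.
            text = "".join(child.itertext())
            yield (name, child.tag, dict(child.attrib), text)
        if child.tail:
            # Text after a closing tag falls back to the enclosing styles.
            yield (name, None, {}, child.tail)


def flatten(node):
    """Merge adjacent plain-text runs that share the same style name."""
    runs = []
    for run in walk(node):
        prev = runs[-1] if runs else None
        if prev and prev[1] is None and run[1] is None and prev[0] == run[0]:
            runs[-1] = (prev[0], None, {}, prev[3] + run[3])
        else:
            runs.append(run)
    return runs


# Simplified version of line 7 of the sample input (comment left out).
para = ET.fromstring(
    '<para><emphasis emphasis-style="italic">at the '
    '<emphasis emphasis-style="bold">printer '
    '<emphasis emphasis-style="italic">as it is</emphasis> on the </emphasis>'
    'screen (<year>1972</year>)</emphasis></para>'
)
for run in flatten(para):
    print(run)
```

Because the open styles are carried as a set, closing a nested emphasis automatically falls back to whatever is still open, and the three bold-italic pieces collapse into one run.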

It looks to me like some of this is MUCH easier using XSLT 2.0. Is that a reasonable 
assessment? Is XSLT 2.0 ready to run in a production environment, or is it still on 
its first legs?

Any thoughts, suggestions, comments, questions would be greatly appreciated!



-- Bindu
