I am working on generating Adobe InCopy article (INCX) files from DITA
source. The challenge I face is that the DITA source is typical
documentation XML, where you have mixed content with embedded inline
elements that may be nested to any depth, e.g.:
<p>Some text <i>italic text <b>now bold italic</b> back to italic</i>
more text</p>
In the INCX representation of this, each text string with distinct
formatting is separately wrapped as a "text run", making the above into:
<txsr><pcnt>Some text </pcnt></txsr>
<txsr><pcnt>italic text </pcnt></txsr>
<txsr><pcnt>now bold italic</pcnt></txsr>
<txsr><pcnt> back to italic</pcnt></txsr>
<txsr><pcnt> more text&#x0a;</pcnt></txsr>
(INCX details omitted for simplicity)
An INCX file is essentially just a long sequence of txsr elements. There
is no structural nesting in the InCopy data--newlines are the only
structural markers (newlines signal paragraph breaks, so all input
newlines have to be normalized to whitespace and newlines emitted only
at the ends of visual blocks).
There is no requirement that the data be normalized so as to produce the
smallest number of text runs to achieve the formatting result, but I
do have to correlate specific input element types to the appropriate
character and paragraph styles for each text run (not shown in the
example above, but each txsr points to the character and paragraph style
that determines its formatting).
The only obvious solution I can think of for this using XSLT 2 is to do
two passes:
1. Generate an intermediate data set where blocks are wrapped and each
font change is indicated by an empty marker element.
2. Use for-each-group to translate the text with markers into text runs.
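To make the two passes concrete, here is a rough sketch of what I have in mind. It is illustrative only: the "block" and "fmt" element names, the "style" attribute, and the "story" wrapper are placeholders I made up, not real INCX or DITA names, and only p, i, and b are handled.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Driver: run pass 1 into a temporary tree, then pass 2 over it. -->
  <xsl:template match="/">
    <story>
      <xsl:for-each select="//p">
        <xsl:variable name="marked">
          <xsl:apply-templates select="." mode="mark"/>
        </xsl:variable>
        <xsl:apply-templates select="$marked/block" mode="runs"/>
      </xsl:for-each>
    </story>
  </xsl:template>

  <!-- Pass 1: wrap each block; an empty fmt marker records every font change. -->
  <xsl:template match="p" mode="mark">
    <block>
      <fmt style=""/>
      <xsl:apply-templates mode="mark"/>
      <!-- The only newline emitted: the end of the visual block. -->
      <xsl:text>&#x0a;</xsl:text>
    </block>
  </xsl:template>

  <xsl:template match="i | b" mode="mark">
    <!-- Tunnel parameter carries the stack of enclosing inline styles. -->
    <xsl:param name="style" as="xs:string*" tunnel="yes" select="()"/>
    <xsl:variable name="new" as="xs:string*" select="($style, local-name())"/>
    <fmt style="{$new}"/>
    <xsl:apply-templates mode="mark">
      <xsl:with-param name="style" select="$new" tunnel="yes"/>
    </xsl:apply-templates>
    <!-- Leaving the element: revert to the enclosing style. -->
    <fmt style="{$style}"/>
  </xsl:template>

  <!-- Input newlines and whitespace runs are normalized to single spaces. -->
  <xsl:template match="text()" mode="mark">
    <xsl:value-of select="replace(., '\s+', ' ')"/>
  </xsl:template>

  <!-- Pass 2: each marker starts a new group; each non-empty group is one run. -->
  <xsl:template match="block" mode="runs">
    <xsl:for-each-group select="node()" group-starting-with="fmt">
      <xsl:variable name="text"
          select="string-join(current-group()[self::text()], '')"/>
      <xsl:if test="$text != ''">
        <txsr style="{current-group()[1]/@style}">
          <pcnt><xsl:value-of select="$text"/></pcnt>
        </txsr>
      </xsl:if>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

For the sample paragraph above this should produce the same five text runs, with the placeholder style values "", "i", "i b", "i", and "". In the real transform those values would have to be mapped to the actual character and paragraph style references.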
Is there a simpler or more elegant solution I'm missing?
Thanks,
Eliot
--
Eliot Kimber
Senior Solutions Architect
"Bringing Strategy, Content, and Technology Together"
Main: 610.631.6770
www.reallysi.com
www.rsuitecms.com