Re: [xsl] how to translate XML with XHTML-formatted element to FO

From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Tue, 12 Apr 2005 17:19:11 -0400

Maik,

Your inline "HTML" has been escaped, presumably to prevent it from being parsed (and that is its effect in any case).

If it's not parsed, it's not available to your XSLT processor to do anything with. All the processor sees are strings, and you can't match substrings with templates (nor is it clear what the point of that would be, as strings are simply sequences of characters, and have no node structure -- they're strings).

In order to "un-escape" it (that is, parse it as if it were truly the markup it purports to represent), you have a couple of options:

1. Preprocess your input so that this text is not escaped, making it available as markup to parse. Where the input of such a process is

<TITLE>&lt;b&gt;Homer Simpson For President&lt;/b&gt;</TITLE>

the output would be

<TITLE><b>Homer Simpson For President</b></TITLE>

This can then be parsed and processed normally.

An XSLT engine connected to a serializer that supports the disable-output-escaping feature could be used to do this. Note, however, that it can't fix broken pseudo-markup (escaped text that is not well-formed once unescaped), so YMMV.
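
For option 1, a minimal sketch of such a preprocessing pass might look like the following. It assumes an XSLT 1.0 processor that serializes its own output and honors disable-output-escaping (the feature is optional, so not every setup does), and that the escaped text in TITLE is well-formed once unescaped. The element name TITLE comes from your sample; everything else is illustrative.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy every node and attribute unchanged by default. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Write the text of TITLE without re-escaping it, so that the
       serializer emits <b> where the input had &lt;b&gt;.
       This only has an effect if the processor serializes the result
       itself and supports disable-output-escaping. -->
  <xsl:template match="TITLE/text()">
    <xsl:value-of select="." disable-output-escaping="yes"/>
  </xsl:template>

</xsl:stylesheet>
```

The result has to be written out and parsed again by a second transformation (the one that produces the FO). Within this first pass the title is still just a string.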

2. Use an extension function to parse the text node and return the results as if it had been markup all along. For this, you need a processor that provides such a function, such as Saxon. It can't be done in standard XSLT.
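
For option 2, here is a rough sketch assuming Saxon 8.x and its saxon:parse() extension function in the http://saxon.sf.net/ namespace. Please check the documentation for your Saxon version, since both the namespace and the function's availability have varied between releases. The wrapper element name title-wrapper is made up for this example. It is also an assumption that the templates in your xhtml-to-fo.xsl match unprefixed elements such as b and i; if they match elements in the XHTML namespace, the wrapper would need a matching xmlns declaration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon">

  <!-- Assumption: the existing stylesheet from the question, whose
       templates turn b, i and so on into FO inline formatting. -->
  <xsl:import href="xhtml-to-fo.xsl"/>

  <xsl:template match="TITLE">
    <fo:block>
      <!-- Wrap the escaped text in a single root element (hypothetical
           name) so the string is a well-formed document, parse it with
           the Saxon extension, and apply the imported templates to the
           nodes inside the wrapper. -->
      <xsl:apply-templates
          select="saxon:parse(concat('&lt;title-wrapper&gt;', .,
                                     '&lt;/title-wrapper&gt;'))/*/node()"/>
    </fo:block>
  </xsl:template>

</xsl:stylesheet>
```

As with option 1, this fails if the escaped text is not well-formed once parsed. Later versions of the standard (XPath 3.0 and up) define parse-xml() and parse-xml-fragment(), which cover the same ground without an extension on processors that implement them.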

Good luck,
Wendell

At 02:12 PM 4/12/2005, you wrote:
How can I translate XML files (like the following), which contain
XHTML-formatted text in one element, to PDF using Apache FOP?

 --- books.xml --->
 <BOOK>
   <AUTHOR>Walt Disney</AUTHOR>
   <PRICE>19.90</PRICE>
   <TITLE>&lt;b&gt;Donald Duck - &lt;i&gt;The True Story&lt;/i&gt;&lt;/b&gt;</TITLE>
 </BOOK>

 <BOOK>
   <AUTHOR>Matt Groening</AUTHOR>
   <PRICE>25.00</PRICE>
   <TITLE>&lt;b&gt;Homer Simpson For President&lt;/b&gt;</TITLE>
 </BOOK>
   ...
 <--- books.xml ---

I mean, how should an XSL stylesheet best be designed to handle
straightforward FO formatting for <AUTHOR> and <PRICE> as well as the
additional transformation of the XHTML-formatted <TITLE>?

I just need some general ideas on how I can process XML files with
sub-structured (XHTML-formatted) elements by one (or more) XSL
stylesheet(s).

My problem is not translating XHTML-formatted text to FO or PDF (for
this purpose I already have a working "xhtml-to-fo.xsl" stylesheet).
Rather, I have no idea at the moment how to integrate the existing
xhtml-to-fo.xsl code and templates into the main XSL stylesheet for
generating the FO output.

Currently I'm only able to translate the content of the <TITLE> element as
plain text to FO output, i.e. the FO output for <TITLE> shows the tags
<b>...</b> and <i>...</i> as is, without bold and italic formatting applied
to the title text.

Any hints, tips or code fragments would be very much appreciated!


======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
