AW: AW: [xsl] Detecting carriage return and newline feed in XML Data

Subject: AW: AW: [xsl] Detecting carriage return and newline feed in XML Data
From: <michella@xxxxxxx>
Date: Mon, 1 Nov 2004 11:49:03 +0100
> XML input is processed by the XML parser before it gets anywhere near the
> XSLT processor. The only way to prevent XML's normalization of whitespace
> characters (whether in element or attribute content) is to write the
> characters as character references, e.g. &#x0D; You can of course do that by
> preprocessing the file in some non-XML-aware tool before submitting it to
> the XML parser.
>
> Are you really sure you need to do this? Somehow, you're not using XML the
> way it was intended to be used and that's always bad news. I've forgotten
> what your original problem was, if you ever explained it.
>
> Michael Kay
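The normalization Michael describes happens inside the parser, so any conforming parser shows it. The sketch below uses Python's standard-library xml.etree.ElementTree (Expat-based) purely for illustration; the original pipeline is XSLT, not Python. Per XML 1.0 (section 2.11, end-of-line handling, and section 3.3.3, attribute-value normalization), a literal line break in an attribute value comes back as a single space. In element content the CR LF pair only collapses to LF. A character reference such as &#10; survives as a real line feed.

```python
import xml.etree.ElementTree as ET


def attr(xml_text: str) -> str:
    """Parse a one-element document and return its 'v' attribute."""
    return ET.fromstring(xml_text).get("v")


# Literal LF in an attribute value: normalized to a single space.
literal = attr('<p v="first paragraph\nsecond paragraph"/>')

# Literal CR LF: end-of-line handling turns it into one LF, and
# attribute-value normalization then turns that LF into a space.
crlf = attr('<p v="first paragraph\r\nsecond paragraph"/>')

# Character reference: not normalized, so a real LF reaches the application.
escaped = attr('<p v="first paragraph&#10;second paragraph"/>')

# Element content: only end-of-line handling applies, so CR LF becomes LF.
element_text = ET.fromstring("<p>first paragraph\r\nsecond paragraph</p>").text

if __name__ == "__main__":
    for name, value in [("literal", literal), ("crlf", crlf),
                        ("escaped", escaped), ("element_text", element_text)]:
        print(f"{name:13}{value!r}")
```

The paragraph break is only recoverable in the `escaped` case. This is why writing the line breaks as character references before parsing is the workaround.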


OK, let me explain the whole problem:

1. The XML document is generated by System Architect (Popkin Software). This
software is intended to help build EAI (Enterprise Application Integration)
solutions.
2. Each diagram, as well as each symbol it contains, has its own user-defined
properties. One of them is a free-text field (here SAProperty/@SAPrpValue),
which we use to describe the respective symbol in our own words.
3. The text inside is divided into a number of paragraphs, which are
separated in the usual way by a carriage return and line feed (CR LF).
4. System Architect cleverly exports all structured diagrams and their
properties into one single XML document. The text field described above is
also stored as an attribute of an XML element. Below is a (tiny) part of the
overall 60 MB XML document:

<?xml version="1.0" encoding="UTF-16" ?>
<Classes>
	<Class>
		<SADefinition SAObjId="_2753" SAObjName="app_HybridPost"
			SAObjMinorTypeName="Application" SAObjMinorTypeNum="309"
			SAObjMajorTypeNum="3" SAObjAuditId="MiL"
			SAObjUpdateDate="25.08.2004" SAObjUpdateTime="09:20:26">
		<SAProperty SAPrpName="Description"
			SAPrpValue="Mit der Anwendung HybridPost wird die bestehende Infrastruktur von Postfinance für den Druck und die Verpackung von Kundendokumenten von Drittkunden im Printcenter Zürich genutzt.
Das Projekt &quot;Strategie HybridPost&quot;, das sich zur Zeit in der Voranalyse-Phase befindet, hat zum Ziel, die HybridPost-Lösung weiterzuentwickeln und zusätzliche Komponenten wie Archivierung, Billing, Druck, Verpackung und Call-Center in die bestehende Lösung zu integrieren.
Die Plattform HypoShare wird als Teil des Anwendungssystems HybridPost modelliert."
			SAPrpEditType="1" SAPrpLength="4074"/>
		<SAProperty SAPrpName="GUID"
			SAPrpValue="b1318511-4b95-11d6-8062-00c09f0645a1"
			SAPrpEditType="1" SAPrpLength="64"/>
		...
		BLABLABLA....
		...
	</Class>
</Classes>

5. You'll see that after the words "genutzt." and "integrieren." there is a
carriage return (assuming your browser displays it).
6. I need the text printed in my FOP-generated PDF document without losing
the paragraphs.
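Michael's suggestion of preprocessing the export with a non-XML-aware tool could look roughly like the Python 3.9+ sketch below. The helper names `escape_attribute_newlines` and `description_paragraphs` and the sample data are hypothetical. The sketch assumes that quoted strings inside start tags are attribute values and that no comment or CDATA section contains tag-like text. It rewrites every raw CR LF, CR or LF inside an attribute value as &#10; before any parser sees the file, so the paragraph breaks in SAPrpValue survive and can be split out afterwards.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical preprocessing step, run on the raw text BEFORE any XML parser.
# A start tag or empty-element tag: '<' + name start character, then any mix
# of unquoted text and quoted attribute values, up to the closing '>'.
# End tags, comments, PIs and the XML declaration are not matched.
START_TAG = re.compile(r"""<[A-Za-z_:][^<>"']*(?:(?:"[^"]*"|'[^']*')[^<>"']*)*>""")
QUOTED = re.compile(r"""("[^"]*"|'[^']*')""")
LINE_BREAK = re.compile(r"\r\n|\r|\n")


def escape_attribute_newlines(xml_text: str) -> str:
    """Rewrite raw line breaks inside attribute values as &#10;."""
    def fix_value(m):
        # One &#10; per CR LF, CR or LF, so each paragraph break appears once.
        return LINE_BREAK.sub("&#10;", m.group(0))

    def fix_tag(m):
        # Only quoted strings are touched; line breaks BETWEEN attributes
        # are ordinary whitespace and are left alone.
        return QUOTED.sub(fix_value, m.group(0))

    return START_TAG.sub(fix_tag, xml_text)


def description_paragraphs(xml_text: str) -> list[list[str]]:
    """Return the paragraphs of each SAProperty whose SAPrpName is 'Description'."""
    root = ET.fromstring(escape_attribute_newlines(xml_text))
    result = []
    for prop in root.iter("SAProperty"):
        if prop.get("SAPrpName") == "Description":
            value = prop.get("SAPrpValue", "")
            # Literal tabs and spaces are still normalized by the parser;
            # trim paragraph edges and drop empty paragraphs.
            result.append([p.strip() for p in value.split("\n") if p.strip()])
    return result


# Made-up sample in the shape of the System Architect export (CR LF breaks).
SAMPLE = (
    '<Classes><Class><SADefinition SAObjId="_2753">\r\n'
    '<SAProperty SAPrpName="Description"\r\n'
    '\tSAPrpValue="First paragraph.\r\nSecond paragraph.\r\nThird paragraph."\r\n'
    '\tSAPrpEditType="1"/>\r\n'
    '<SAProperty SAPrpName="GUID" SAPrpValue="b1318511"/>\r\n'
    '</SADefinition></Class></Classes>'
)

if __name__ == "__main__":
    print(description_paragraphs(SAMPLE))
```

To keep the existing XSLT/FOP pipeline, the escaped text could instead be written back to disk. For example, read it with open(path, encoding="utf-16") and write it with the same encoding so that the declaration still matches. The stylesheet then sees real LF characters in @SAPrpValue. Each paragraph could become its own fo:block (in XSLT 1.0 typically a recursive named template built on substring-before() and substring-after()), or the line feeds could be kept with linefeed-treatment="preserve" on the fo:block. Whether a particular FOP version honours that property is worth checking.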

I hope this helps ;-)

Cheers

Lawrence
