Re: [xsl] Significant whitespace in attribute values

Subject: Re: [xsl] Significant whitespace in attribute values
From: Lars Huttar <lars_huttar@xxxxxxx>
Date: Thu, 15 Jul 2010 09:41:17 -0500
On 7/15/2010 5:47 AM, Michael Müller-Hillebrand wrote:
> Hello friends,
>
> I may be tasked with XSL-transforming some XML into a publishable version, but I shuddered as soon as I saw the input (coming from a custom-built Web CMS). It is something like the following with significant line feeds in attribute values:
>
> <items>
>  <item name="address" data="
> Company Name
> Street Address
> ZIP City
> Country
> " />
> </items>
>
> I have seen and dealt with line feeds in element content, but this time an alarm clock rang in my head. Am I right in my interpretation of the XML standard that attribute content must be normalized by a conforming XML parser and therefore it would never be possible to write an XSL to locate line feeds in attribute values?
>
> http://www.w3.org/TR/REC-xml/#AVNormalize
>
> This would give me a strong and invincible reason to tell the CMS programmers to change their stuff.
>
> - Michael
>
> --
>   

As Martin said, the line feeds would have to be encoded as character
references, such as &#10;, to survive attribute-value normalization.
Normally, an XML serializer (such as the one used by the Web CMS)
does not give you control over how line feeds are serialized (AFAIK).
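
To make the difference concrete, here is a small Python sketch (my choice
of language, not something from the original thread) that parses the same
attribute twice with the standard library's expat-based parser: once with
literal line feeds and once with &#10; references. The sample strings are
made up for illustration.

```python
import xml.etree.ElementTree as ET

# Same attribute, two encodings: raw line feeds vs. character references.
literal = '<item data="\nCompany Name\nStreet Address\n"/>'
escaped = '<item data="&#10;Company Name&#10;Street Address&#10;"/>'

# The parser applies attribute-value normalization before the
# application (or an XSLT processor) ever sees the value.
literal_value = ET.fromstring(literal).get("data")
escaped_value = ET.fromstring(escaped).get("data")

print(repr(literal_value))  # raw line feeds have been replaced by spaces
print(repr(escaped_value))  # line feeds from &#10; are still line feeds
```

Only the second form leaves anything for a stylesheet to find.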

One solution would be to put a non-XML tool between the CMS and your
XSLT to convert the line feeds to &#xA;.
I don't think it would even need to parse the XML; you could just
replace every line feed character, for example with sed, provided no
line feed occurs inside a tag between attributes (a character
reference is not allowed there).
(Somebody correct me if I'm wrong...)
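
A slightly more careful variant of that idea, sketched in Python rather
than sed: it rewrites only the line feeds that sit inside quoted attribute
values and leaves all others alone. The function name escape_attr_newlines
is hypothetical, and the sketch assumes the CMS output contains no
comments, CDATA sections, or DOCTYPE, since it does not parse the XML.

```python
import xml.etree.ElementTree as ET


def escape_attr_newlines(xml_text: str) -> str:
    """Rewrite line feeds inside quoted attribute values as &#xA;.

    Assumption: the input has no comments, CDATA sections, or DOCTYPE,
    so every quote seen inside <...> delimits an attribute value.
    """
    # Mirror XML end-of-line handling so CRLF counts as one line feed.
    text = xml_text.replace("\r\n", "\n").replace("\r", "\n")
    out = []
    in_tag = False   # True between '<' and the matching '>'
    quote = None     # the opening quote character while inside a value
    for ch in text:
        if quote is not None:
            if ch == quote:
                quote = None
            elif ch == "\n":
                out.append("&#xA;")
                continue
        elif in_tag:
            if ch in "\"'":
                quote = ch
            elif ch == ">":
                in_tag = False
        elif ch == "<":
            in_tag = True
        out.append(ch)
    return "".join(out)


# Input shaped like the sample in the original question.
cms_output = (
    '<items>\n'
    ' <item name="address" data="\n'
    'Company Name\n'
    'Street Address\n'
    'ZIP City\n'
    'Country\n'
    '" />\n'
    '</items>'
)
fixed = escape_attr_newlines(cms_output)
data = ET.fromstring(fixed).find("item").get("data")
print(data.splitlines())
```

After this step the XSLT can split the attribute value on &#10; as usual.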

But the burden for doing that should be on the Web CMS programmers, if
they're supposed to be feeding you XML that contains the required
information.
Then they can choose whether they want to do that, or output proper XML
in the first place.

Lars
