Subject: Re: [xsl] mystery #3: rendering embedded HTML
From: Gary Lawrence Murphy <garym@xxxxxxxxxx>
Date: 13 Apr 2002 12:26:37 -0400
>>>>> "J" == Jeni Tennison <jeni@xxxxxxxxxxxxxxxx> writes:

    J> You can use disable-output-escaping in this situation.

Not quite. doe works for inline literal markup chars:

    J> <envelope>
    J> <![CDATA[ <p>My mal-formed HTML.<br> ]]>
    J> </envelope>

My situation is the inverse of doe. What I have is

    <envelope>&lt;p&gt;My mal-formed HTML escaped.&lt;br&gt;</envelope>

for which there is no way to extract and _evaluate_ this back into

    <p>My mal-formed HTML escaped<br>

The reason I have this the other way around is because, when you take

    <envelope>
    <![CDATA[ <p>My mal-formed HTML.<br> ]]>
    </envelope>

and pass it through a parser (in our case, into an XML transform from
one DTD to another via a different XSL process), CDATA is just a
pre-processor directive that tells the parser to escape any invalid
chars. Thus, once stored, your example is physically recorded as

    <envelope>&lt;p&gt;My mal-formed HTML escaped.&lt;br&gt;</envelope>

for which there is apparently no way to extract it again using XSL.

    J> If this HTML makes up the majority of your page, the other
    J> option is to use the text output method rather than the XML
    J> output method:

    J> <xsl:output method="text" />

Again, this is if the XML data contains invalid chars; it doesn't, it
contains _escaped_ chars which need to be resolved back into invalid
chars. It needs an entity resolver.

    J> But the best solution is nevertheless to tidy up the HTML so
    J> that it's well-formed.

In our specific case, we don't own the source of the HTML; it comes
from thousands of journalists working for countless independent news
agencies scattered around the world. Even in the general case, I still
don't think we should impose techno-formalities like strict
XHTML-compliance on non-professionals unless we want them to eschew
our application ;) Technology should serve the body, not enslave the
mind.
As a pure aside in usability constraints, you should have been there
when I first tried to get journalists using _basic_ markup tags like
<em> -- not everyone is super-keen to learn markup protocols -- if I
forced them into an app that would reject their input until all tags
within the <div> were legal to the DTD, I'd never see another news
item submitted, and as soon as their managers learned from some
part-time teen geek that the HTML code my program was dutifully
rejecting "works perfectly in MSIE", I'd likely never see another
contract in that industry :)

--
Gary Lawrence Murphy <garym@xxxxxxxxxxx> TeleDynamics Communications Inc
Business Innovations Through Open Source Systems: http://www.teledyn.com
"Computers are useless. They can only give you answers." (Pablo Picasso)

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
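
The CDATA round trip described in the message can be sketched outside XSL. The following is only an illustration in Python, assuming the standard-library `xml.etree.ElementTree` parser as a stand-in for whatever parser sits in the actual transform pipeline; the variable names are hypothetical and the sample markup is the one quoted in the thread. It shows that a CDATA section does not survive a parse-and-serialize pass and comes back out in the escaped form, and that a parser reports the same text value for both spellings.

```python
import xml.etree.ElementTree as ET

# The two spellings discussed in the thread (sample text from the post).
cdata_form = "<envelope><![CDATA[<p>My mal-formed HTML.<br>]]></envelope>"
escaped_form = "<envelope>&lt;p&gt;My mal-formed HTML.&lt;br&gt;</envelope>"

cdata_elem = ET.fromstring(cdata_form)
escaped_elem = ET.fromstring(escaped_form)

# A parser hands back the same character data for both spellings;
# the CDATA wrapper itself is not part of the parsed tree.
same_text = cdata_elem.text == escaped_elem.text

# Serializing the parsed CDATA document again: the markup characters
# are written out as entity references, not as a CDATA section.
reserialized = ET.tostring(cdata_elem, encoding="unicode")

print(same_text)
print(reserialized)
```

Whether a given XSLT processor can then emit that text value unescaped depends on the processor and its serializer, which this sketch does not attempt to model.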