Re: [xsl] mystery #3: rendering embedded HTML

Subject: Re: [xsl] mystery #3: rendering embedded HTML
From: Gary Lawrence Murphy <garym@xxxxxxxxxx>
Date: 13 Apr 2002 10:50:29 -0400
>>>>> "O" == Oleg Tkachenko <olegt@xxxxxxxxxxxxx> writes:

    O> I think it's bad idea to accept any data from users without any
    O> kind of input validation, that's poor design and may be
    O> dangerous too.  

Alas, the real world is often less than perfect ;)

    O> And I don't understand why markup data (user html) have to be
    O> represented as character data (CDATA). The better idea is to
    O> validate user input and fix up any errors at input stage and to
    O> well form that html by some html validator, take a look at HTML
    O> Tidy (tidy.sf.net) for example.

Valid HTML is not necessarily well-formed XML: for example, <p> and
<br> may legally appear without closing tags.  And as regular use of
Tidy shows, the fixups are not always automatic; valid and displayable
HTML can contain severe ambiguities.
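
To make the <p> and <br> point concrete, here is a minimal sketch in
Python (my choice for illustration, not something from the original
setup).  The helper name is_well_formed and the two sample strings are
hypothetical; the only assumption is that the fragment gets wrapped in
a single root element before parsing.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML once wrapped in a root element."""
    try:
        ET.fromstring(f"<root>{fragment}</root>")
        return True
    except ET.ParseError:
        return False


# Acceptable to an HTML browser, but <br> and <p> are never closed.
html_fragment = "<p>first line<br>second line"

# The same content written the XHTML way.
xhtml_fragment = "<p>first line<br/>second line</p>"

if __name__ == "__main__":
    print(is_well_formed(html_fragment))
    print(is_well_formed(xhtml_fragment))
```

An XML parser rejects the first string outright, which is why the
user-supplied HTML cannot simply be dropped into the envelope as markup.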

What's more, if DTD validation of the envelope were required (and it
is, for data integrity across the transport and because xsl document()
validation in Xalan cannot be turned off), would I not need all
possible XHTML DTDs in the XML envelope DTD to accommodate all
possible variations of that one text block?

I know of no other blog/IM software that does rigid re-interpretation
of HTML, so the real lesson would be "use Perl instead" ;) If our
software must include every browser's pet DTD and an advanced AI text
parser that cleans up every possible variation in end-user supplied
HTML, the development cost would be astronomical.  Long before I did
that, I would use a Java namespace extension to URLEncode/Decode the
HTML string (which is what I did last time, but this time I wanted to
be portable and friendly to non-Java clients accessing the data).
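
For readers who have not seen the URL-encoding trick, a rough sketch of
the idea follows.  It is a Python analogue, not the Java extension
itself, and the element names (entry, body) and the helpers wrap and
unwrap are made up for the example.  Note also that Python's quote()
writes a space as %20 where Java's URLEncoder writes +, so the two are
not byte-for-byte interchangeable.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, unquote


def wrap(html: str) -> str:
    """Percent-encode the HTML and store it as plain text in an XML envelope."""
    entry = ET.Element("entry")      # hypothetical envelope element
    body = ET.SubElement(entry, "body")
    # safe="" encodes everything except letters, digits and "_.-~",
    # so no "<" or "&" survives into the XML text node.
    body.text = quote(html, safe="")
    return ET.tostring(entry, encoding="unicode")


def unwrap(xml_text: str) -> str:
    """Reverse of wrap(): parse the envelope and decode the HTML string."""
    encoded = ET.fromstring(xml_text).findtext("body")
    return unquote(encoded)


if __name__ == "__main__":
    sloppy_html = "<p>Tom & Jerry<br>unclosed <b>bold"
    envelope = wrap(sloppy_html)
    print(envelope)
    print(unwrap(envelope))
```

The envelope stays well-formed whatever the user typed, at the price
that every client has to know to decode the string before displaying it.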

Isn't it odd that while there is a solution for embedding verbatim
markup inside XML, there is apparently no solution for extracting it?
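
For comparison, this is roughly what a non-XSLT client sees when it
reads a CDATA-wrapped block.  It is only a sketch: the envelope layout
and the name extract_body are assumptions for the example, and it says
nothing about what a stylesheet can do with the same text.

```python
import xml.etree.ElementTree as ET

# Hypothetical envelope: the user's HTML travels verbatim inside CDATA.
ENVELOPE = """<entry>
  <title>hello</title>
  <body><![CDATA[<p>first line<br>second <b>line</p>]]></body>
</entry>"""


def extract_body(xml_text: str) -> str:
    """Return the character data of <body>; CDATA content arrives as plain text."""
    return ET.fromstring(xml_text).findtext("body")


if __name__ == "__main__":
    print(extract_body(ENVELOPE))
```

The parser hands back the original characters, tags and all, as a
string.  The difficulty described above starts at the next step, when
that string has to be emitted as markup rather than as escaped text.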

-- 
Gary Lawrence Murphy <garym@xxxxxxxxxxx> TeleDynamics Communications Inc
Business Innovations Through Open Source Systems: http://www.teledyn.com
"Computers are useless.  They can only give you answers."(Pablo Picasso)


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

