Subject: RE: [xsl] Ingoring HTML
From: Jay Burgess <lists@xxxxxxxxxxx>
Date: Fri, 17 Jun 2005 13:21:56 -0700
Jon,

Thank you very much for all of the information, especially on a Friday afternoon. :) You've confirmed that it's not just a flag I set somewhere, so I'll dig into it and get it solved.

Thanks again.

Jay

-----Original Message-----
From: Jon Gorman [mailto:jonathan.gorman@xxxxxxxxx]
Sent: Friday, June 17, 2005 3:14 PM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [xsl] Ingoring HTML

On 6/17/05, Jay Burgess <lists@xxxxxxxxxxx> wrote:
> I apologize if this is in the FAQ, but I've searched and can't find it. (I'm
> kind of new to XSL, so I may just have not seen it.)

This is a FAQ of sorts, but I had a little bit of a difficult time finding an answer to it in Dave Pawson's FAQ as well. Of course, I just did a quick glance. I'd recommend skimming the CDATA section as well.

> I've got some XML that contains HTML-formatted text. For example:
>
> <title>&lt;SPAN style="font-size: 13pt; font-family: Verdana;"&gt;The
> &lt;b&gt;Text&lt;/b&gt; That I Want&lt;/SPAN&gt;</title>

"HTML-formatted text" is a little bit nonsensical. HTML itself says that &lt; is meant as a stand-in for <, so when you have it, it's not a tag. Since namespaces were rather slow to catch on, we ended up seeing people put so-called "HTML" in XML *cough* RSS *cough*. But to any XML application, this is one big chunk of text.

So, some possible advice:

1) If you can change the input format so that it uses namespaces and actually embeds real XHTML into the documents you're creating, do so. Or at least have it be an option.

2) If you can't do that, I'm sure you can find a more general solution if you hunt through the archives. The essential solution will probably be along the lines of looking for the < and > characters and throwing out any text between them via some of the XPath/XSLT string functions.
This might be much easier with XSLT 2.0.

3) It may be possible, with a combination of d-o-e (disable-output-escaping) and multiple transformations, regex scripting, or other techniques, to replace the various &lt; and &gt; in certain elements but not others, then reprocess that document through your final stylesheet. Of course, this makes it slightly dangerous.

Dig through the archives; there might be a more general solution already done, or someone else will be able to give you one instead of just giving you some ranting. (I blame Friday afternoon and a slow server for my current long-winded explanation of why this type of embedding is evil.)

Short answer: it's probably not difficult as long as the embedded markup is relatively straightforward. If the "HTML" inside the XML is complex at all, or you are using < in other places as literal text rather than as tags, you might have difficulty. It's extremely simple if you can just have the input source use namespaces and you're comfortable with how XSLT deals with namespaces.

Jon Gorman
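A minimal sketch of suggestions 2) and 3), written in Python with the standard-library `xml.etree.ElementTree` rather than in XSLT, only so it can be run on its own. The names `SAMPLE`, `strip_tags`, and `reparse_text` are hypothetical. Once the XML is parsed, the escaped markup inside `<title>` is just a string that happens to contain `<` and `>` characters. `strip_tags` drops everything from each `<` to the next `>`, the same idea as a recursive XSLT 1.0 template built on `substring-before()` and `substring-after()` (in XSLT 2.0, XPath's `replace()` with a pattern such as `<[^>]*>` does this in one call). `reparse_text` is the "reprocess it" approach: it parses the decoded string as markup a second time and keeps only the text nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring Jay's input: the SPAN/b markup is escaped,
# so to an XML parser it is ordinary character data, not elements.
SAMPLE = (
    '<title>&lt;SPAN style="font-size: 13pt; font-family: Verdana;"&gt;'
    'The &lt;b&gt;Text&lt;/b&gt; That I Want&lt;/SPAN&gt;</title>'
)


def strip_tags(text: str) -> str:
    """Hypothetical helper: drop everything from each '<' up to the next '>'.

    Same idea as a recursive XSLT 1.0 template using substring-before()
    and substring-after().
    """
    out = []
    while "<" in text:
        before, _, rest = text.partition("<")
        out.append(before)
        if ">" not in rest:
            # Unterminated '<': keep the remainder as literal text.
            out.append("<" + rest)
            return "".join(out)
        text = rest.partition(">")[2]  # skip past the end of the "tag"
    out.append(text)
    return "".join(out)


def reparse_text(text: str) -> str:
    """Hypothetical helper: second pass that parses the decoded string as
    markup and keeps only its text nodes. Requires well-formed XML."""
    fragment = ET.fromstring("<wrapper>" + text + "</wrapper>")
    return "".join(fragment.itertext())


title = ET.fromstring(SAMPLE)
print(title.text)                # the decoded string, angle brackets and all
print(strip_tags(title.text))    # The Text That I Want
print(reparse_text(title.text))  # The Text That I Want
```

Both carry the caveats mentioned above: `strip_tags` will also swallow a literal `<` that is later followed by a `>` (or a `>` inside an attribute value), and `reparse_text` raises `ET.ParseError` as soon as the embedded HTML isn't well-formed XML (an unclosed `<br>`, for instance).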