Subject: RE: [xsl] plea for help...
From: Mike Ferrando <mikeferrando@xxxxxxxxx>
Date: Thu, 9 Mar 2006 14:31:08 -0800 (PST)
Wendell,

I attended. It was very well done: a great help for beginners, as well as good insights for those with lots of battle scars.

Thanks,
Mike Ferrando
Library Technician
Library of Congress
Washington, DC
202-707-4454

--- Wendell Piez <wapiez@xxxxxxxxxxxxxxxx> wrote:

> Walter,
>
> At Mulberry we recently gave a seminar on the topic of converting
> HTML to XML, so the issues are fresh in my mind.
>
> You're facing a fairly complex set of problems, but they can be
> simplified (as you are discovering) by distinguishing between
>
> A. The syntactic conversion of HTML to XML
> B. The "semantic" conversion from HTML display-oriented tagging to a
>    stronger form of tagging in XML.
>
> Other contributors have posted links to tools that help you with job
> A -- Tidy and its ilk -- and it appears you've got a handle on that.
> This work can be largely or entirely automated. Of course, what you
> get out the other end is still HTML tagging, albeit in XML syntax
> (it'll be either valid XHTML or a similar XML-compliant HTML), so,
> as you're finding, it's not good to go for everything you might do
> with well-designed XML markup. But having it in XML syntax is already
> a big step, because you can then use more and better tools on it to
> take it the rest of the way -- including (which is why the question
> isn't totally off topic here) XSLT.
>
> To do conversion B, however, is an entirely different kettle of fish
> -- and it is beyond the scope of this list, I'm afraid.
>
> As long as I'm already on it, however, I am willing to comment that
> the scope and difficulty of conversion B is directly related both to
> the quality of tagging in your source (HTML can be "clean" or
> "dirty", consistent or messy, even after it's made XML-conformant in
> its syntax) and, most dramatically, to the nature of your target tag
> set and to the feasibility of mapping from the HTML you have to
> this target.
>
> Sometimes this conversion can be automated; sometimes it can be
> mostly automated; often it requires a good measure of attention from
> human beings to determine how things should be converted in any
> given case.
>
> The design of that target markup, however, is critical; by itself,
> this factor alone can make or break your project. There is an
> infinity of things potentially expressible in XML, which a machine,
> even one programmed with very sophisticated heuristics, will not know
> how to tag correctly, even when it's starting with some kind of
> HTML tagging.
>
> Accordingly, generally successful efforts at this kind of conversion
> include both designing that format up front and controlling its
> design carefully. Design it to concrete requirements, not just to
> what you think might be useful or fun to have some day, and don't be
> over-ambitious. You can't convert to a target you can't see. But if
> you have a design, the places where conversion is easy or difficult
> will fairly quickly come to light, and you can figure out how to
> deal with them.
>
> I think earlier someone suggested you prototype this first before
> attempting it. That's very good advice.
>
> There are also professionals who will gladly share their experience
> in this area, if you are in a position to save money over the long
> term by investing it intelligently in the near term.
>
> Good luck,
> Wendell
>
> At 11:52 AM 3/9/2006, you wrote:
>
> >On Wed, March 8, 2006 5:28 pm, Florent Georges wrote:
> > > Walter Torres wrote:
> > >
> > >> 1) convert HTML into well-formed HTML (many are not)
> > >> 2) convert well-formed HTML into XHTML
> > >
> > > Tidy HTML will give you XHTML from HTML.
> >
> >Yes, just found it late last night. Been playing with it all morning.
> >
> >Getting it to work in PHP5 is what I'm focusing on now.
> >
> > >> 3) convert XHTML into XML
> > >
> > > An XHTML instance is already an XML instance.
> >
> >Yes, I understand that.
> >
> >But I'm trying to get this to a "pure" XML, with no display
> >characteristics markup whatsoever!
> >
> >The idea here is to have as "raw/naked" a file as possible; that way
> >any system can display this as they see fit.
> >
> > > If you want to translate the instance from XHTML to another XML
> > > document type, XSLT may be of great help.
> >
> >Sure, that way I can create a look for website A which is different
> >than website B, then create a text or RTF only, or even email text or
> >HTML, or even via web-phone.
> >
> >This is why I was asking how different folks handle this kind of
> >content: what kind of markup it contains, etc.
> >
> > >> 4) create XSLTs to transpose XML back to HTML for page display
> > >
> > > Here again, XSLT may be of great help.
> >
> >Right.
> >
> >Thanks
> >
> >Walter
>
> ======================================================================
> Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
> Mulberry Technologies, Inc.                http://www.mulberrytech.com
> 17 West Jefferson Street                    Direct Phone: 301/315-9635
> Suite 207                                          Phone: 301/315-9631
> Rockville, MD 20850                                  Fax: 301/315-8285
> ----------------------------------------------------------------------
>   Mulberry Technologies: A Consultancy Specializing in SGML and XML
> ======================================================================