RE: [xsl] plea for help...

Subject: RE: [xsl] plea for help...
From: Mike Ferrando <mikeferrando@xxxxxxxxx>
Date: Thu, 9 Mar 2006 14:31:08 -0800 (PST)

Wendell,
I attended.

It was very well done. A great help for beginners as well as good
insights for those with lots of battle scars.

Thanks,
Mike Ferrando
Library Technician
Library of Congress
Washington, DC
202-707-4454

--- Wendell Piez <wapiez@xxxxxxxxxxxxxxxx> wrote:

> Walter,
> 
> At Mulberry we recently gave a seminar on the topic of converting 
> HTML to XML, so the issues are fresh in my mind.
> 
> You're facing a fairly complex set of problems, but they can be 
> simplified (as you are discovering) by distinguishing between
> 
> A. The syntactic conversion of HTML to XML
> B. The "semantic" conversion from HTML display-oriented tagging to a
>    stronger form of tagging in XML.
> 
> Other contributors have posted links to tools that help you with job
> A -- Tidy and its ilk -- and it appears you've got a handle on that.
> This work can be largely or entirely automated. Of course, what you
> get out the other end is still HTML tagging, albeit in XML syntax
> (it'll be either valid XHTML or a similar XML-compliant HTML), so as
> you're finding, it's not good to go for everything you might do with
> well-designed XML markup. But to have it XML syntactically is already
> a big step, because you can then use more and better tools on it to
> take it the rest of the way -- including (which is why the question
> isn't totally off topic here) XSLT.
> 
> To do conversion B, however, is an entirely different kettle of fish
> -- and it is beyond the scope of this list, I'm afraid.
> 
> As long as I'm already on it, however, I am willing to comment that
> the scope and difficulty of conversion B is directly related both to
> the quality of tagging in your source (HTML can be "clean" or
> "dirty", consistent or messy, even after it's made XML-conformant in
> its syntax) and, most dramatically, to the nature of your target tag
> set and to the feasibility of mapping from the HTML you have to this
> target.
> 
> Sometimes this conversion can be automated; sometimes it can be
> mostly automated; often it requires a good measure of attention from
> human beings to determine how things should be converted in any
> given case.
> 
> The design of that target markup, however, is critical; this factor
> alone can make or break your project. There is an infinity of things
> potentially expressible in XML, which a machine, even one programmed
> with very sophisticated heuristics, will not know how to tag
> correctly, even when it's starting with some kind of HTML tagging.
> 
> Accordingly, generally successful efforts at this kind of conversion
> include both designing that format up front, and controlling its
> design carefully. Design it to concrete requirements, not just to
> what you think might be useful or fun to have some day, and don't be
> over-ambitious. You can't convert to a target you can't see. But if
> you have a design, the places where conversion is easy or difficult
> will fairly quickly come to light and you can figure out how to deal
> with them.
> 
> I think earlier someone suggested you prototype this first before 
> attempting it. That's very good advice.
> 
> There are also professionals who will gladly share their experience
> in this area, if you are in a position to save money over the long
> term by investing it intelligently in the near term.
> 
> Good luck,
> Wendell
> 
> At 11:52 AM 3/9/2006, you wrote:
> 
> >On Wed, March 8, 2006 5:28 pm, Florent Georges wrote:
> > > Walter Torres wrote:
> > >
> > >
> > >> 1) convert HTML into well formed HTML (many are not)
> > >> 2) convert well formed HTML into xHTML
> > >>
> > >
> > > Tidy HTML will give you XHTML from HTML.
> >
> >Yes, just found it late last night. Been playing with it all morning.
> >
> >Getting it to work in PHP5 is what I'm focusing on now.
> >
> >
> > >> 3) convert xHTML into XML
> > >>
> > >
> > > An XHTML instance is already an XML instance.
> >
> >Yes, I understand that.
> >
> >But I'm trying to get this to a "pure" XML, with no display
> >characteristics markup whatsoever!
> >
> >The idea here is to have as "raw/naked" a file as possible, so that
> >any system can display it as it sees fit.
> >
> >
> > > If you want to translate the instance from XHTML to another XML
> > > document type, XSLT may be of great help.
> >
> >Sure, that way I can create a look for website A which is different
> >from website B, then create a text or RTF only version, or even
> >email text or HTML, or even via web-phone.
> >
> >This is why I was asking about how different folks handle this kind
> >of content. What kind of markup it contains, etc.
> >
> >
> > >> 4) create XSLTs to transpose XML back to HTML for page display
> > >
> > > Here again, XSLT may be of great help.
> >
> >Right.
> >
> >Thanks
> >
> >Walter
> 
> 
> ======================================================================
> Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
> Mulberry Technologies, Inc.                http://www.mulberrytech.com
> 17 West Jefferson Street                    Direct Phone: 301/315-9635
> Suite 207                                          Phone: 301/315-9631
> Rockville, MD  20850                                 Fax: 301/315-8285
> ----------------------------------------------------------------------
>    Mulberry Technologies: A Consultancy Specializing in SGML and XML
> ======================================================================
> 
> 
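
To make the quoted distinction between conversion A and conversion B a
little more concrete, here is a rough sketch of what a very small
"conversion B" step could look like once the input is already
well-formed XHTML (that is, after job A is done). It is only an
illustration and comes from neither the seminar nor the messages above:
the target element names (article, title, para, emphasis) and the
TAG_MAP table are invented for the example, and it uses Python's
standard xml.etree.ElementTree rather than the XSLT the thread
recommends. A real project would design the target vocabulary first, as
the quoted message advises.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical mapping from display-oriented XHTML elements to an
# invented target vocabulary. A real mapping depends on the target design.
TAG_MAP = {
    "h1": "title",
    "p": "para",
    "b": "emphasis",
    "i": "emphasis",
}


def local_name(tag):
    """Strip a leading '{namespace}' from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def convert(elem):
    """Copy elem recursively, renaming mapped tags and dropping all
    attributes (class, style, ...), which carry display information."""
    name = local_name(elem.tag)
    out = ET.Element(TAG_MAP.get(name, name))
    out.text = elem.text
    out.tail = elem.tail
    for child in elem:
        out.append(convert(child))
    return out


def xhtml_to_target(xhtml_text):
    """Convert the <body> of a well-formed XHTML string into the
    hypothetical target markup, wrapped in an <article> element."""
    root = ET.fromstring(xhtml_text)
    body = root.find("{%s}body" % XHTML_NS)
    article = convert(body)
    article.tag = "article"
    article.tail = None
    return ET.tostring(article, encoding="unicode")


SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<h1 class="big">News</h1>'
    '<p style="color:red">Some <b>bold</b> text.</p>'
    '</body></html>'
)

if __name__ == "__main__":
    print(xhtml_to_target(SAMPLE))
```

A one-to-one renaming like this only covers the easy cases. The hard
part described in the quoted message, deciding what a given <b> or
<table> actually means in the source, is exactly what a lookup table
cannot decide on its own.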

