Re: Fwd: Re: [xsl] element nodes in a string?

Subject: Re: Fwd: Re: [xsl] element nodes in a string?
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Fri, 25 May 2001 11:54:30 +0100
At 01:22 AM 5/25/01, Chris wrote:
> However, it is not at all immediately obvious to this
> newbie why a file that is already well-formed XML
> cannot undergo such a simple transformation using
> XSLT.  This seems to be a limitation to XSLT, not an
> inherently nonsensical thing to do.

Indeed. On the other hand, given the data model that XSLT works on, it's arguable whether the transformation of


this is _underlined_ text

to

this is <u>underlined</u> text

is really that simple. Note that I said "given the data model". An XSLT transformation "describes rules for transforming a source tree into a result tree" (XSLT 1.0). Now, if you have a parser that picks up your input string and makes a tree out of it, as in

[text] this is
  [element] u
    [text] underlined
[text] text

(this is of course only a representation of the node tree, not the node tree itself), then the transformation is trivial. But an XML parser doesn't do that: to it, the underscores are just characters in a text node. (There are members of this list who could rig up a little parser to do it, but it wouldn't be an XML parser. Such a parser could be wired to an XSLT transformation engine. But if you did that, the work of construing your input according to the data model would already be done, and that's the only hard thing about your task -- you then wouldn't even need a transform unless you wanted one for some other reason, such as extensibility.)
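For illustration, the kind of "little parser" described above can be sketched in a few lines of Python. This is a minimal sketch under assumptions not taken from the thread: the function name `underscores_to_tree`, the `<p>` wrapper element, and the rule that a pair of underscores marks underlined text (no nesting, no escaping) are all hypothetical. It builds an `xml.etree.ElementTree` element rather than feeding an XSLT engine, but it shows the point: once the string has been construed into element and text nodes, what remains is easy.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical convention (an assumption, not a standard): a pair of
# underscores marks underlined text; underscores do not nest.
UNDERLINE = re.compile(r"_([^_]+)_")

def underscores_to_tree(text, wrapper="p"):
    """Construe plain text into an element tree, turning each _span_
    into a <u> child. ElementTree models mixed content with .text
    (before the first child) and .tail (after each child)."""
    root = ET.Element(wrapper)
    last = None  # most recently added <u> element, if any
    pos = 0
    for m in UNDERLINE.finditer(text):
        before = text[pos:m.start()]
        if last is None:
            root.text = before
        else:
            last.tail = before
        last = ET.SubElement(root, "u")
        last.text = m.group(1)
        pos = m.end()
    rest = text[pos:]
    if last is None:
        root.text = rest
    else:
        last.tail = rest
    return root

tree = underscores_to_tree("this is _underlined_ text")
print(ET.tostring(tree, encoding="unicode"))
# → <p>this is <u>underlined</u> text</p>
```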

The fact that your input happens already to be XML is actually moot here. What matters is not that it's well-formed, but how and where the information you need is expressed in it. I could wrap this email in <email>...</email> tags and make it (allowing for escaping a few characters here and there) XML. But that doesn't mean I could easily write a filter in XSLT that would pick out, say, all the sentences from it, or all the adjectives, and put them in alphabetical order. That task is actually not much further beyond XSLT's reach than what you're trying to do. The bottom line is that if the information you're trying to find isn't in the XML markup, it's hard for XSLT to see it.

> I know other tools exist to do this; my goal is to
> learn about XML and XSLT, and this task was simply
> chosen to focus my study.  The goal is to learn
> something about XML and its uses/limitations, not to
> solve this particular text transformation problem.

And so you are learning! Lesson number one: pick the right tool for the job. Understand the tools and their capabilities and strengths, so you can pick the right one. Fall into a trap or two while coming to that understanding: that's cool, no blame.


> It is perfectly ok for me to take away from this the
> conclusion "XSLT is not suited to this kind of
> transformation," but I don't see how one could be
> expected to know that in the beginning.

That's fair. No one warned you that "XSLT excels at transforms out of well-formed XML, but really bites at transforms of arbitrary data streams" (probably too much hype). Nor did anyone explain the difference between well-formed XML input and arbitrary text input. That difference is critical, but obscure to many, especially if they're used to handling not-well-formed markup like HTML, which essentially has to be treated as an arbitrary text stream (and which XSLT is also no good at). And then there is the difficulty you're having: being well-formed isn't enough in itself. The source *markup* has to identify the features you are leveraging for the transform.
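A small Python check illustrates the difference between well-formed input and HTML-style markup. It uses the standard `xml.etree.ElementTree` parser as a stand-in for the parser in front of an XSLT processor, and the helper `is_well_formed` is hypothetical. An unclosed `<br>` makes the whole document unparseable, while the XML form `<br/>` is accepted.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, else False."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Typical HTML: <br> is never closed, so this is not well-formed XML.
print(is_well_formed("<p>line one<br>line two</p>"))    # False
# The same content written as XML is accepted.
print(is_well_formed("<p>line one<br/>line two</p>"))   # True
```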


Prediction: many ambitious projects in the next few years will founder because the input data proves not to be of high enough quality (meaning both semantic completeness and correctness -- something a machine cannot know!) to drive transformations that produce high-quality output.

> And I don't
> think I would run into this problem if I were
> transforming to, say, latex.

Yes you would. It's the nature of your source data that's the problem, not your output.


> It is only because HTML
> elements are interpreted as XML elements that I have
> trouble.

Not so. It's because an XML parser doesn't know that you want a "_" or a "~" to start an element, and the next one to end it.
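A quick check with a standard XML parser makes this concrete. Parsing well-formed XML in which the underline is only a plain-text convention yields no child elements at all; the underscores arrive as ordinary characters inside one text node. This sketch assumes Python's standard `xml.etree.ElementTree`; any conforming XML parser, including the one in front of an XSLT processor, would likewise see only text there.

```python
import xml.etree.ElementTree as ET

# Well-formed XML whose "underline" is only a plain-text convention.
p = ET.fromstring("<p>this is _underlined_ text</p>")

# The parser produces no child elements: the underscores are just characters.
print(len(p))   # 0
print(p.text)   # this is _underlined_ text

# Compare markup in which the underline *is* an element.
q = ET.fromstring("<p>this is <u>underlined</u> text</p>")
print([child.tag for child in q])   # ['u']
```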


> Can you give me a general statement of the sorts of
> applications for which XSL *is* well-suited?  It's not
> a database, but it does have several database-like
> capabilities.  It's not for text markup, though it can
> do that, sort of, sometimes...

It is definitely for text markup -- well-formed XML text markup -- but not other kinds (unless, as suggested above, you provide your own parser to render your non-standard markup into the XSLT data model).


Off-the-shelf XML parsers, however, can only be expected to parse XML input, and that's your problem. It only *seems* like a problem with XSLT.

If you want to try something easy, write a transform that turns

this is <u>underlined</u> text

into

this is _underlined_ text
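This direction is easy because the information is already in the markup. In XSLT it amounts to a template matching `u` that writes an underscore, applies templates to the element's content, and writes another underscore. The same logic is sketched below in Python with `xml.etree.ElementTree`, as an illustration only; the function name `u_to_underscores` is hypothetical, and flattening any other elements to their text is an assumption.

```python
import xml.etree.ElementTree as ET

def u_to_underscores(elem):
    """Flatten an element to a string, writing each <u> child as _text_.
    Other child elements are flattened to their text (an assumption;
    a real stylesheet would decide what to do with them)."""
    parts = [elem.text or ""]
    for child in elem:
        inner = u_to_underscores(child)  # recurse into the child's content
        if child.tag == "u":
            parts.append("_" + inner + "_")
        else:
            parts.append(inner)
        parts.append(child.tail or "")   # text following the child element
    return "".join(parts)

doc = ET.fromstring("<p>this is <u>underlined</u> text</p>")
print(u_to_underscores(doc))   # this is _underlined_ text
```

ElementTree stores the text that follows an element in that element's `tail`, which is why the loop appends `child.tail` after each child.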

And don't get discouraged by this immediate problem. XSLT is great -- but it really shines when it is combined with other tools that make up for what it doesn't do.

Regards,
Wendell




======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD 20850                                  Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================


XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list


