At 01:22 AM 5/25/01, Chris wrote:
> However, it is not at all immediately obvious to this
> newbie why a file that is already well-formed XML
> cannot undergo such a simple transformation using
> XSLT. This seems to be a limitation to XSLT, not an
> inherently nonsensical thing to do.
Indeed. On the other hand, given the data model that XSLT works on, it's
arguable whether the transformation of
    this is _underlined_ text
to
    this is <u>underlined</u> text
is really that simple. Note I said "given the data model". An XSLT
transformation "describes rules for transforming a source tree into a
result tree" (XSLT 1). Now, if you have a parser that picks up your input
string and makes a tree out of it, as in
    [text] this is
    [element] u
        [text] underlined
    [text] text
(this is of course only a representation of the node tree, not the node
tree itself), then the transformation is trivial. But an XML parser doesn't
do that. (There are members of this list that could rig up a little parser
to do it, but it wouldn't be an XML parser. Such a parser could be wired to
an XSLT transformation engine. But if you did that, the work of construing
your input according to the data model would be done, and that's the only
hard thing about your task -- you then wouldn't even need a transform
unless you wanted one for some other reason, such as extensibility.)
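
To make the "little parser" idea concrete, here is a minimal sketch in Python (not XSLT, and not anything from the original thread). It assumes that a pair of underscores always delimits one underlined run, that runs never nest, and that the result should hang under a wrapper element; the names underscores_to_tree, root_tag and mark_tag are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

def underscores_to_tree(line, root_tag="p", mark_tag="u"):
    """Build an element tree from text in which _word_ marks underlining.

    Assumption: underscores come in pairs and do not nest.
    """
    root = ET.Element(root_tag)
    # With one capture group, re.split returns plain text at even
    # indexes and the captured (underlined) text at odd indexes.
    pieces = re.split(r"_([^_]+)_", line)
    root.text = pieces[0]
    for i in range(1, len(pieces), 2):
        child = ET.SubElement(root, mark_tag)
        child.text = pieces[i]
        child.tail = pieces[i + 1]  # text that follows the element
    return root

tree = underscores_to_tree("this is _underlined_ text")
print(ET.tostring(tree, encoding="unicode"))
# <p>this is <u>underlined</u> text</p>
```

Once the tree exists, the interesting work is already done, which is the point made above: the hard part is construing the input, not transforming it.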
The fact that your input happens already to be XML is actually moot here.
It's not that it's well-formed: it's how and where the information you need
is expressed in it. I could wrap this email into <email>...</email> tags
and make it (allowing for escaping a few characters here and there) XML.
But that doesn't mean I could easily write a filter in XSLT that would pick
out, say, all the sentences from it, or all the adjectives, and put them in
alphabetical order. That's actually not much further from XSLT than what
you're trying to do. Bottom line is, if the information you're trying to
find isn't in the XML markup, it's hard for XSLT to see it.
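
A small illustration of that bottom line, using Python's standard-library XML parser as a stand-in for whatever parser feeds an XSLT engine. The wrapper element p and the helper describe are assumptions made for the sketch. It lists the nodes an XML parser hands over for each version of the sentence.

```python
import xml.etree.ElementTree as ET

def describe(elem):
    """List the nodes directly under elem as (kind, value) pairs."""
    nodes = []
    if elem.text:
        nodes.append(("text", elem.text))
    for child in elem:
        nodes.append(("element", child.tag))
        if child.tail:
            nodes.append(("text", child.tail))
    return nodes

# Underscores are just characters to an XML parser: one text node.
plain = ET.fromstring("<p>this is _underlined_ text</p>")
# Real markup produces a separate element node.
marked = ET.fromstring("<p>this is <u>underlined</u> text</p>")

print(describe(plain))
# [('text', 'this is _underlined_ text')]
print(describe(marked))
# [('text', 'this is '), ('element', 'u'), ('text', ' text')]
```

Both inputs are well-formed, but only the second carries the underlining in its markup, so only the second gives a tree-based transform anything to match on.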
> I know other tools exist to do this; my goal is to
> learn about XML and XSLT, and this task was simply
> chosen to focus my study. The goal is to learn
> something about XML and its uses/limitations, not to
> solve this particular text transformation problem.
And so you are learning! Lesson number one: pick the right tool for the
job. Understand the tools and their capabilities and strengths, so you can
pick the right one. Fall into a trap or two while coming to that
understanding: that's cool, no blame.
> It is perfectly ok for me to take away from this the
> conclusion "XSLT is not suited to this kind of
> transformation," but I don't see how one could be
> expected to know that in the beginning.
That's fair. No one warned you "XSLT excels at transforms out of
well-formed XML, but really bites at transforms of arbitrary data streams".
Probably too much hype. Nor did anyone explain the difference between
well-formed XML input and arbitrary text input. That's a difference that is
critical, but obscure to many (especially if they're used to handling
not-well-formed markup like HTML, which essentially has to be treated as an
arbitrary text stream, and which XSLT is also no good at). And then -- the
difficulty that you're having -- that well-formed in itself isn't enough.
The source *markup* has to identify the features you are leveraging for the
transform.
Prediction: many ambitious projects in the next few years will founder
because the input data does not prove to be high enough quality (meaning
both semantic completeness and correctness -- something a machine cannot
know!) to drive transformations to get high-quality output.
> And I don't
> think I would run into this problem if I were
> transforming to, say, latex.
Yes you would. It's the nature of your source data that's the problem, not
your output.
> It is only because HTML
> elements are interpreted as XML elements that I have
> trouble.
Not so. It's because an XML parser doesn't know that you want a "_" or a
"~" to start an element, and the next one to end it.
> Can you give me a general statement of the sorts of
> applications for which XSL *is* well-suited? It's not
> a database, but it does have several database-like
> capabilities. It's not for text markup, though it can
> do that, sort of, sometimes...
It is definitely for text markup -- well-formed XML text markup -- but not
other kinds (unless, as suggested above, you provide your own parser to
render your non-standard markup into the XSLT data model).
Off-the-shelf XML parsers can only be expected to parse XML input, however:
that's your problem. It only *seems* like a problem with XSLT.
If you want to try something easy, write a transform that turns
    this is <u>underlined</u> text
into
    this is _underlined_ text
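
For comparison, the easy direction can be sketched as a short tree walk. This is Python rather than XSLT, and the wrapper element p and the function name to_underscores are assumptions for the example. A stylesheet would express the same idea as a template matching u that writes an underscore before and after its content.

```python
import xml.etree.ElementTree as ET

def to_underscores(elem):
    """Flatten elem to text, wrapping the content of each <u> in underscores."""
    parts = [elem.text or ""]
    for child in elem:
        inner = to_underscores(child)
        # Only <u> gets delimiters; other elements pass their text through.
        parts.append(f"_{inner}_" if child.tag == "u" else inner)
        parts.append(child.tail or "")
    return "".join(parts)

src = ET.fromstring("<p>this is <u>underlined</u> text</p>")
print(to_underscores(src))
# this is _underlined_ text
```

This direction is easy because the element boundaries are already in the tree; nothing has to be guessed from the characters.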
And don't get discouraged by this immediate problem. XSLT is great -- but
it really shines when it is combined with other tools that make up for what
it doesn't do.
Regards,
Wendell
======================================================================
Wendell Piez mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9635
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
----------------------------------------------------------------------
Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list