Re: Was: [xsl] mode and moved to Namespaces

Subject: Re: Was: [xsl] mode and moved to Namespaces
From: Abel Braaksma <abel.online@xxxxxxxxx>
Date: Wed, 20 Apr 2011 09:35:13 +0200
Hi ac,

This thread is rather long, so please forgive me if I've misunderstood anything, but I'd like to add my thoughts to the discussion.

You seem to want to use namespaces as tags for names, which is not what they're intended for. Your reason for doing so is saving space, but if space is an issue, don't use XML. If it's about the memory footprint, it doesn't matter whether your nodes have namespaces, because practically every node takes approximately the same amount of memory, regardless of its name or kind (there's an older thread by Michael Kay where he explains how much memory each node takes). In other words, your size argument doesn't hold.

A namespace is prefix-agnostic. That means that if <en:word /> is bound to the namespace "http://example.com/french" and <fr:word /> is bound to it too, both qualified names are equal. Treating them differently is bad design.
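
A minimal sketch of that point, using Python's standard-library parser rather than an XSLT processor. The <dic> wrapper element is an assumption added only to make the fragment well-formed; the URI is the example one above.

```python
import xml.etree.ElementTree as ET

# Two different prefixes bound to the same namespace URI.
doc = ET.fromstring(
    '<dic xmlns:en="http://example.com/french" '
    'xmlns:fr="http://example.com/french">'
    '<en:word/><fr:word/>'
    '</dic>'
)

first, second = list(doc)

# ElementTree keeps expanded names of the form {uri}local, so the prefix
# plays no role once the document has been parsed.
print(first.tag)
print(second.tag)

same_name = first.tag == second.tag
print(same_name)
```

Any namespace-aware tool sees the two elements as having the same name, so an application that distinguishes them by prefix is relying on something the data model does not carry.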

The real problem, however, comes from portability and understandability. You redefine namespaces as something that's nothing more than a tag or prefix. That makes your solution unportable and no longer machine readable. For example, if a simple identity transform took all namespace prefixes and replaced them with ns1, ns2, etc. (but left the namespace itself, and hence the qualified names, intact), your application would fail. Yet such transformations are quite common in XML and entirely legal.
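
To illustrate how easily prefixes get rewritten, here is a small sketch in Python. Parsing and re-serializing with the standard library behaves like such an identity transform, because ElementTree does not remember the original prefixes. The two namespace URIs and the <dic> wrapper are hypothetical.

```python
import xml.etree.ElementTree as ET

EN = "http://example.com/dict/en"  # hypothetical namespace URIs
FR = "http://example.com/dict/fr"

source = (
    f'<dic xmlns:en="{EN}" xmlns:fr="{FR}">'
    '<word en:title="Mr" fr:title="M."/>'
    '</dic>'
)

# Round trip: the serializer generates its own prefixes (ns0, ns1, ...)
# for namespaces that have not been registered with it.
rewritten = ET.tostring(ET.fromstring(source), encoding="unicode")
print(rewritten)

# Matching on the prefix text no longer finds anything ...
prefix_match_survives = "en:title" in rewritten

# ... while matching on the namespace URI is unaffected.
word = ET.fromstring(rewritten).find("word")
english_title = word.get(f"{{{EN}}}title")
print(prefix_match_survives, english_title)
```

The rewritten document is equivalent to the original as far as namespaces are concerned, which is why code keyed on the prefix string is fragile.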

By redefining what a namespace means (or, more specifically, by ignoring its real meaning and making it part of the local-name, which is basically what you are doing), you stop using XML the way it was meant to be used. Your XML in and of itself is still compliant, but your applications and how they treat XML are not. That's a choice, but if you go down that path, you might as well choose your own format, which will give you far better results in performance, space and requirements.

----

Back to your real problem: suppose we accept that you need to use XML and that you do not want to abuse namespaces for something they're not. How could we tackle your issue? I'd go for a straight structure and use what's already there:

<word type="title" xml:lang="en-GB" gender="female">Mrs</word>
<word type="title" xml:lang="fr-FR" gender="female">Mme</word>

This is the approach Microsoft takes (or at least a similar one) in Word-ML, which looks big but is quite workable. Now, suppose you want to minimize the disk footprint (as already said, the memory footprint will be largely the same regardless); you could do something like this:

<word type="title" en-m="Mr" en-f="Mrs" fr-m="M." fr-f="Mme"/>

As it turns out, this is actually smaller than your namespace-oriented approach. If you really want the type of the word in each and every attribute name and split the attribute name later, you can do that, but code smell ahead! Something like:

<word en-title="Mr" en-f-title="Mrs" fr-title="M." fr-f-title="Mme" ... />

But really, you shouldn't go down that path: it has exactly the same drawbacks as your namespace approach (albeit being slightly more extensible). It will backfire once you start using it.
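
For comparison, a sketch of how the first, straightforward structure (one <word> per language, with xml:lang) can be queried. It is written in Python with the standard library; the <dic> wrapper, the third entry and the lookup() helper are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# xml:lang lives in the reserved XML namespace, which parsers bind implicitly.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

doc = ET.fromstring(
    '<dic>'
    '<word type="title" xml:lang="en-GB" gender="female">Mrs</word>'
    '<word type="title" xml:lang="fr-FR" gender="female">Mme</word>'
    '<word type="title" xml:lang="fr-FR" gender="male">M.</word>'
    '</dic>'
)

def lookup(root, lang, **attrs):
    """Return the text of every word in the given language that carries
    all of the requested plain attributes (for example type or gender)."""
    return [
        w.text
        for w in root.iter("word")
        if w.get(XML_LANG) == lang
        and all(w.get(k) == v for k, v in attrs.items())
    ]

french_feminine = lookup(doc, "fr-FR", gender="female")
french_titles = lookup(doc, "fr-FR", type="title")
print(french_feminine, french_titles)
```

Language, gender and word type are ordinary values here, so they can be chosen at run time and no attribute name ever has to be split apart.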

Moral of this story: use XML for what it is for: a verbose and descriptive method of describing data. If space is of the essence, don't use XML, as it will work against you. Use namespaces for what they're meant for: separating semantically different sets of names that are supposed to be treated differently (compare the XSLT namespace and the SVG namespace: they require different applications).

Kind regards,

Abel Braaksma



On 20-4-2011 3:01, ac wrote:
Hi Jirka,

I appreciate your time, consideration, suggestions, and arguments.

You are right, there is a lookup cost, and this is not the way I prefer to use namespaces. OTOH, the space saving and associated overhead saving can justify the lookup cost for something that can get large and needs to stay in busy memory, at least for a while.

It would be much nicer if namespaces could be further supported, including support for hierarchical namespaces, as well as namespace optimization. Namespaces are, apart from comments, one of the three basic XML constructs. Three isn't much, which is fine, but each should be maximized to help better satisfy application requirements.

I do not doubt that you are open-minded and I certainly appreciate your constructive comments. In fact I agree with them. I do admit that "simply wrong" did not allow me to understand and contribute technically. But it is looking much better now.

I also realize that matching on the names is risky and would be better addressed through the URIs. The added cost is not high enough to justify the risk, and the space saving is probably still worth the effort, depending on the number of languages that need to be supported, the size of the vocabulary, and the memory constraints.

Still, as everything is a trade off, I would still maintain, given all constraints, that this is another valid use case for namespaces, when it applies. I would also recommend that we consider how namespaces can better fulfill more useful roles in XML, including how they can be expanded, and more efficiently supported.

There is a real conceptual need for namespaces and it may be that we are just starting to better realize it.

Regards,
ac




ac wrote:

The current translation dictionary is set up somewhat like:
...
<word en:title="Mr" f-en:title="Mrs" fr:title="M." f-fr:title="Mme" ... />
<word en:noun="chair" fr:noun="chaise" ... />
...


all feminine variants can be returned with:
/dic/word/@*[starts-with(name(.), 'f-')]
Such lookups will tend to be quite slow, because matching on the name of
an element or attribute can't then be done through the name dictionary.
Highly efficient XSLT implementations don't store the element or
attribute name in each node; they store just a number pointing into a
dictionary that holds the real qualified name. This saves memory and
makes matching on a name very fast. But if the name is not directly
present in the XPath expression, such fast matching can't be done.
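
The idea can be sketched with a toy name dictionary in Python. The NamePool class and its methods are hypothetical and only illustrate the principle; real processors differ in detail (they typically key on namespace URI plus local name rather than on the prefixed string).

```python
class NamePool:
    """Maps each distinct qualified name to a small integer code."""

    def __init__(self):
        self._codes = {}
        self._names = []

    def code(self, name):
        # Allocate a new code the first time a name is seen.
        if name not in self._codes:
            self._codes[name] = len(self._names)
            self._names.append(name)
        return self._codes[name]

    def name(self, code):
        return self._names[code]


pool = NamePool()

# Attribute nodes stored as (name code, value) pairs instead of
# (name string, value) pairs.
attributes = [
    (pool.code("en:title"), "Mr"),
    (pool.code("f-en:title"), "Mrs"),
    (pool.code("fr:title"), "M."),
    (pool.code("f-fr:title"), "Mme"),
]

# Name known up front: resolve it once, then compare integers per node.
wanted = pool.code("f-fr:title")
exact = [value for code, value in attributes if code == wanted]

# starts-with(name(.), 'f-'): every node's name has to be turned back
# into a string and scanned, so the integer shortcut is lost.
feminine = [
    value for code, value in attributes
    if pool.name(code).startswith("f-")
]
print(exact, feminine)
```
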

all French feminine can be returned with
/dic/word/@f-fr:*
all French feminine adjectives can be returned with
/dic/word/@f-fr:adjective
all translated English words can be returned with
/dic/word/@en:*
The trouble with such an approach is that you can't change the language
at runtime. You have to pregenerate all queries before running the
transformation, or use dynamic XPath evaluation (which is not part of
the XSLT standard yet).

all English nouns, whatever gender, can be obtained with something like
/dic/word/@*:nouns[contains(name(), 'en:')]
If you are using namespaces, then this code is not correct. You should
match on the namespace name, not on the actual prefix used. So the query
should be more like:

/dic/word/@*:nouns[namespace-uri() = 'whatever URI was assigned to en']
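
The same URI-based selection can be sketched outside XSLT, here in Python with the standard library. The namespace URIs and the attributes_in() helper are hypothetical; note that the sample dictionary spells the attribute noun.

```python
import xml.etree.ElementTree as ET

EN = "http://example.com/dict/en"  # hypothetical namespace URI

doc = ET.fromstring(
    f'<dic xmlns:en="{EN}" xmlns:fr="http://example.com/dict/fr">'
    '<word en:title="Mr" fr:title="M."/>'
    '<word en:noun="chair" fr:noun="chaise"/>'
    '</dic>'
)

def attributes_in(root, uri, local=None):
    """Collect values of attributes whose namespace URI equals uri,
    optionally restricted to a single local name."""
    found = []
    for word in root.iter("word"):
        for key, value in word.attrib.items():
            # Namespaced keys look like "{uri}local"; plain ones have no braces.
            if not key.startswith("{"):
                continue
            key_uri, key_local = key[1:].split("}", 1)
            if key_uri == uri and local in (None, key_local):
                found.append(value)
    return found

english_nouns = attributes_in(doc, EN, "noun")
all_english = attributes_in(doc, EN)
print(english_nouns, all_english)
```

Because the comparison is on the URI, the result stays the same whatever prefix the document happens to use for it.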

It must be good to know what is right from what is wrong,
especially with an absolute perspective.
I have to admit that I have always had some disbelief about absolute
beliefs,
but I will keep an open mind, at least just in case.
I consider myself very open-minded. Your usage of namespaces in this
particular case surely works for you, but it's a misuse of namespaces.
They were not designed for this, and using them this way has several
engineering flaws.

--
------------------------------------------------------------------
Jirka Kosek e-mail: jirka@xxxxxxxx http://xmlguru.cz
------------------------------------------------------------------
Professional XML consulting and training services
DocBook customization, custom XSLT/XSL-FO document processing
------------------------------------------------------------------
OASIS DocBook TC member, W3C Invited Expert, ISO JTC1/SC34 member
------------------------------------------------------------------
