Subject: Re: [xsl] When to use text()
From: "Abel Braaksma (Exselt)" <abel@xxxxxxxxxx>
Date: Sat, 22 Mar 2014 15:48:27 +0100
Interesting thoughts. When designing a language, there will always be a lot of discussion about the choice of words for keywords, terminology and language constructs. Take C#: it uses the word "assembly" for physically separate packages and the word "namespace" for logical separation. To this day, many (beginning) programmers have a hard time understanding those concepts, not least because "assembly" reminds them of assembly language and "namespace" of XML namespaces. Similarly, why did they choose the keyword "fixed" when the meaning is to "pin" a variable? Those discussions will never end, and should never end: they remind language designers to think carefully about the words they choose.

In this particular case, the working group at the time faced a naming conflict. There was XML, which was already defined and had text nodes, and there was XPath (not XSLT), which required a way to select those text nodes. Since they were already called text nodes in the DOM, it made sense to follow that nomenclature. Note that text nodes did not exist in the XML Infoset, nor in the original XML specification. Instead, those have character information items, which refer to the individual characters, not to a whole node.

On the other hand, they had a requirement to be able to atomize nodes, in other words, to turn them into what is commonly known in computing as a "string". Some languages use the keyword TEXT when referring to strings, but many common languages use the keyword string. What were they to do? Are there other alternatives? Text nodes needed a name, and so did atomized text nodes. Both were important requirements, because if you always atomized, how could you query mixed content?
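To make the mixed-content point concrete, here is a minimal sketch that emulates, with Python's standard-library DOM (xml.dom.minidom) rather than a real XPath processor, what three XPath 1.0 expressions return for a small made-up document: para/text() (the individual text nodes), string(para) (the atomized, concatenated value) and para/em[text()] (the em elements that *have* text children). The helper names text_children, string_value and em_with_text are hypothetical, and CDATA sections are ignored for simplicity.

```python
from xml.dom import minidom

def text_children(node):
    """Hypothetical helper emulating child::text(): the text-node children of `node`."""
    return [c for c in node.childNodes if c.nodeType == c.TEXT_NODE]

def string_value(node):
    """Hypothetical helper emulating string(node): the data of all descendant
    text nodes, concatenated in document order (comments and PIs contribute nothing)."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(string_value(c) for c in node.childNodes)

def em_with_text(para):
    """Hypothetical helper emulating para/em[text()]: the <em> children that
    have at least one text child (a "has" test, not an "is" test)."""
    return [e for e in para.childNodes
            if e.nodeType == e.ELEMENT_NODE and e.tagName == "em" and text_children(e)]

# Made-up mixed-content sample: text and <em> elements interleaved.
para = minidom.parseString("<para>Hello <em>big</em> world<em/>!</para>").documentElement

print([t.data for t in text_children(para)])  # ['Hello ', ' world', '!']
print(string_value(para))                     # Hello big world!
print(len(em_with_text(para)))                # 1 (the empty <em/> has no text child)
```

If the only operation available were "atomize everything", the three separate text nodes around the em elements would be lost in the single string "Hello big world!", which is why both concepts need their own name.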
An important distinction is that text() is a KindTest (it tests whether a given node is a text node; as such, it in fact returns a boolean), whereas string() and string(x) are functions that take an implicit or explicit argument and turn it into a string. One might argue that you could use is-text() and is-comment(), and conversely convert-to-string() and the like. But that doesn't work well in an expression such as para/em/is-text() or even para/em[is-text()], because the semantics here are not "is" but "has" (select all the text nodes that have an "em" parent, or select all the em elements that have one or more text children). My argument against convert-to-string would be that it is annoyingly long, but that's just me. My argument against string() itself is that it looks too much like a constructor function, which it is not.

I'm not saying that the choice of words is perfect, but I wanted to point out that it is never an easy one. W3C standards are created by consensus of all the members, and it is an open process: non-members can submit bug reports against draft standards, and the working group is required to look into them. If you have a strong argument, they are likely to take it seriously.

That said, I invite you and everyone on this list or elsewhere to look at the current XSLT 3.0 Last Call Working Draft. Even now there are still some open bugs on the choice of terms and keywords. It is still open for bug reports from anyone, which you can file in W3C's Bugzilla (signing up is easy).

Small disclaimer: I was not a member of the WG at the time it had to choose the names for the string() function and the text() KindTest, so the road to consensus I laid out above may not be the actual road that led to consensus.

Cheers,
Abel Braaksma
Exselt XSLT 3.0 processor
http://exselt.net

PS: you don't need to look up the spec to be reminded of text() vs string(); in fact, just about any book on XSLT clearly explains their semantics and pitfalls.
And you are right, people starting out with a language will start with a tutorial book, and that is exactly where they learn this distinction.

http://www.w3.org/TR/REC-DOM-Level-1/level-one-core.html
http://www.w3.org/TR/xml-infoset/#infoitem.character
http://www.w3.org/TR/xslt-30/
https://www.w3.org/Bugs/Public/

On 21-3-2014 17:00, Ihe Onwuka wrote:
> On Fri, Mar 21, 2014 at 3:39 PM, Eliot Kimber <ekimber@xxxxxxxxxxxx> wrote:
>> What is the alternative? Invent new terms for all concepts for which a
>> common term would be appropriate?
>>
>> It is simply the case that in all technical standards there will be jargon
>> uses of common terms. It is not reasonable or realistic to expect
>> otherwise. It is not realistic or reasonable to expect to not have to look
>> things up to learn or be reminded of the specific meaning of something in
>> a standard.
>>
> As to what is reasonable, my starting benchmark would be how many
> programmers have ever read a language specification of any sort.
>
> I would factor into the equation the fact that there are often several
> layers of online tutorial, textbook, programmers refernce/nutshell
> books separating the programmer from the programming specification and
> wonder how many of those would include this sort of factlet.