[xsl] Imagine that the semantics of concatenating two regex patterns was this

Subject: [xsl] Imagine that the semantics of concatenating two regex patterns was this
From: "Roger L Costello costello@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 9 Mar 2025 10:21:17 -0000

Hi Folks,

Here is an XPath expression that determines whether the value of variable TEXT
matches the pattern 'A' (that is, does 'A' occur anywhere within the value of
TEXT):

matches($TEXT, 'A')

The result of evaluating the expression is true or false.

SNOBOL (StriNg Oriented and symBOlic Language) is best known for its
pattern-matching facilities, which are very elaborate and powerful. In fact,
most of the SNOBOL language is composed of pattern-matching operations. In
SNOBOL, here is how to examine the value of TEXT to see if it contains the
letter A:

TEXT 'A'

If the letter A occurs anywhere in the value of TEXT, the pattern match
succeeds. Otherwise it fails.

Here is an XPath pattern for vowels: 'A|E|I|O|U'

Suppose a pattern is to be used in a number of different places in a program;
we would like to define the pattern once. In XSLT we create a variable to hold
the pattern and then use the variable in matches():

<xsl:variable name="VOWELS" select="'A|E|I|O|U'"/>
<xsl:value-of select="matches($TEXT,$VOWELS)"/>

In SNOBOL you assign a name to a pattern:

VOWELS = 'A' | 'E' | 'I' | 'O' | 'U'

Subsequently this pattern may be referred to by the name VOWELS, as in this
statement:

TEXT VOWELS

In SNOBOL patterns may be concatenated in the same way that strings are
concatenated (juxtaposition). For example, the statement

TEXT VOWELS 'T'

succeeds if a vowel is immediately followed by a T in TEXT, i.e., if TEXT
contains one of AT, ET, IT, OT, or UT.

SNOBOL's semantics of pattern concatenation is fascinating. Clearly, more than
verbatim concatenation is happening because a verbatim concatenation of the
patterns would yield:

'A' | 'E' | 'I' | 'O' | 'U' 'T'

which is not correct. I am guessing that a SNOBOL compiler/interpreter
implicitly places parentheses around the first pattern:

('A' | 'E' | 'I' | 'O' | 'U') 'T'

The semantics of concatenating patterns in XPath is verbatim concatenation:

'A|E|I|O|U' || 'T' which yields the incorrect pattern 'A|E|I|O|UT'
(it matches A, E, I, O, or UT, so the T is required only after U)
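
To make the problem concrete, here is a minimal sketch of a stylesheet that
builds the pattern by verbatim concatenation and tests it. It assumes an XSLT
3.0 processor (the || operator and the xsl:initial-template entry point are
3.0 features). The variable name NAIVE and the test strings 'CAT' and 'CAB'
are illustrative assumptions, not taken from the book.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:variable name="VOWELS" select="'A|E|I|O|U'"/>

  <!-- Verbatim concatenation: the value is the string 'A|E|I|O|UT' -->
  <xsl:variable name="NAIVE" select="$VOWELS || 'T'"/>

  <xsl:template name="xsl:initial-template">
    <!-- 'CAT' contains AT, so a match is expected here -->
    <xsl:value-of select="matches('CAT', $NAIVE)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- 'CAB' contains no T at all, yet the branch 'A' matches on its own -->
    <xsl:value-of select="matches('CAB', $NAIVE)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Both lines of output are "true". The second result is the unwanted one: the
alternation operator binds more loosely than concatenation, so the T attaches
only to the last branch, U.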

If we want XPath to have the SNOBOL semantics, then we must explicitly place
parentheses around the first pattern:

'(' || 'A|E|I|O|U' || ')' || 'T' which yields the correct pattern
'(A|E|I|O|U)T'
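
If the wrapping is needed in more than one place, it could be packaged as a
stylesheet function. The following is only a sketch under stated assumptions:
the function name p:seq and the namespace urn:example:pattern are
hypothetical, and it assumes an XSLT 3.0 processor (for ||, the ! operator,
and non-capturing groups written as (?:...) in regular expressions). It wraps
every pattern, not just the first, so a later pattern that contains | is
also kept together.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:p="urn:example:pattern"
    exclude-result-prefixes="xs p">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: wrap each pattern in a non-capturing group,
       then join the groups in order. -->
  <xsl:function name="p:seq" as="xs:string">
    <xsl:param name="patterns" as="xs:string*"/>
    <xsl:sequence
        select="string-join($patterns ! ('(?:' || . || ')'), '')"/>
  </xsl:function>

  <xsl:variable name="VOWELS" select="'A|E|I|O|U'"/>
  <xsl:variable name="VOWEL-T" select="p:seq(($VOWELS, 'T'))"/>

  <xsl:template name="xsl:initial-template">
    <!-- The combined pattern: (?:A|E|I|O|U)(?:T) -->
    <xsl:value-of select="$VOWEL-T"/>
    <xsl:text>&#10;</xsl:text>
    <!-- 'CAT' contains a vowel immediately followed by T -->
    <xsl:value-of select="matches('CAT', $VOWEL-T)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- 'CAB' contains a vowel, but it is not followed by T -->
    <xsl:value-of select="matches('CAB', $VOWEL-T)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The three lines of output are the pattern (?:A|E|I|O|U)(?:T), then "true",
then "false". Non-capturing groups are used so that the wrapping does not
change the numbering of any capture groups the caller's patterns contain.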

Lessons Learned: how a programming language defines the semantics of pattern
concatenation can have a profound influence on programming style. Thus, if you
are creating a new programming language, be aware that there is more than one
way to define the semantics of pattern concatenation. You might define it as
XSLT/XPath does (verbatim concatenation), or you might define it as SNOBOL
does (parentheses implicitly wrap the first pattern).

/Roger

P.S. I got the information about SNOBOL from the wonderful book, "A SNOBOL4
Primer" by Ralph E. Griswold and Madge T. Griswold, pages 20-21. Some of the
above sentences are excerpts from the book.
