Dear XSLT List,
I'm looking into developing an XSLT 2.0 stylesheet that will take a
linguistic stem of the form XYZ- (where X, Y, and Z are the letters in
the stem of a lexeme) and generate the full range of endings that occur
on that word in the relevant grammatical paradigm. Writing up a set of
<stem> elements and a set of <ending> elements and pasting together all
possible combinations is easy enough; the problem is sandhi rules, which
may cause both the stem-final consonant (Z in the preceding example) and
the grammatical ending to change shape in certain circumstances. As a
semi-hypothetical example:
1. Given stems "Zen-" and "duS-"
2. Given basic ending "y"
3. "Zen-" plus basic "y" yields "Zeny" (no changes).
4. "duS-" plus basic "y" yields "duSE" (basic "y" is replaced by "E")
because it's a property of stem-final "S-" that it causes following
grammatical endings that normally begin with "y" to change their first
letters to "E". Sequences of "Sy" are fine elsewhere in words; this rule
applies only at the juncture of stem and grammatical ending.
A brute-force solution is easy enough; just string together replace()
functions like:
<xsl:variable name="temp06" select="replace($temp05, 'S-y', 'SE')"/>
(where the first rule creates $temp01, feeds it to the rule that creates
$temp02, etc., and the function ultimately returns the output of the
final replace() operation).
This type of brute-force approach would string together dozens (possibly
hundreds) of these rules to account for all possible sandhi
modifications. That seems inappropriately crude because the rules
actually apply to *classes* of letters, so that, for example, basic "y"
endings are replaced by "E" not just after "S", but after half a dozen
different consonants, as well as after one or two consonant clusters
(that is, the last stem consonant isn't the trigger for the change in
those cases; it's the combination of the last two).
What I'm groping for, then, is an elegant rule-based function that lets
me write a small number of rules by defining classes of letters to which
they apply, something like "after 'S', 'Z', 'C', 'St', and 'Zd', 'y' is
replaced by 'E'." As I mentioned above, these rules apply only at the
boundary of stem plus ending; "S" can be followed by "y" elsewhere in a
word. Since I've encoded my stems with trailing hyphens, I can easily
distinguish "Sy" (which should be left alone) from "S-y" (which should
be replaced by "SE").
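To make the class-based rule concrete, here is a minimal sketch of the
sort of thing I have in mind, not a worked-out solution. The "my"
namespace, the function name my:join, the variable name, and the "main"
entry template are all placeholders I made up for illustration. The
trigger class is written as a regex alternation, and the hyphen keeps the
match anchored at the stem/ending juncture.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/sandhi">

  <xsl:output method="text"/>

  <!-- Hypothetical trigger class: single consonants and clusters
       after which a following ending-initial "y" becomes "E" -->
  <xsl:variable name="y-to-E-triggers" as="xs:string"
      select="'St|Zd|S|Z|C'"/>

  <!-- Join a hyphen-terminated stem and an ending, applying the one rule -->
  <xsl:function name="my:join" as="xs:string">
    <xsl:param name="stem" as="xs:string"/>
    <xsl:param name="ending" as="xs:string"/>
    <!-- Raw juncture form, e.g. "duS-y" -->
    <xsl:variable name="raw" select="concat($stem, $ending)"/>
    <!-- Class rule: trigger + "-y" becomes trigger + "E" -->
    <xsl:variable name="step1"
        select="replace($raw, concat('(', $y-to-E-triggers, ')-y'), '$1E')"/>
    <!-- Drop any juncture hyphen that no rule consumed
         (assumes stems contain no other hyphens) -->
    <xsl:sequence select="replace($step1, '-', '')"/>
  </xsl:function>

  <!-- Entry point for a processor that can start at a named template -->
  <xsl:template name="main">
    <xsl:value-of select="my:join('Zen-', 'y'), my:join('duS-', 'y')"
        separator="&#10;"/>
  </xsl:template>

</xsl:stylesheet>
```

With the two stems from my example this should give "Zeny" and "duSE".
The initial "Z" of "Zen-" is left alone because it does not stand
directly before the hyphen.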
There is also a type of rule where the stem-final consonant changes but
the grammatical ending doesn't, along the lines of "when 'E' follows a
stem that ends in 'k', 'g', or 'x', that stem-final consonant changes
into 'C', 'Z', and 'S', respectively, and the 'E' doesn't change."
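For this second type of rule, replace() alone doesn't seem to be enough,
since each consonant maps to a different result. A sketch of one possible
approach, assuming xsl:analyze-string plus translate() is acceptable,
follows. The stems "ruk-", "nog-", and "dux-", the "my" namespace, the
function name my:soften, and the "main" template are invented for
illustration.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/sandhi">

  <xsl:output method="text"/>

  <!-- Stem-final k, g, x become C, Z, S before ending-initial "E" -->
  <xsl:function name="my:soften" as="xs:string">
    <xsl:param name="form" as="xs:string"/>
    <xsl:variable name="parts" as="xs:string*">
      <xsl:analyze-string select="$form" regex="([kgx])-E">
        <xsl:matching-substring>
          <!-- Map the captured consonant positionally: k→C, g→Z, x→S;
               the "E" is unchanged and the juncture hyphen is dropped -->
          <xsl:sequence
              select="concat(translate(regex-group(1), 'kgx', 'CZS'), 'E')"/>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <!-- Everything away from the juncture passes through untouched -->
          <xsl:sequence select="."/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </xsl:variable>
    <xsl:sequence select="string-join($parts, '')"/>
  </xsl:function>

  <!-- Entry point for a processor that can start at a named template -->
  <xsl:template name="main">
    <xsl:value-of
        select="for $f in ('ruk-E', 'nog-E', 'dux-E') return my:soften($f)"
        separator="&#10;"/>
  </xsl:template>

</xsl:stylesheet>
```

For the three made-up forms this should yield "ruCE", "noZE", and "duSE".
A form that the rule doesn't touch (say "ruk-a") keeps its hyphen, so a
later rule or a final cleanup step can still see the juncture.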
Finally, there is a slightly less brute-force approach where I would
create not just one paradigm of basic endings plus rules to change them
in certain circumstances, but several paradigms that already incorporate
the changes, and I would look at the last stem consonant or two and
select the appropriate paradigm. Is such a "selection" approach more
appropriate for this type of problem than the "modification" approach
I've been contemplating?
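To show what I mean by the "selection" approach, here is a rough sketch
under my own assumptions. Each <paradigm> carries a regex in a made-up
"trigger" attribute, the first paradigm whose trigger matches the stem
wins, and the last paradigm acts as a catch-all. The element and
attribute names, the second ending "a", the "my" namespace, and the
function name my:inflect are all hypothetical.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/sandhi">

  <xsl:output method="text"/>

  <!-- Hypothetical paradigms with the sandhi changes already built in -->
  <xsl:variable name="paradigms">
    <paradigm trigger="(St|Zd|S|Z|C)-$">
      <ending>E</ending>
      <ending>a</ending>
    </paradigm>
    <!-- Catch-all: "." matches any non-empty stem -->
    <paradigm trigger=".">
      <ending>y</ending>
      <ending>a</ending>
    </paradigm>
  </xsl:variable>

  <!-- Return all forms of a hyphen-terminated stem -->
  <xsl:function name="my:inflect" as="xs:string*">
    <xsl:param name="stem" as="xs:string"/>
    <!-- First paradigm, in document order, whose trigger matches the stem -->
    <xsl:variable name="chosen"
        select="($paradigms/paradigm[matches($stem, @trigger)])[1]"/>
    <!-- Stem without its trailing juncture hyphen -->
    <xsl:variable name="bare" select="substring-before($stem, '-')"/>
    <xsl:sequence
        select="for $e in $chosen/ending return concat($bare, $e)"/>
  </xsl:function>

  <!-- Entry point for a processor that can start at a named template -->
  <xsl:template name="main">
    <xsl:value-of select="my:inflect('Zen-'), my:inflect('duS-')"
        separator="&#10;"/>
  </xsl:template>

</xsl:stylesheet>
```

Here "Zen-" should fall through to the catch-all paradigm ("Zeny",
"Zena"), while "duS-" selects the first one ("duSE", "duSa"). The obvious
cost is that rules changing the stem itself, like the k/g/x rule, would
still need separate handling.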
In any case, I'd be grateful for any pointers to an elegant way of
expressing this type of rule in XSLT.
Sincerely,
David
djbpitt+xml@xxxxxxxx