[xsl] constraining values with a pattern facet in relax ng?

Subject: [xsl] constraining values with a pattern facet in relax ng?
From: "Birnbaum, David J" <djbpitt@xxxxxxxx>
Date: Fri, 8 Jul 2011 11:26:08 -0400

Dear xsl-list,

I'm writing a RELAX NG compact-syntax schema where users can enter a page range
as the value of an element, and I think (see below for reservations and an
alternative) that I'd like to constrain the allowed values (integer and
lexical) with a facet. I'm uncertain about how to proceed, or even whether I'm
conceptualizing the problem in a useful way, and I'd be grateful for advice.

A page range begins with a start page, which is a positive integer with a
lexical value that consists of 1-3 digits, where the leftmost cannot be 0. In
other words, it looks like the standard lexical representation of a positive
integer.
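
The start-page rule on its own is straightforward to write as a regex. Below is a minimal Python sketch of it, not a definitive implementation; the names START_PAGE and is_start_page are illustrative assumptions. The same expression should also be usable as a pattern facet in RELAX NG compact syntax, along the lines of xsd:positiveInteger { pattern = "[1-9][0-9]{0,2}" }.

```python
import re

# One to three digits, where the leftmost digit cannot be 0 (so 1 through 999).
START_PAGE = re.compile(r"[1-9][0-9]{0,2}")


def is_start_page(text: str) -> bool:
    """Return True if text is a valid lexical start page."""
    # fullmatch anchors the pattern at both ends, as a pattern facet does.
    return START_PAGE.fullmatch(text) is not None
```
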

The page range can end there (that is, if the text falls on a single page, the
end page is not specified), but if the text spans more than one page, the
start page is followed by an en dash (I'll write a hyphen in the examples
below for typographic convenience, but in production there would be an en dash
instead) and then a second part that indicates the end page. The constraints
on the value (integer and lexical) of the end page are easy to state but
awkward (impossible?) to express as a regex, which is what makes me wonder
whether I'm thinking about the problem in a useful way:

1. If the start page consists of 1-2 digits, the lexical representation of the
end page contains the full value, e.g., 5-6, 5-25, 5-123, 12-15, 12-34,
12-165. The lexical representation is the one we naturally expect for the
integer.

2. If the start page consists of 3 digits, the end page contains either 2 or 3
digits, with the hundreds digit appearing only if it differs from the hundreds
digit of the start page, e.g., 103-06 (which means 103-106; there must be at
least two digits, even though the tens digit is zero for both the start and
end pages), 123-26 (which means 123-126; the hundreds digit is omitted because
it's the same for both the start and end pages, but the tens digit is kept,
even though it's also the same, because there must be at least two digits),
and 123-265.

3. The integer value of the end page (including the hundreds digit, which may
or may not be present in the lexical representation) must be greater than the
value of the start page. In other words, in 106-09 the 09 must be read as 109,
which is greater than 106, etc.
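
For comparison, the three rules above can be sketched as a procedural check: a loose regex for the overall shape, followed by ordinary integer logic for the abbreviation and ordering rules. This is only a sketch in Python under stated assumptions. The function name parse_page_range is hypothetical, and accepting a plain hyphen alongside the en dash is an assumption made for convenience when testing.

```python
import re
from typing import Optional, Tuple

# Loose shape only: a start page, then optionally an en dash (or a hyphen,
# accepted here as an ASSUMPTION for testing) and 1-3 digits for the end page.
RANGE = re.compile(r"([1-9][0-9]{0,2})(?:[\u2013-]([0-9]{1,3}))?")


def parse_page_range(text: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) as full integers, or None if a rule is broken."""
    m = RANGE.fullmatch(text)
    if m is None:
        return None
    start_text, end_text = m.group(1), m.group(2)
    start = int(start_text)
    if end_text is None:
        return (start, start)  # single page: no end page given
    if len(start_text) < 3:
        # Rule 1: the end page is written in full, with no leading zero.
        if end_text[0] == "0":
            return None
        end = int(end_text)
    elif len(end_text) == 2:
        # Rule 2: hundreds digit omitted, so it is inherited from the start.
        end = (start // 100) * 100 + int(end_text)
    elif len(end_text) == 3 and end_text[0] not in ("0", start_text[0]):
        # Rule 2: hundreds digit written only because it differs.
        end = int(end_text)
    else:
        return None
    # Rule 3: the full end value must exceed the start value.
    return (start, end) if end > start else None
```

With this sketch, "106-09" is read as (106, 109), while "123-126" is rejected because the hundreds digit should have been omitted.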

Is it even possible to express these constraints with a regex?

If it is impossible, or possible but wrong-headed, here's an alternative:
Should I have the full start and end pages entered in different elements, with
the lexical space constrained as well as possible with a pattern facet (1-3
digits, no leading zero) and then 1) use Schematron to verify that the integer
value of the end page is greater than that of the start page and 2) use
XPath/XSLT to format the lexical representation of the end page? I don't
really do anything with these values except print them, so having the user
enter what I want to print seemed more direct, but once I began to think about
constraining the values, that approach began to look unappealingly (and
perhaps even impossibly) complicated.
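
To illustrate the alternative, here is a minimal sketch of the formatting step, assuming the full start and end pages are stored as integers. It is written in Python rather than XPath/XSLT purely for illustration, and the function name format_page_range is hypothetical. The ordering check stands in for what the Schematron rule would verify.

```python
from typing import Optional


def format_page_range(start: int, end: Optional[int] = None) -> str:
    """Format full page numbers as the abbreviated printed range."""
    if not 1 <= start <= 999:
        raise ValueError("start page must be between 1 and 999")
    if end is None:
        return str(start)  # the text falls on a single page
    if not start < end <= 999:
        raise ValueError("end page must be greater than start page")
    if start >= 100 and end // 100 == start // 100:
        # Same hundreds digit: print only the last two digits of the end
        # page, zero-padded so that 106 becomes "06".
        return f"{start}\u2013{end % 100:02d}"
    # Otherwise the end page is printed in full after an en dash.
    return f"{start}\u2013{end}"
```

Here format_page_range(103, 106) gives "103–06" and format_page_range(123, 265) gives "123–265", so the user never has to type the abbreviated form.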

In case that wasn't bad enough, here are three further complications:

1. I know the number of pages in the books in question and would like to
specify maximum values, so that users couldn't try to enter a range like
456-98 for a 300-page book.

2. One set of entries consists of a three-volume series, so the page ranges
are actually something like "II: 123-34", meaning "volume 2, pp 123-234". Each
volume begins numbering the pages at 1 and I know the last page number for
each volume. If I'm going to constrain the maximum page value, I'd like to do
that in a way that is sensitive to the different lengths of each volume.

3. The three-volume series is a numbered set of texts, where, say, volume 1
contains texts 1-84, volume 2 contains texts 85-147, and volume 3 contains
texts 148-210. The xml contains the text number as well as the page range, and
I'd like to constrain the page values to be credible. That is, I don't want a
user to be able to try to assign pages for text #99 to volume 3 because I know
that that text is in volume 2. (Or should I handle this by not having the user
enter a volume number, and just inferring that myself from the text number?)
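
The three complications above could be handled by a small lookup table, with the volume inferred from the text number. The sketch below is in Python and rests on assumptions: the text ranges follow the example figures in point 3, the last_page values are hypothetical placeholders (the real page counts are not given here), and the names VOLUMES, volume_for_text, and check_entry are illustrative.

```python
from typing import List, Optional

# Text ranges follow the example above; the last_page values are
# HYPOTHETICAL placeholders, since the real page counts are not stated.
VOLUMES = {
    1: {"texts": range(1, 85), "last_page": 310},
    2: {"texts": range(85, 148), "last_page": 298},
    3: {"texts": range(148, 211), "last_page": 275},
}


def volume_for_text(text_no: int) -> int:
    """Infer the volume number from the text number."""
    for volume, info in VOLUMES.items():
        if text_no in info["texts"]:
            return volume
    raise ValueError(f"unknown text number: {text_no}")


def check_entry(text_no: int, start: int, end: int,
                volume: Optional[int] = None) -> List[str]:
    """Return a list of problems found; an empty list means credible."""
    problems = []
    expected = volume_for_text(text_no)
    if volume is not None and volume != expected:
        problems.append(f"text {text_no} is in volume {expected}, "
                        f"not volume {volume}")
    last_page = VOLUMES[expected]["last_page"]
    if not 1 <= start <= end <= last_page:
        problems.append(f"pages {start}-{end} do not fit volume {expected} "
                        f"(last page {last_page})")
    return problems
```

If the volume is always inferred this way, the user-entered volume number becomes redundant and the volume argument can simply be left out.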

Thanks,

David
djbpitt@xxxxxxxxx
