Re: [xsl] constraining values with a pattern facet in relax ng?

Subject: Re: [xsl] constraining values with a pattern facet in relax ng?
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Fri, 08 Jul 2011 11:52:16 -0400
Dear David,

Given that your rules are complex, I think you need to think not only about how to implement and enforce them, but about who will be doing so ... your code, sure, but also your users?

In other words, I think the best solution to the problem depends partly on who the users are and how they'll be entering the data.

Especially if the rules are complex, I think most users would be happier to enter, say, "101" and "105" and have the machine then figure out to display "101-5", than they would be to remember that this one should be "101-5" but another one should be "31-35".

If this is the case, this implies that you need your XSLT to know how to crunch "101" and "105" into "101-5" rather than to match and validate "101-5". (FWIW, I think your requirements for that would go beyond regex checking, since for example "121-15" would presumably be incorrect. You will need more power.)
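To make the "crunching" concrete, here is a minimal sketch of that logic in Python rather than XSLT, so it shows only the arithmetic and not a stylesheet. It follows the rules in the quoted message below, which keep at least two digits in the end part, so 101 and 105 come out as "101-05". The function name `format_page_range`, the `sep` parameter, and the 999-page ceiling are assumptions made for illustration.

```python
def format_page_range(start, end=None, sep="-"):
    """Render a page range in the abbreviated style of the quoted message.

    Assumption: page numbers run from 1 to 999 (at most three digits).
    In production, sep would be an en dash rather than a hyphen.
    """
    if not 1 <= start <= 999:
        raise ValueError(f"start page out of range: {start}")
    if end is None:
        return str(start)  # text falls on a single page
    if not start < end <= 999:
        raise ValueError(f"end page {end} must be greater than start page {start}")
    if start >= 100 and end // 100 == start // 100:
        # Same hundreds digit: drop it, but always keep two digits.
        return f"{start}{sep}{end % 100:02d}"
    # Short start page, or a different hundreds digit: print the end in full.
    return f"{start}{sep}{end}"


print(format_page_range(101, 105))
print(format_page_range(123, 265))
```

Because the end page arrives as a plain integer, the ordering rule is a single comparison, and a pair such as 121 and 115 is rejected before anything is printed.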

It would also make other aspects of your problem easier, such as being able to confirm that the last page has a higher number than the first page (if this is, in fact, a rule), or to see that text numbers, volume numbers and page numbers all align.

But your users may be different, or your use case.

Say they're transcribing entries that already have this info. What if the source data is actually incorrect? Should your scribes be correcting "101-105" to "101-5"?

The bottom line is that I think your life will be easier if you collect the data in the simplest, most granular form possible. The validation (including the interdependency checks) and XSLT will be easier. And unless there's a reason they resist this, your users will probably thank you too.

Cheers,
Wendell

On 7/8/2011 11:26 AM, Birnbaum, David J wrote:
Dear xsl-list,

I'm writing a RelaxNG compact-syntax schema where users can enter a page range as the value of an element, and I think (see below for reservations and an alternative) that I'd like to constrain the allowed values (integer and lexical) with a facet. I'm uncertain about how to proceed, or even whether I'm conceptualizing the problem in a useful way, and I'd be grateful for advice.

A page range begins with a start page, which is a positive integer with a lexical value that consists of 1-3 digits, where the leftmost cannot be 0. In other words, it looks like the standard lexical representation of a positive integer.

The page range can end there (that is, if the text falls on a single page, the end page is not specified), but if the text spans more than one page, the start page is followed by an en dash (I'll write a hyphen in the examples below for typographic convenience, but in production there would be an en dash instead) and then a second part that indicates the end page. The constraints on the value (integer and lexical) of the end page are easy to conceptualize but awkward (impossible?) to express as a regex, which is what makes me wonder whether I'm thinking about the problem in a useful way:

1. If the start page consists of 1-2 digits, the lexical representation of the end page contains the full value, e.g., 5-6, 5-25, 5-123, 12-15, 12-34, 12-165. The lexical representation is the one we naturally expect for the integer.

2. If the first part consists of 3 digits, the second part contains either 2 or 3 digits, with the hundreds digit appearing only if it is different from the hundreds digit of the start page, e.g., 103-06 (which means 103-106, but there must be at least two digits, even though the tens value is zero for both the start and end pages), 123-26 (which means 123-126; the initial digit is omitted because it's the same for both the start and end pages, but the middle digit isn't, even though it's the same, because there must be at least two), 123-265.

3. The integer value of the end page (including the first of three digits, which may or may not be present in the lexical representation) must be greater than the value of the start page. In other words, in 106-09 the 09 must be recognized as greater than 106, etc.

Is it even possible to express these constraints with a regex?
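One way to split the work, sketched here in Python under stated assumptions: a regex checks only the shape of the string, and ordinary arithmetic handles the expansion of the omitted digit and the comparison of values. The names `parse_page_range`, `_SHORT` and `_LONG` are made up for illustration, and the hyphen stands in for the en dash as in the examples above. This is a sketch of the three rules as I read them, not a schema facet.

```python
import re

# Shape only. Short start pages (1-2 digits) take a full end page;
# three-digit start pages take either two digits or a full three digits.
_SHORT = re.compile(r"([1-9][0-9]?)(?:-([1-9][0-9]{0,2}))?")
_LONG = re.compile(r"([1-9][0-9]{2})(?:-([0-9]{2}|[1-9][0-9]{2}))?")


def parse_page_range(text):
    """Return (start, end) as integers; end is None for a single page."""
    m = _SHORT.fullmatch(text) or _LONG.fullmatch(text)
    if m is None:
        raise ValueError(f"not a page range: {text!r}")
    start = int(m.group(1))
    end_text = m.group(2)
    if end_text is None:
        return start, None
    if start >= 100 and len(end_text) == 2:
        # Rule 2: restore the omitted hundreds digit from the start page.
        end = (start // 100) * 100 + int(end_text)
    else:
        end = int(end_text)
        if start >= 100 and end // 100 == start // 100:
            raise ValueError(f"hundreds digit should be omitted in {text!r}")
    if end <= start:
        # Rule 3: compare integer values, not strings.
        raise ValueError(f"end page {end} is not after start page {start}")
    return start, end


print(parse_page_range("106-09"))
print(parse_page_range("12-165"))
```

With this split, "121-15" passes the shape check but fails the value check, because it expands to 121-115.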

If it is impossible, or possible but wrong-headed, here's an alternative: Should I have the full start and end pages entered in different elements, with the lexical space constrained as well as possible with a pattern facet (1-3 digits, no leading zero) and then 1) use Schematron to verify that the integer value of the end page is greater than that of the start page and 2) use XPath/XSLT to format the lexical representation of the end page? I don't really do anything with these values except print them, so having the user enter what I want to print seemed more direct, but once I began to think about constraining the values, that approach began to look unappealingly (and perhaps even impossibly) complicated.

In case that wasn't bad enough, here are three further complications:

1. I know the number of pages in the books in question and would like to specify maximum values, so that users couldn't try to enter a range like 456-98 for a 300-page book.

2. One set of entries consists of a three-volume series, so the page ranges are actually something like "II: 123-34", meaning "volume 2, pp 123-234". Each volume begins numbering the pages at 1 and I know the last page number for each volume. If I'm going to constrain the maximum page value, I'd like to do that in a way that is sensitive to the different lengths of each volume.

3. The three-volume series is a numbered set of texts, where, say, volume 1 contains texts 1-84, volume 2 contains texts 85-147, and volume 3 contains texts 148-210. The XML contains the text number as well as the page range, and I'd like to constrain the page values to be credible. That is, I don't want a user to be able to try to assign pages for text #99 to volume 3 because I know that that text is in volume 2. (Or should I handle this by not having the user enter a volume number, and just inferring that myself from the text number?)
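The three complications above could be handled with one small lookup table, sketched here in Python. The text-number ranges are the ones given in point 3; the `last_page` values are hypothetical placeholders, since the real page counts are not stated in this message. The names `VOLUMES`, `volume_for_text` and `check_entry` are likewise made up for illustration. The sketch takes the parenthetical option in point 3 and infers the volume from the text number instead of asking the user for it.

```python
# Text ranges per volume come from the message; the page counts are
# HYPOTHETICAL placeholders to be replaced with the real figures.
VOLUMES = {
    1: {"texts": range(1, 85), "last_page": 300},
    2: {"texts": range(85, 148), "last_page": 320},
    3: {"texts": range(148, 211), "last_page": 280},
}


def volume_for_text(text_number):
    """Infer the volume from the text number."""
    for volume, info in VOLUMES.items():
        if text_number in info["texts"]:
            return volume
    raise ValueError(f"unknown text number: {text_number}")


def check_entry(text_number, start, end=None):
    """Validate full (unabbreviated) page numbers; return the volume."""
    volume = volume_for_text(text_number)
    last_page = VOLUMES[volume]["last_page"]
    if start < 1:
        raise ValueError(f"start page must be positive: {start}")
    if end is not None and end <= start:
        raise ValueError(f"end page {end} is not after start page {start}")
    final_page = start if end is None else end
    if final_page > last_page:
        raise ValueError(
            f"page {final_page} exceeds volume {volume} ({last_page} pages)"
        )
    return volume


print(volume_for_text(99))
print(check_entry(99, 123, 234))
```

Since the volume is derived rather than entered, text #99 cannot be assigned to volume 3, and the page ceiling is automatically the one for the correct volume.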

Thanks,

David
djbpitt@xxxxxxxxx



-- 
======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
