Subject: RE: [xsl] Using analyze-string to catch roman numerals?
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Thu, 9 Oct 2008 23:05:57 +0100

The two things wrong with your solution are:

(a) you're matching any sequence of letters that could be a roman numeral, without looking at the context, hence matching the IX in APPENDIX.

(b) you're only matching the first thing in each element that looks like a roman numeral.

The second is easily fixed: don't use an anchored regex in analyze-string like this

  regex="^(.*?)([IVXL]+)(.*?)$"

Instead use an unanchored regex

  regex="([IVXL]+)"

and add an xsl:non-matching-substring element that copies unmatched substrings across unchanged (or case-converted if you want).

Problem (a) is much harder. You can get a fair way by requiring the sequence of IVXL to have non-letters before and after it. But you'll still be matching the word "ILL" as a roman numeral when it clearly isn't. Like all up-conversion tasks, though, it's very much up to you how much time you want to spend fine-tuning the patterns and rules that you define.

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Tony Zanella [mailto:tony.zanella@xxxxxxxxx]
> Sent: 09 October 2008 20:18
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] Using analyze-string to catch roman numerals?
>
> Hello all,
>
> Given the following input:
>
> <root>
>   <head>CHAPTER II. THE WRECKED FOUNDATIONS OF DOMESTICITY</head>
>   <head>PROBLEMA. HELOISE XXIX.</head>
>   <head>Selected Letters</head>
>   <head>The Second Part of Henry IV.</head>
>   <head>VIII</head>
>   <head>APPENDIX VII</head>
>   <head>Appendix VII</head>
>   <head>APPENDIX</head>
>   <head>CALVIN XVII</head>
>   <head>ILLUSTRATION</head>
> </root>
>
> and the following template:
>
> <xsl:template match="head">
>   <xsl:choose>
>     <xsl:when test="not(matches(.,'^(.*?)([IVXL]+)(.*?)$'))">
>       <xsl:value-of select="lower-case(.)"/>
>     </xsl:when>
>     <xsl:when test="matches(.,'^(.*?)([IVXL]+)(.*?)$')">
>       <xsl:analyze-string select="."
>           regex="^(.*?)([IVXL]+)(.*?)$">
>         <xsl:matching-substring>
>           <xsl:value-of select="lower-case(regex-group(1))"/>
>           <xsl:value-of select="upper-case(regex-group(2))"/>
>           <xsl:value-of select="lower-case(regex-group(3))"/>
>         </xsl:matching-substring>
>       </xsl:analyze-string>
>     </xsl:when>
>     <xsl:otherwise/>
>   </xsl:choose>
> </xsl:template>
>
> I'm trying to use analyze-string to do the following:
> Test for a roman numeral. If there isn't one, lower-case(.).
> If there is one, break (.) into its roman numeral and
> non-roman numeral parts, lower-case()ing the latter.
>
> The output I get is:
>
> chapter II. the wrecked foundations of domesticity
> probLema. heloise xxix.
> selected Letters
> the second part of henry IV.
> VIII
> appendIX vii
> appendix VII
> appendIX
> caLVIn xvii
> ILLustration
>
> When what I want is this:
>
> chapter II. the wrecked foundations of domesticity
> problema. heloise XXIX.
> selected letters
> the second part of henry IV.
> VIII
> appendix VII
> appendix VII
> appendix
> calvin XVII
> illustration
>
> Between my relative inexperience with both regexes and XSLT,
> thanks for any help!
> Tony
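
The two suggestions in the reply (an unanchored regex plus xsl:non-matching-substring, and requiring non-letters on both sides of the IVXL run) might be combined roughly as follows. This is only a sketch under some assumptions, not necessarily what the reply had in mind: since XPath regexes have no lookaround, it matches each whole run of ASCII letters with [A-Za-z]+ and then tests whether that word consists solely of I, V, X and L. The text output method, the /root template and the newline after each head are choices made for illustration.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: plain-text output, one line per head element. -->
  <xsl:output method="text"/>

  <xsl:template match="/root">
    <xsl:apply-templates select="head"/>
  </xsl:template>

  <xsl:template match="head">
    <!-- Unanchored regex: analyze-string visits every run of letters
         in turn, not just the first candidate in the element. -->
    <xsl:analyze-string select="." regex="[A-Za-z]+">
      <xsl:matching-substring>
        <xsl:choose>
          <!-- The whole word is made of I, V, X, L, so it has
               non-letters (or the string boundary) on both sides.
               Treat it as a roman numeral and keep it as it is. -->
          <xsl:when test="matches(., '^[IVXL]+$')">
            <xsl:value-of select="."/>
          </xsl:when>
          <!-- Any other word is lower-cased. -->
          <xsl:otherwise>
            <xsl:value-of select="lower-case(.)"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:matching-substring>
      <!-- Spaces, punctuation and anything else between words. -->
      <xsl:non-matching-substring>
        <xsl:value-of select="lower-case(.)"/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Run with an XSLT 2.0 processor against the sample input, this should leave the IX in APPENDIX and the ILL in ILLUSTRATION alone, because neither is a whole word. As the reply points out, a standalone word such as "ILL" would still be treated as a numeral; tightening the test beyond '^[IVXL]+$' is a matter of how much fine-tuning is worth the effort.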