Subject: Re: [xsl] citation processing
From: "Andrew Welch" <andrew.j.welch@xxxxxxxxx>
Date: Fri, 20 Oct 2006 17:07:54 +0100
> If you think it's not really feasible to parse a plain text citation
> into a marked up version then that's good feedback
Well, never say never of course, and it depends on whether the input always follows the rules.
You could parse the example you gave with a couple (or half a dozen :-) incantations of xsl:analyze-string:
get up to the first "." as the list of authors, recursively split that up on "," to get each author, etc.
The trouble is that if the citations have been entered by hand, some of them are going to use a ":" where you expect a ",", or a "." instead of a ";", or Microsoft code page "smart quote" characters instead of the real thing, and a simple regexp replace mechanism isn't usually very good at recovering from variable input like that.
If, on the other hand, the text was originally in a citation system and was generated, but has been cut and pasted across a few generations of HTML pages and the original source is no longer available, you might be OK to assume that the text itself is regular in its use of punctuation, and ([^\.]*).([^\.]*).([^;]*);(.*) will, for example, give you the author list, the title, the journal title, and the volume/page info as $1 .. $4, each of which could be further split up.
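As a rough illustration of that four-way split, here is a minimal sketch in Python rather than xsl:analyze-string, so it is a stand-in and not the XSLT itself. It assumes the two separating dots in the pattern were meant as literal full stops, so they are escaped here. The function name parse_citation and the sample citation are made up for the example.

```python
import re

# Pattern from the post, with the two separator dots escaped. That is an
# assumption about the intent: an unescaped "." would match any character.
CITATION = re.compile(r"([^.]*)\.([^.]*)\.([^;]*);(.*)")


def parse_citation(text):
    """Return a dict of citation parts, or None if the text does not fit."""
    m = CITATION.match(text.strip())
    if m is None:
        return None
    authors, title, journal, volume_pages = (g.strip() for g in m.groups())
    return {
        # Further split the author list on commas, as suggested in the post.
        "authors": [a.strip() for a in authors.split(",") if a.strip()],
        "title": title,
        "journal": journal,
        "volume_pages": volume_pages,
    }


# Hypothetical citation, invented for illustration only.
sample = "Smith J, Jones A. A study of things. Journal of Stuff; 12:34-56"
parsed = parse_citation(sample)
```

Note that this only works if the punctuation really is regular: an author written as "Smith, J." would end the author list at the first full stop, which is exactly the kind of hand-typed variation described above.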
Yes, that's the plan - I've been told there are four variations (so far) on the format, so test each string against each variation and mark it up if one matches. Any that drop out the bottom can be done by hand (by someone else!), or if that list gets too large, try to infer some more rules. If they are based on a known format then it should be simple enough to work backwards.
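A minimal sketch of that "try each variation, collect the leftovers" loop, again in Python as a stand-in for the XSLT. The two patterns below are hypothetical placeholders, since the four real variations are not listed in this thread, and the names VARIATIONS and mark_up are made up for the example.

```python
import re

# Hypothetical format variations; the thread does not spell out the real four.
VARIATIONS = [
    ("dot-dot-semicolon", re.compile(
        r"(?P<authors>[^.]*)\.(?P<title>[^.]*)\.(?P<journal>[^;]*);(?P<rest>.*)")),
    ("colon-after-authors", re.compile(
        r"(?P<authors>[^:]*):(?P<title>[^.]*)\.(?P<journal>[^;]*);(?P<rest>.*)")),
]


def mark_up(citations):
    """Split citations into (variation name, fields) matches and leftovers."""
    matched, unmatched = [], []
    for text in citations:
        for name, pattern in VARIATIONS:
            m = pattern.fullmatch(text.strip())
            if m:
                fields = {k: v.strip() for k, v in m.groupdict().items()}
                matched.append((name, fields))
                break  # the first variation that fits wins
        else:
            # No variation matched: leave it for manual markup.
            unmatched.append(text)
    return matched, unmatched


# Hypothetical input strings, invented for illustration only.
matched, unmatched = mark_up([
    "Smith J, Jones A. A study of things. Journal of Stuff; 12:34-56",
    "Brown K: Another title. Some Journal; 3:1-9",
    "scribbled note with no structure",
])
```

Keeping the unmatched strings in their own list makes it easy to see when that pile is getting large enough to justify inferring another rule.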