Re: [xsl] citation processing

Subject: Re: [xsl] citation processing
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Fri, 20 Oct 2006 12:37:16 -0400

At 11:32 AM 10/20/2006, Andrew wrote:
> If you think it's not really feasible to parse a plain text citation
> into a marked up version then that's good feedback - it could well be
> that a percentage need to be done by hand.

Scale is a real issue here. Real-world citation formats include variations like "use 'pp.' on page ranges for articles in books, but not for articles in journals." At scale, even if your process does the correct thing with 85 of 100 citations (a very optimistic rate), that can leave scores of incorrect ones. And if your upconversion can't recognize where it's failing, you have to find the errors before you can fix them.
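
To make that last point concrete, here is a minimal sketch (in Python rather than XSLT, purely for illustration) of an upconversion step that refuses to guess: anything that doesn't match a known pattern goes into a reject pile for hand markup. The pattern, the function name `upconvert` and the sample citations are all hypothetical; a real job would need one pattern per citation type and format.

```python
import re

# Hypothetical pattern for ONE journal-article style, e.g.
#   "Jane Smith. Parsing citations. Journal of Markup 12 (2006): 34-56."
JOURNAL_ARTICLE = re.compile(
    r"^(?P<author>[^.]+)\.\s+"
    r"(?P<title>[^.]+)\.\s+"
    r"(?P<journal>.+?)\s+(?P<volume>\d+)\s+"
    r"\((?P<year>\d{4})\):\s+"
    r"(?P<pages>\d+-\d+)\.$"
)

def upconvert(citations):
    """Split citations into (parsed, rejected) instead of guessing."""
    parsed, rejected = [], []
    for text in citations:
        match = JOURNAL_ARTICLE.match(text.strip())
        if match:
            parsed.append(match.groupdict())
        else:
            rejected.append(text)  # queue for markup by hand
    return parsed, rejected

# Made-up sample input: the second citation uses initials, a
# parenthesized year and 'pp.', so the pattern does not match it.
samples = [
    "Jane Smith. Parsing citations. Journal of Markup 12 (2006): 34-56.",
    "Smith, J. (2006). Parsing citations. In Markup Essays, pp. 34-56.",
]
parsed, rejected = upconvert(samples)
```

Note that this only catches the failures the pattern itself can see. A citation that matches but gets split in the wrong place still goes through silently, which is exactly the kind of error you have to go looking for.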


David is right: it's ultimately an NLP problem (though a very interesting subset of NLP). As he also says, success depends both on handling the rules properly and on the input actually following those rules. (There are dozens of citation formats around, too.) "Never say never" is good to keep in mind, but when I'm asked to look at citations I immediately start asking questions about the scope of the input, its validation, and acceptable strategies for exception handling. When I'm told there won't be any exceptions, it's usually pretty easy to find a bunch.

Cheers,
Wendell
