Subject: Re: [xsl] XSLT function for title capitalization?
From: "Liam R. E. Quin liam@xxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 10 Apr 2018 06:19:32 -0000
On Mon, 2018-04-09 at 20:52 +0000, David Sewell dsewell@xxxxxxxxxxxx wrote:
> Wondering if anyone has a serviceable function (preferably in XSLT
> 2/3 but v1 is fine if it works) that takes a string as input and
> returns it with title capitalization according to English-language
> editorial practice (for example, Chicago Manual of Style).

I'd use replace() probably, rather than tokenizing, so as to change as little as possible & facilitate regression tests.

Some test cases should include

* words that do and don't change at the start and at the end of input;
* words like o'clock and don't that include apostrophes, both as ' and as ’ (it doesn't matter whether they are input as entities or literally or numeric character references though, as they all end up the same after XML parsing)
* hyphenated proper names like Rees-Mogg
* exceptions like Ladies-in-Waiting
* punctuation such as em dashes, quotes, commas, semicolons

Unfortunately XSLT doesn't give us Perl's wonderful e modifier on substitution, and neither does XQuery (where it'd be more useful), but XSLT does give us xsl:analyze-string.

I'd start with David Carlisle's approach and add a lot of test cases and fix the regexp to be something more like (\w)(\w*(?:'\w+)?) maybe.

An alternative is to replace (\w)'(\w) with $1E$2 everywhere, where E is some Unicode upper-case letter or sequence of letters that definitely doesn't occur in your input, and change it back at the end.

In XSLT 1 i'd cry for a while and then write something recursive that split its input using translate() and substring-before() to find where to split.

For https://words.fromoldbooks.org/Chalmers-Biography/ i use Perl, as the input isn't well-formed XML at first, with a table of manual overrides, but there are fewer than 10,000 entries i think. Once it's in XML my script/Makefile for conversion does use XSLT, taking 46 seconds to process 43MBytes of XML into 9771 separate XML files with Saxon.

Liam

--
Liam Quin, W3C, http://www.w3.org/People/Quin/
Staff contact for Verifiable Claims WG, SVG WG, XQuery WG
Improving Web Advertising: https://www.w3.org/community/web-adv/
Personal: awesome vintage art: http://www.fromoldbooks.org/
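
Illustration of the regexp idea in the message above. This is a rough sketch in Python rather than XSLT, only to show the logic: the pattern has the same shape as (\w)(\w*(?:'\w+)?), extended to accept both ' and ’, and a callback plays the part of Perl's e modifier (in XSLT that part would fall to xsl:analyze-string). The SMALL_WORDS list and the title_case name are assumptions made for this example; they are not from the thread and do not cover the full Chicago rules.

```python
import re

# Assumption: a short illustrative list of words kept lowercase mid-title.
# It is not the complete Chicago Manual of Style rule set.
SMALL_WORDS = {"a", "an", "and", "as", "at", "but", "by", "for",
               "in", "nor", "of", "on", "or", "the", "to"}

# Same shape as the pattern in the message, accepting ' and U+2019 inside a word.
WORD = re.compile(r"(\w)(\w*(?:['\u2019]\w+)?)")


def title_case(text):
    """Capitalize words in place, leaving everything between them untouched."""
    matches = list(WORD.finditer(text))
    if not matches:
        return text
    # The first and last words are capitalized even if they are small words.
    edges = {matches[0].start(), matches[-1].start()}

    def fix(m):
        word = m.group(0)
        if m.start() not in edges and word.lower() in SMALL_WORDS:
            return word.lower()
        # Upper-case only the first letter; the rest stays as typed,
        # so mixed-case names such as McDonald survive.
        return m.group(1).upper() + m.group(2)

    return WORD.sub(fix, text)


if __name__ == "__main__":
    print(title_case("the ladies-in-waiting at four o'clock"))
    # prints: The Ladies-in-Waiting at Four O'clock
```

Because only matched words are rewritten, hyphens, dashes, quotes and spacing pass through unchanged, which keeps regression diffs small. Exceptions that the word list cannot express would still need a table of manual overrides.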