Re: [xsl] Inverting names with Jr and Sr considered

Subject: Re: [xsl] Inverting names with Jr and Sr considered
From: Wolfgang Laun <wolfgang.laun@xxxxxxxxx>
Date: Tue, 6 Nov 2012 10:57:57 +0100
Hopefully you won't have names like "Augustus De Morgan", which should
not be transformed to "Morgan, Augustus De".

And I think this is the time and the place to quote this article once more:
http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/

-W

On 06/11/2012, Mark <mark@xxxxxxxxxxxx> wrote:
> I agree, my specification is likely not complete. However, my input is a
> single document written by one person indexing a single journal. There is a
> great deal of consistency to the data and I doubt that there are as many as
> 1000 names. That said:
>
> I received an answer off the list (thus do not feel authorized to post it
> here) that will help me discover what oddities I have not covered. It
> explained the regex expressions it used so that perhaps if modification is
> required, I may be able to do it.
>
> Thanks for your time, Michael; as always this list provides the most
> consistent and practical advice around, something you all can be proud of.
>
> Mark
>
> -----Original Message-----
> From: Michael Kay
> Sent: Tuesday, November 06, 2012 2:10 AM
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re: [xsl] Inverting names with Jr and Sr considered
>
> I wouldn't even attempt to write any code based on this as the
> specification. For this to work at all well, you're going to need to
> iteratively adapt the solution to handle all the names in your dataset,
> or at least a sample of a couple of thousand of them. There's just too
> much variation in the names you might encounter. Are "Jr" and "Sr"
> really the only suffixes, and are they always spelt this way, or do you
> also get "III" and "Jnr" and "Jnr."?
>
> If I'm wrong, and the names are all regular and in the pattern you
> describe, then I think you can just tokenize on whitespace and do
> something like
>
> suffix := $tokens[last()][. = ('Jr', 'Sr')]
> stem := if ($suffix) then remove($tokens, count($tokens)) else $tokens
> value-of select="concat($stem[last()], ','), remove($stem,
> count($stem)), if ($suffix) then concat('(', $suffix, ')') else ''"
>
> Michael Kay
> Saxonica
>
> On 05/11/2012 23:45, Mark wrote:
>> This must have been done many times, so can someone show me where to find
>> the answer?
>>
>> I have a series of personal names in natural order that I need to invert.
>>
>> The surname is always last except when followed by Jr, or Sr (either
>> of which may not be present). I want to represent:
>>
>> J Allen Rogers > Rogers, J Allen
>> Bill T Wilson Jr > Wilson, Bill T (Jr)
>> A B Brown > Brown, A B
>> John Victor Case Sr > Case, John Victor (Sr)
>>
>> and so on. There may be a single space or multiple spaces between some of
>> the elements of the name.
>>
>> It looks like <xsl:analyze-string> will do this, but I do not know how to
>> write regex.
>>
>> Thanks,
>> Mark
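
For reference, the tokenize-on-whitespace outline quoted above could be written out roughly as follows in XSLT 2.0. This is only a sketch under the assumptions stated in the thread: the surname is the last token unless the last token is exactly "Jr" or "Sr". The function name f:invert-name, the urn:example:names namespace, and the <name> input element are made up for illustration and do not come from anyone's actual stylesheet.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:names"
    exclude-result-prefixes="xs f">

  <!-- Hypothetical helper: inverts one personal name given in natural order. -->
  <xsl:function name="f:invert-name" as="xs:string">
    <xsl:param name="name" as="xs:string"/>
    <!-- normalize-space() collapses runs of whitespace, so single and
         multiple spaces between name parts are treated alike. -->
    <xsl:variable name="tokens"
        select="tokenize(normalize-space($name), ' ')"/>
    <!-- The suffix is the last token, but only if it is exactly Jr or Sr. -->
    <xsl:variable name="suffix"
        select="$tokens[last()][. = ('Jr', 'Sr')]"/>
    <!-- The stem is every token except the suffix, if there is one. -->
    <xsl:variable name="stem"
        select="if (exists($suffix))
                then $tokens[position() lt last()]
                else $tokens"/>
    <!-- Surname plus comma, then the remaining tokens, then "(suffix)". -->
    <xsl:sequence select="string-join(
        (concat($stem[last()], ','),
         $stem[position() lt last()],
         if (exists($suffix)) then concat('(', $suffix, ')') else ()),
        ' ')"/>
  </xsl:function>

  <!-- Assumed input shape: each name sits in its own <name> element. -->
  <xsl:template match="name">
    <name><xsl:value-of select="f:invert-name(.)"/></name>
  </xsl:template>

</xsl:stylesheet>
```

With this, "Bill T  Wilson Jr" should come out as "Wilson, Bill T (Jr)" and "J Allen Rogers" as "Rogers, J Allen". It does nothing sensible for one-word names, for multi-word surnames such as "De Morgan", or for other suffixes ("III", "Jnr", "Jr."); those would have to be added to the suffix list or handled separately once they turn up in the data.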
