Subject: Re: [xsl] What is a better word for "de-duplication"?
From: Andrew Franz <afranz0@xxxxxxxxxxxxxxxx>
Date: Tue, 29 Aug 2006 08:12:40 +1000
At 03:33 PM 8/28/2006, Andrew wrote:
Wendell Piez wrote:
At 08:41 PM 8/27/2006, you wrote:
I want to use a single, short word to express the act of removing duplicates from a node-set. I remember seeing the word "de-duplication" used; however, it sounds ugly.
Normalization (or 'normalisation' for those who prefer British orthography) would rather be the general process of transforming a set of values into their normalized forms. So,
<date value="2006">May Day 2006</date>
<date value="2006-05-01"/>
<date value="5-1-2006">May 1 2006</date>
might be normalized as
<date value="2006-05-01">May 1 2006</date>
<date value="2006-05-01">May 1 2006</date>
<date value="2006-05-01">May 1 2006</date>
but this would not deduplicate them.
These are very different problems, especially for XSLT. Generally speaking, deduplicating requires normalizing first, since deduplication works only over canonical forms; without them, comparing values to see which are duplicates becomes very difficult.
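
To make the second step concrete, here is a minimal XSLT 1.0 sketch of deduplicating once the values are already normalized. It is only one possible approach (key-based, so-called Muenchian, grouping), and it assumes things the thread does not state: that the date elements sit inside a wrapper element, called dates here purely for illustration, and that two dates count as duplicates exactly when their value attributes are equal. The key name date-by-value is likewise made up.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Index every date element by its (already normalized) value attribute.
       The key name is hypothetical. -->
  <xsl:key name="date-by-value" match="date" use="@value"/>

  <!-- Assumes a wrapper element named "dates"; adjust to the real document. -->
  <xsl:template match="/dates">
    <dates>
      <!-- Keep a date only if it is the first element, in document order,
           among all dates sharing its value attribute. -->
      <xsl:copy-of select="date[generate-id() =
                                generate-id(key('date-by-value', @value)[1])]"/>
    </dates>
  </xsl:template>
</xsl:stylesheet>
```

Run against the normalized example, where all three elements carry value="2006-05-01", this should keep a single date element. Run against the un-normalized input it would keep all three, because "2006", "2006-05-01" and "5-1-2006" are different strings; that is why the normalization pass has to come first.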