Subject: RE: Re: [xsl] What is a better word for "de-duplication"?
From: cknell@xxxxxxxxxx
Date: Mon, 28 Aug 2006 19:26:02 -0400
All sorts of terms with ambiguous or impenetrable meanings don't help. They muddy the water. A tool need not be pretty to be useful. Is there any doubt about the meaning of "de-duplication"? Not from where I sit.
--
Charles Knell
cknell@xxxxxxxxxx - email
-----Original Message-----
From: Andrew Franz <afranz0@xxxxxxxxxxxxxxxx>
Sent: Tue, 29 Aug 2006 08:12:40 +1000
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [xsl] What is a better word for "de-duplication"?
Wendell Piez wrote:
> At 03:33 PM 8/28/2006, Andrew wrote:
>
>> Wendell Piez wrote:
>>
>>> Dear Dimitre,
>>>
>>> At 08:41 PM 8/27/2006, you wrote:
>>>
>>>> I want to use a single, short word to express the act of removing
>>>> duplicates from a node-set. I remember seeing the word "de-duplication"
>>>> used, however it sounds ugly.
>>>
>>>
>> Normalisation
>
>
> Normalization (or 'normalisation' for those who prefer British
> orthography) would rather be the general process of transforming a set
> of values into their normalized forms. So,
>
> <date value="2006">May Day 2006</date>
> <date value="2006-05-01"/>
> <date value="5-1-2006">May 1 2006</date>
>
> might be normalized as
>
> <date value="2006-05-01">May 1 2006</date>
> <date value="2006-05-01">May 1 2006</date>
> <date value="2006-05-01">May 1 2006</date>
>
> but this would not deduplicate them.
>
> These are very different problems, especially for XSLT. Generally
> speaking, deduplicating requires normalization first since
> deduplication works only over canonical forms (or comparing them to
> see which are duplicates becomes very difficult).
>
> Cheers,
> Wendell
Yes, this is one meaning of 'normalisation'. But 'normalisation' is
richer and deeper than that. Think about relational database theory.
2NF = A relation is in 2NF if it is in 1NF and every non-key
attribute is fully dependent on each candidate key of the relation.
In the above example:
<date value="2006">May Day 2006</date>
<date value="2006-05-01"/>
<date value="5-1-2006">May 1 2006</date>
becomes:
<standardDate id="x" year="2006" month="5" day="1" />
plus:
<date id="x" format="t yyyy">May Day</date>
<date id="x" format="yyyy-mm-dd" />
<date id="x" format="Mmm dd yyyy" />
I submit that these are *not* the same. In your example, you simply
removed the 'inconvenient' differences.
In database normalisation, the commonalities are "normalised" or
"factored" out as a basis for comparison.
In this process (applied to XSLT perhaps), <date> has been
"de-duplicated" into <standardDate> but there is no loss of information.
Why invent new terminology?
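
To make the distinction drawn in this thread concrete, here is a minimal sketch (in Python rather than XSLT, purely for illustration) that first normalises the three <date> values from the example and then removes duplicates as a separate step. The wrapper element <dates>, the function names, and the rule that "May Day" means 1 May are assumptions made for this sketch; they do not come from any of the posts.

```python
import re
import xml.etree.ElementTree as ET

# The three <date> elements from the thread, wrapped in a hypothetical
# <dates> root so the fragment parses as one document.
SOURCE = """<dates>
  <date value="2006">May Day 2006</date>
  <date value="2006-05-01"/>
  <date value="5-1-2006">May 1 2006</date>
</dates>"""


def normalise(value, text):
    """Return an ISO yyyy-mm-dd string for the three forms in the example."""
    # Already in canonical form: keep it unchanged.
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return value
    # Assumed to be month-day-year, as the thread's example implies.
    match = re.fullmatch(r"(\d{1,2})-(\d{1,2})-(\d{4})", value)
    if match:
        month, day, year = match.groups()
        return f"{year}-{int(month):02d}-{int(day):02d}"
    # Bare year: only resolvable here because the text says "May Day"
    # (assumption for this sketch: May Day is 1 May).
    if re.fullmatch(r"\d{4}", value) and "May Day" in (text or ""):
        return f"{value}-05-01"
    raise ValueError(f"cannot normalise {value!r}")


def normalise_all(root):
    """Step 1: map every <date> to its canonical value (no items removed)."""
    return [normalise(d.get("value"), d.text) for d in root.iter("date")]


def deduplicate(values):
    """Step 2: drop repeated values, keeping the first occurrence of each."""
    return list(dict.fromkeys(values))


root = ET.fromstring(SOURCE)
normalised = normalise_all(root)
unique = deduplicate(normalised)
print(normalised)
print(unique)
```

Normalising alone leaves three items that merely look alike; only the second step reduces them to one, which is why the two terms are treated as distinct operations above.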