Subject: Analysis of Usage Patterns (was Re: In The News)
From: Richard Trott <richard.trott@xxxxxxxxxxxxxxxx>
Date: Wed, 26 Jun 2002 14:34:44 -0700 (PDT)

On 26 Jun 2002 digital-copyright-digest-help@xxxxxxxxxxxxxx wrote:

> Evaluation of Digital Library Impact and User Communities by Analysis of
> Usage Patterns
> By Johan Bollen and Rick Luce, D-Lib Magazine, June 2002 Volume 8 Number 6
> ISSN 1082-9873
> http://www.dlib.org/dlib/june02/bollen/06bollen.html
>
> "At present, digital library (DL) policy is largely informed by
> management intuition and coarse measures of user satisfaction. Most DLs,
> however, maintain extensive server logs of user retrieval requests that
> contain a wealth of information on user preferences and the structure of
> user retrieval patterns. We propose a quantitative approach to DL
> evaluation that analyzes the retrieval habits of users to assess the
> impact of a collection of documents and to determine the structure of a
> given DL user community. We discuss a system that we have developed to
> automatically generate extensive journal and document networks from an
> efficient and simple analysis of user retrieval sequences registered in
> a particular DL's server logs."
> ------------

Did anyone else read this and find the central assumption problematic?
Specifically, it is declared that "when a user retrieves two documents
within a short period of time, it adds support to the claim that some
level of similarity exists between these documents." No evidence is
given for this statement. It is offered simply as common sense, and the
rest of the paper appears dependent upon this premise.

Personally, I don't believe the premise to be true. Let's say one is
looking for information on Charles Mackay's 1859 writings about those
whom he termed "the slow poisoners". One might (for example) fire up
www.altavista.com and enter the words "slow poisoners" (without the
quotation marks). The top site returned is www.slowpoisoners.com, which
sounds very promising indeed. The user follows that link. There, they
discover that the Web site is for a San Francisco band called the Slow
Poisoners.
The Web page they retrieve contains no information whatsoever about
Charles Mackay or his writings on the subject. The user hits the back
button, getting a cached version of the altavista search results. They
go to the next site on the list, which is www.bootlegbooks.com, and the
link is to the text of chapter 11 of Mackay's 1859 book. Chapter 11 is
entitled "The Slow Poisoners" and is all about exactly what the user is
looking for. Bingo.

If the user is using a proxy server, then the logs will show a visit to
altavista, followed quickly by a visit to slowpoisoners.com, followed
quickly by a visit to the specific information at bootlegbooks.com.
Analysis using the techniques described in the paper will falsely
ascribe a similarity between slowpoisoners.com and the page at
bootlegbooks.com.

On the other hand, if the top pages returned were two links that were in
fact similar and of interest to the reader, then the user might very
well spend a lot of time at the first link before moving on to the
second link. The techniques described in the paper would then falsely
conclude that those two links are less similar to each other than
slowpoisoners.com and the bootlegbooks.com page are.

Basically, the technique assumes every document retrieval is
significant. However, in many situations, if retrieving documents is
simple (which it hopefully is), most document retrievals will not be
significant. The user might quickly skim through a half dozen or more
documents that are not what they are looking for before finding the one
that is. In this situation, the relationship weight ascribed by the
techniques in the paper will tell you much more about your document
search engine and your users' savvy with that search engine than about
the documents themselves. Or it might not. It's difficult (impossible?)
to know. It also might (or might not) tell you more about the hyperlink
structure at your site than about the relationships of document content.
Again, it is difficult (perhaps impossible) to know.

If you have a set of documents that are retrieved solely by a method
that allows the user to request, "the paper entitled 'FooBar' from the
_Journal of FooBarOlogy_ volume 3 number 4, by Smith, Trott, and
Wesson," then the system described in the paper may work exceptionally
well. However, broader application is questionable in my view.

Or am I being naive and missing something crucial?

Rich
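
For anyone who wants to poke at the proxy-log scenario above, here is a
minimal sketch of the kind of weighting I am objecting to. It is only my
simplified reading of the premise (consecutive retrievals within a short
window add weight to a document pair), not the algorithm from the paper.
The 60-second window, the timestamps, and the names relevant-a and
relevant-b are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical threshold: retrievals closer together than this are
# treated as evidence that the two documents are related.
WINDOW_SECONDS = 60


def co_retrieval_weights(log, window=WINDOW_SECONDS):
    """Count consecutive retrievals by one user that fall within `window`.

    `log` is a list of (timestamp_in_seconds, document) tuples sorted by
    time. Returns a dict mapping (earlier_doc, later_doc) to a count.
    """
    weights = defaultdict(int)
    for (t1, d1), (t2, d2) in zip(log, log[1:]):
        if d1 != d2 and t2 - t1 <= window:
            weights[(d1, d2)] += 1
    return dict(weights)


# Invented timings: the user glances at the band's site for 15 seconds,
# then backs out and opens the chapter they actually wanted.
quick_skim = [
    (0, "altavista.com"),
    (10, "slowpoisoners.com"),
    (25, "bootlegbooks.com/chapter11"),
]

# Invented timings: two genuinely related documents, but the user reads
# the first one for 15 minutes before opening the second.
careful_read = [
    (0, "altavista.com"),
    (10, "relevant-a"),
    (910, "relevant-b"),
]

skim_weights = co_retrieval_weights(quick_skim)
read_weights = co_retrieval_weights(careful_read)

print(skim_weights)
print(read_weights)
```

Under these assumptions the unrelated pair (the band's site and the
Mackay chapter) picks up weight, while the genuinely related pair gets
none, because the careful reader took too long between retrievals.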