For the last few months I've been working on a new project at Mark
Logic: a web site for interacting with email archives. It makes heavy
use of XML and XQuery. Each email is stored internally as an XML
document, and all searches, faceted navigation, analytic calculations,
and HTML page renderings are performed in XQuery on a single MarkLogic
Server machine.
We launched the site about three weeks ago. For launch we loaded all
the public emails from the Apache Software Foundation. That's about 500
lists and 4,000,000 messages. We've now started adding other lists, and
we loaded xsl-list this week:
http://xsl-list.markmail.org
We also loaded a few others:
http://xml-dev.markmail.org
http://x-query.markmail.org
http://css-d.markmail.org
You can search across all 4.5M emails via the home page:
http://markmail.org
As you'll see from the charts, one of our goals for the site has been to
focus heavily on analytics. We have lots of graphs and counts. Every
query you write gets its own histogram chart:
http://xsl-list.markmail.org/search/?q=%22extreme+markup%22
Another goal has been interactivity. Every search result screen gives
you lots of ways to refine your search (by sender, list, attachment
type, etc.). We also did a lot with keyboard shortcuts. You can hit "n"
and "p" to move to the next and previous result and "j" and "k" to move
up and down the thread view. There are a lot of little things like this.
And if a message in your results includes Office or PDF attachments, you
can view them in-line too:
http://markmail.org/search/ext:ppt+xml
(Click on the attachment name to view it without leaving your browser.)
The subdomain you use implicitly limits the messages you search. Thus
http://xsl-list.markmail.org searches only lists with "xsl-list" in
their name (a single list).
You can search all Apache lists at http://apache.markmail.org, all
Apache Axis lists at http://axis.markmail.org, or across all lists at
http://markmail.org. You can always limit your search view using
"list:axis" in your query, but using the domain handles that a bit more
elegantly.
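
For anyone who wants to generate these links from a script, here is a
minimal sketch in Python. It assumes the /search/?q= URL form seen in the
xsl-list example above; the function name search_url is hypothetical and
is not part of the site.

```python
from urllib.parse import quote_plus


def search_url(query, subdomain=None):
    """Build a search URL, optionally scoped by a list subdomain.

    Assumption: the /search/?q= form is copied from the example link
    in this post; it is not taken from any documented API.
    """
    host = f"{subdomain}.markmail.org" if subdomain else "markmail.org"
    # quote_plus turns spaces into "+" and double quotes into %22
    return f"http://{host}/search/?q={quote_plus(query)}"


# Scoped to one list via the subdomain
print(search_url('"extreme markup"', subdomain="xsl-list"))
# Across all lists, narrowed with a list: constraint instead
print(search_url("list:axis soap"))
```

The first call reproduces the "extreme markup" link shown earlier; the
second shows the list: alternative on the main domain.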
I hope you all find this useful!
Notes on using the site:
* Search using keywords as well as from:, subject:, extension:, and
list: constraints
* The GUI doesn't yet expose it, but you can negate any search term by
prefixing it with "-", like -subject:soap
* You can sort results by date by adding order:date-forward or
order:date-backward to your query
* Remember to use "n" and "p" keyboard shortcuts to navigate the search
results
* You're going to want JavaScript enabled
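
The notes above can be combined into a single query string. The sketch
below is only an illustration in Python: build_query and its parameters
are hypothetical names, and it assumes terms are simply separated by
spaces, as in the examples in this post.

```python
def build_query(keywords=(), constraints=None, negate=(), order=None):
    """Assemble a query string from the pieces listed in the notes.

    keywords    -- plain search words
    constraints -- dict such as {"list": "axis"} (from, subject,
                   extension, list)
    negate      -- terms to exclude; each gets a leading "-"
    order       -- "date-forward" or "date-backward"
    """
    terms = list(keywords)
    for name, value in (constraints or {}).items():
        terms.append(f"{name}:{value}")
    terms.extend(f"-{term}" for term in negate)
    if order is not None:
        if order not in ("date-forward", "date-backward"):
            raise ValueError(f"unknown order: {order}")
        terms.append(f"order:{order}")
    # Assumption: terms are joined with single spaces
    return " ".join(terms)


print(build_query(["xml"], {"list": "axis"}, ["subject:soap"], "date-backward"))
```

The resulting string can be typed into the search box as-is.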
-jh-