Subject: Re: [xsl] Data science, data analytics using XSLT streaming
From: Ihe Onwuka <ihe.onwuka@xxxxxxxxx>
Date: Tue, 5 Nov 2013 10:41:35 +0000

On Tue, Nov 5, 2013 at 10:12 AM, Costello, Roger L. <costello@xxxxxxxxx> wrote:
> Hi Folks,
>
> Apparently "data science" is the hot buzzword these days:
>
> Data Scientist: The Sexiest Job of the 21st Century (http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/)
>
> I think that, in a nutshell, data science is about analyzing large amounts of data.

No, it's not. The data don't necessarily have to be large. Shorn of that prerequisite, almost any form of computation entails analyzing data.

> It seems that most people believe that the Hadoop, parallel processing paradigm is the sole way of doing data science/data analytics.

No, they don't. First up, Hadoop is not the paradigm; MapReduce is. Hadoop is just an open-source project that implements the paradigm.

> However, I think that streaming is an equally valuable approach.
>
> XSLT streaming is all about processing large amounts of (XML-formatted) data.

But just because XSLT has only just got it doesn't mean it is new.

> So XSLT streaming should fit in the "data science" and "data analytics" categories.

If the source data is in XML, then streaming is useful for extracting the data and handing it off to an environment properly equipped with the primitives for the requisite statistical analysis.

> Broad Question: Would you provide a scenario/example of doing data science/data analytics using XSLT streaming please?
>
> I realize that the question is rather vague and broad. I am hoping we can collectively come up with ideas on how to do data analytics (data science) using XSLT streaming. Any ideas you might have would be appreciated.

See the previous answer.
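
A minimal sketch of that extract-then-hand-off split, written in Python rather than XSLT: ElementTree's iterparse stands in for a streaming XSLT 3.0 pass (read one record at a time, keep only the field you need, discard the rest), and the standard statistics module stands in for the analysis environment. The element names `record` and `value` and the function names are hypothetical, chosen only for illustration.

```python
import io
import statistics
import xml.etree.ElementTree as ET


def extract_values(source, record_tag="record", field_tag="value"):
    """Yield one float per <record>, reading the XML incrementally.

    The tag names are hypothetical placeholders for whatever the real
    document uses.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            field = elem.find(field_tag)
            if field is not None and field.text:
                yield float(field.text)
            # Free the finished record's contents so memory use stays small.
            elem.clear()


def summarize(values):
    """Hand the extracted numbers to the statistics 'environment'."""
    data = list(values)
    return {
        "count": len(data),
        "mean": statistics.mean(data),
        "stdev": statistics.stdev(data),  # sample standard deviation
    }


sample = b"""<records>
  <record><value>2</value></record>
  <record><value>4</value></record>
  <record><value>6</value></record>
</records>"""

print(summarize(extract_values(io.BytesIO(sample))))
```

The streaming step only extracts; all the statistics live on the other side of the hand-off, which is the division of labour described above.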