Re-posting an interesting article from April 2013
April 30, 2013 By Paul Rubens
There’s more — much more — to the Big Data software ecosystem than Hadoop. Here are four open source projects that will help you get big benefits from Big Data.
It’s difficult to talk about Big Data processing without mentioning Apache Hadoop, the open source Big Data software platform. But Hadoop is only part of the Big Data software ecosystem. There are many other open source software projects that are emerging to help you get more from Big Data.
Here are a few interesting ones that are worth keeping an eye on.
Spark bills itself as providing “lightning-fast cluster computing” that makes data analytics fast to run and fast to write. It’s being developed at UC Berkeley’s AMPLab and is free to download and use under the open source BSD license.
So what does it do? Essentially it’s an extremely fast cluster computing system that can keep data in memory while it works on it, rather than reading it from disk for every pass. It was designed for two kinds of application where keeping data in memory is an advantage: running iterative machine learning algorithms, and interactive data mining.
It’s claimed that Spark can run up to 100 times faster than Hadoop MapReduce for these kinds of workloads. Spark can access any data source that Hadoop can access, so you can run it on any existing data sets that you have already set up for a Hadoop environment.
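To see why keeping data in memory matters for iterative algorithms, here is a toy sketch in plain Python. It does not use Spark or Hadoop, and every name in it (`Dataset`, `fit_reloading`, `fit_cached`) is made up for illustration. It fits a one-parameter model two ways: re-reading the input on every pass, which is roughly what happens when each pass is a separate disk-backed job, and reading it once and reusing the in-memory copy.

```python
# Toy illustration only: plain Python, not Spark's or Hadoop's API.
# All class and function names here are hypothetical.


class Dataset:
    """Stands in for data on disk and counts how often it is read in full."""

    def __init__(self, points):
        self._points = list(points)
        self.reads = 0

    def load(self):
        # Each call models one full read from storage.
        self.reads += 1
        return list(self._points)


def gradient_step(points, w, lr=0.01):
    # One gradient-descent step for the least-squares model y = w * x.
    grad = sum(2 * (w * x - y) * x for x, y in points) / len(points)
    return w - lr * grad


def fit_reloading(dataset, iterations):
    # Re-reads the whole input on every pass.
    w = 0.0
    for _ in range(iterations):
        w = gradient_step(dataset.load(), w)
    return w


def fit_cached(dataset, iterations):
    # Reads the input once, then reuses the in-memory copy on every pass.
    points = dataset.load()
    w = 0.0
    for _ in range(iterations):
        w = gradient_step(points, w)
    return w


# Synthetic data drawn from y = 3x, so the fitted w should approach 3.
points = [(float(x), 3.0 * x) for x in range(1, 6)]
disk_a, disk_b = Dataset(points), Dataset(points)

w_reloading = fit_reloading(disk_a, 20)
w_cached = fit_cached(disk_b, 20)

print("reloading:", disk_a.reads, "full reads")
print("cached:   ", disk_b.reads, "full read")
```

Both versions do the same arithmetic and arrive at the same answer. The only difference is how many times the input is read, and on a real cluster that read is the expensive part. The sketch shows where the saving comes from and says nothing about how large the speedup is in practice.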