4 Hot Open Source Big Data Projects

Enterprise Apps Today: CRM, ERP, and Business Intelligence Software News, Research, and Reviews

Re-posting an interesting article from April 2013

April 30, 2013 By Paul Rubens

There’s more — much more — to the Big Data software ecosystem than Hadoop. Here are four open source projects that will help you get big benefits from Big Data.

It’s difficult to talk about Big Data processing without mentioning Apache Hadoop, the open source Big Data software platform. But Hadoop is only one part of the Big Data software ecosystem. Many other open source projects are emerging to help you get more from Big Data.

Here are a few interesting ones that are worth keeping an eye on.


Spark bills itself as providing “lightning-fast cluster computing” that makes data analytics fast to run and fast to write. It is being developed at the UC Berkeley AMPLab and is free to download and use under the open source BSD license.

So what does it do? Essentially it’s an extremely fast cluster computing system that can keep data in memory while it works on it. It was designed for two kinds of application where keeping data in memory is an advantage: running iterative machine learning algorithms, and interactive data mining.

Spark is claimed to run up to 100 times faster than Hadoop MapReduce on these workloads. It can also read any data source that Hadoop can, so you can run it against data sets you have already set up for a Hadoop environment.
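To see why keeping data in memory suits iterative algorithms, here is a minimal sketch in plain Python. It does not use Spark or its API; the `CountingSource` class and the `fit_mean` function are hypothetical names made up for this illustration, and the "data set" is just four numbers. The point is only that an iterative job that re-reads its input on every pass performs one full read per iteration, while a job that loads the data once and reuses the in-memory copy performs a single read.

```python
# Toy illustration only: plain Python, not Spark's API.
# CountingSource and fit_mean are hypothetical names for this sketch.


class CountingSource:
    """Stands in for a slow, disk-backed data set and counts full reads."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def load(self):
        # Each call models one complete pass over the stored data.
        self.reads += 1
        return list(self._values)


def fit_mean(get_data, iterations=20, step=0.5):
    """Estimate the mean by gradient descent, fetching data every iteration."""
    estimate = 0.0
    for _ in range(iterations):
        data = get_data()
        gradient = sum(estimate - x for x in data) / len(data)
        estimate -= step * gradient
    return estimate


source = CountingSource([2.0, 4.0, 6.0, 8.0])

# Disk-style: every iteration re-reads the whole data set.
reloaded = fit_mean(source.load)
reads_without_cache = source.reads

# Memory-style: read once, then reuse the in-memory copy on every iteration.
source.reads = 0
cached = source.load()
in_memory = fit_mean(lambda: cached)
reads_with_cache = source.reads

print(reads_without_cache, reads_with_cache)  # prints "20 1"
```

Both runs reach the same estimate (close to the true mean of 5.0); only the number of full reads differs. How much time that saves in practice depends on the storage, the cluster and the algorithm, which this sketch does not attempt to measure.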

Full Story

