4 Hot Open Source Big Data Projects

Re-posting an interesting article from April 2013

April 30, 2013 By Paul Rubens

There’s more — much more — to the Big Data software ecosystem than Hadoop. Here are four open source projects that will help you get big benefits from Big Data.

It’s difficult to talk about Big Data processing without mentioning Apache Hadoop, the open source Big Data software platform. But Hadoop is only one part of the Big Data software ecosystem, and many other open source projects are emerging to help you get more from Big Data.

Here are a few interesting ones that are worth keeping an eye on.

Spark

Spark bills itself as providing “lightning-fast cluster computing” that makes data analytics fast to run and fast to write. It’s being developed at UC Berkeley’s AMPLab and is free to download and use under the open source BSD license.

So what does it do? Essentially, it’s an extremely fast cluster computing system that can keep data in memory while it processes it. It was designed for two applications in which keeping data in memory is an advantage: running iterative machine learning algorithms and interactive data mining.

Spark’s developers claim it can run up to 100 times faster than Hadoop MapReduce for these workloads. Spark can also read any data source that Hadoop can access, so you can run it on data sets you have already set up for a Hadoop environment.
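To make the in-memory argument concrete, here is a toy Python model. It is not Spark code and does not use Spark’s API. It assumes a hypothetical `CountingSource` class that stands in for a dataset on disk and counts full scans, and it runs the same iterative algorithm (gradient descent on a one-parameter linear model, chosen only for illustration) two ways: re-reading the input on every iteration, as a chain of MapReduce jobs would, and loading it into memory once, which is roughly what caching a dataset in Spark aims to achieve.

```python
"""Toy model (not Spark's API) of why in-memory data helps iterative jobs."""


class CountingSource:
    """Hypothetical stand-in for a dataset on disk; counts full scans."""

    def __init__(self, points):
        self._points = list(points)
        self.scans = 0

    def read_all(self):
        # Each call models one full pass over data stored on disk (e.g. HDFS).
        self.scans += 1
        return list(self._points)


def gradient_step(w, points, lr):
    # One gradient-descent step for the model y = w * x under mean squared error.
    grad = sum(2 * (w * x - y) * x for x, y in points) / len(points)
    return w - lr * grad


def fit_rereading(source, iterations, lr=0.1):
    """MapReduce-style: every iteration is a new pass that re-reads its input."""
    w = 0.0
    for _ in range(iterations):
        w = gradient_step(w, source.read_all(), lr)
    return w


def fit_cached(source, iterations, lr=0.1):
    """In-memory style: load the working set once, then iterate over it."""
    points = source.read_all()  # loosely analogous to caching an RDD in Spark
    w = 0.0
    for _ in range(iterations):
        w = gradient_step(w, points, lr)
    return w


if __name__ == "__main__":
    data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]
    disk_a, disk_b = CountingSource(data), CountingSource(data)
    w_reread = fit_rereading(disk_a, iterations=50)
    w_cached = fit_cached(disk_b, iterations=50)
    print(round(w_reread, 4), round(w_cached, 4), disk_a.scans, disk_b.scans)
```

Both versions arrive at the same model, but the re-reading version scans the data once per iteration (50 times here) while the cached version scans it once. The sketch only counts passes over the data; actual speedups depend on I/O cost, data volume and cluster size, and it says nothing about how Spark achieves its claimed figures.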

How is Big Data Shaping Enterprise IT Hiring?

There are nearly 600,000 big data jobs available, creating a “hyper growth niche” career field that is ramping up demand for analytics professionals more than anything else, according to the IT career resources website icrunchdata.

With numerous estimates putting big data-related hiring at approximately 4 million new jobs by 2015, half of them in the U.S., icrunchdata published what it calls the first industry index specifically attuned to big data jobs. Across six categories connected to big data hiring, icrunchdata estimated there were 598,510 jobs, based on information the firm gathered from various job posting sites and sorted into categories using a proprietary algorithm; the data is scrubbed and deduplicated nightly.

Dwarfing the other big data job categories were careers under “analytics,” which icrunchdata put at 220,767 of the total. The categories aren’t broken down by specific role, though icrunchdata spokesman Todd Nevins says the analytics category “leads the way” because it touches jobs directly related to all forms of data.

The second-largest segment, at an estimated 127,329 openings, was “big data jobs,” meaning those specifically stated to deal with the demands of data volume and variety. Data scientist and similar job titles accounted for 82,444 of the big data jobs, and software development for more than 78,000. Rounding out the categories were statistics jobs (60,430) and BI jobs (28,900).
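The index figures can be cross-checked with a few lines of Python. This sketch assumes, which the article does not state outright, that the six categories do not overlap and add up to the 598,510 total; under that assumption, the software development figure, reported only as “more than 78,000,” works out to 78,640.

```python
# Category counts as reported by icrunchdata (April 2013).
# Software development is reported only as "more than 78,000", so it is
# derived here under the ASSUMPTION that the six categories do not overlap.
TOTAL = 598_510
reported = {
    "analytics": 220_767,
    "big data": 127_329,
    "data scientist": 82_444,
    "statistics": 60_430,
    "BI": 28_900,
}

# Whatever the five reported categories leave over is attributed to software development.
software_dev = TOTAL - sum(reported.values())
print(f"implied software development jobs: {software_dev:,}")

# Print every category with its share of the total, largest first.
all_categories = {**reported, "software development": software_dev}
for name, count in sorted(all_categories.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>22}: {count:>7,} ({count / TOTAL:.1%})")
```

If the categories do overlap, the derived value is only an upper bound on how the remainder splits, so treat it as a consistency check rather than a reported number.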

