MySQL and Hadoop integration


July 11, 2013 By Alexander Rubin

Dolphin and Elephant: an Introduction

This post is intended for MySQL DBAs or sysadmins who need to start using Apache Hadoop and want to integrate the two. In this post I will cover some basic information about Hadoop, focusing on Hive, as well as MySQL and Hadoop/Hive integration.

First of all, if you have been dealing with MySQL or any other relational database for most of your professional life (like I have), Hadoop may look different. Very different. In many ways, Hadoop is the opposite of a relational database. Unlike a database, where we have a set of tables and indexes, Hadoop works with a set of text files. And… there are no indexes at all. And yes, this may be shocking, but all scans are sequential (full “table” scans in MySQL terms).
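To make the difference concrete, here is a minimal Python sketch contrasting the two access patterns. It is only an illustration: the sample rows, the `sequential_scan` and `index_lookup` helpers, and the dictionary standing in for an index are all hypothetical, and none of this uses real Hadoop or MySQL APIs.

```python
# Illustrative only: in-memory stand-ins, not Hadoop or MySQL APIs.

# "Hadoop-style" data: plain text lines with no index (made-up sample rows).
lines = [
    "1,alice,NY",
    "2,bob,CA",
    "3,carol,TX",
]


def sequential_scan(rows, wanted_id):
    """Read every line and keep the ones whose id matches (a full scan)."""
    matches = []
    lines_read = 0
    for row in rows:
        lines_read += 1
        row_id, name, state = row.split(",")
        if row_id == wanted_id:
            matches.append((row_id, name, state))
    return matches, lines_read


# "Database-style" index: id -> row, built once ahead of time.
index = {row.split(",")[0]: tuple(row.split(",")) for row in lines}


def index_lookup(idx, wanted_id):
    """Go straight to the row through the index, without scanning."""
    row = idx.get(wanted_id)
    return [row] if row is not None else []


scan_result, lines_read = sequential_scan(lines, "2")
lookup_result = index_lookup(index, "2")
```

Both calls find the same row, but the scan has to read every line to get there, while the lookup touches only the entry it needs. The trade-off is that the index has to be built and maintained up front, which is exactly the work a plain set of text files avoids.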

So, when does Hadoop make sense?

Big Data Jobs

Information Management News

How is Big Data Shaping Enterprise IT Hiring?


There are nearly 600,000 big data jobs out there, creating a “hyper growth niche” career field that is ramping up demand for analytics officials more than all else, according to IT career resources website icrunchdata.

With numerous estimates putting big data-related hiring at approximately 4 million new jobs by 2015, half of which would be in the U.S., icrunchdata published what it’s calling the first industry index specifically attuned to big data jobs. Across six categories connected to big data hiring, icrunchdata estimated there to be 598,510 jobs, according to information the firm has pulled together from various job posting sites and divided using a proprietary algorithm, with the data scrubbed and deduplicated nightly.

Dwarfing the rest of the big data job categories were careers under “analytics,” which icrunchdata put at 220,767 of the total big data jobs. Each sector isn’t broken down by specific roles, though icrunchdata spokesman Todd Nevins says the analytics category “leads the way” in that it touches jobs directly related to all forms of data.

As the second largest segment, icrunchdata estimated 127,329 available “big data jobs,” or those specifically stated to deal with the demands of data volume and variety. Data scientist and similar career titles represented 82,444 of the big data jobs, and software development accounted for more than 78,000. Rounding out the categories were “statistics jobs” (60,430) and BI jobs (28,900).

