Secure Identity for Hadoop @ Strata+Hadoop World 2015

The most interesting part of Strata+Hadoop World, for me, was hearing what everyone is doing with Hadoop and Big Data. We heard numerous stories about how data scientists are using Hadoop to analyze customer data, financial data, web site click traffic, and more. In fact, most of the people who came to the show were there to realize the value of Hadoop technology, while very few were responsible for the IT infrastructure it runs on (the people we normally sell security solutions to). The most common title at the show was Data Scientist, which got me thinking that we should have a Security Scientist working for us to help put Hadoop to work in the security market.

Centrify Northwest Sales: Morgan and Sean @ Strata+Hadoop World (during a break)

At the show we validated that securing identities for Hadoop is extremely important to anyone moving a Hadoop deployment into production. In fact, we met several people responsible for the Hadoop application within their business who didn’t know that their company was already a Centrify customer. We’ve seen this many times before: the Hadoop project starts on the development side of the organization long before IT is aware that it will need to help move the project into production. But I was reminded that even these development systems have to load data that most likely came from the production environment. This means the development environment must be secured to the same extent as a production Hadoop cluster, even if the development system is working from an older snapshot of real data.

Security is such a broad term that just saying Secure Hadoop can mean different things to different people. Probably the most common question we get is how what we provide differs from the security that the Hadoop distribution vendors provide through tools such as Sentry, Knox or Ranger. Our answer is that Centrify provides the Active Directory-based identity, for both users and groups, that is needed on each node within the cluster so that these Hadoop tools can securely identify the user for authorized access to data.

Centrify provides operating system-level controls that grant authorized users the ability to log in, and it automatically sets up the user’s Kerberos identity, which is required for access to Hadoop running in secure mode. Additionally, Centrify provides privilege management on each node in the cluster so that IT staff can log in to perform their job duties, such as starting, stopping or restarting specific Hadoop services, or modifying and fine-tuning configuration files, without being able to access the data held within the cluster. In other words, you grant them specific rights to OS management commands and not to Hadoop commands. And since many of these clusters store data that is subject to regulatory compliance, Centrify provides full session auditing to ensure accountability and enable auditors to see exactly what someone did while logged into a node within the cluster. Session auditing provides a video recording of the user’s actions on the nodes within the cluster, rather than the cryptic events found within syslog.
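To make the separation between OS management commands and Hadoop commands concrete, here is a minimal sketch of a default-deny command check. It is purely illustrative and is not Centrify's implementation or configuration format: the `ALLOWED_PREFIXES` table, the `is_permitted` function and the service name used in the example are all assumptions made for this sketch.

```python
import shlex

# Hypothetical operator role: only OS-level service management is allowed.
# Anything not listed here, including Hadoop data commands, is denied.
ALLOWED_PREFIXES = (
    ("systemctl", "start"),
    ("systemctl", "stop"),
    ("systemctl", "restart"),
)


def is_permitted(command_line: str) -> bool:
    """Return True only if the command begins with an allowed prefix."""
    tokens = tuple(shlex.split(command_line))
    return any(tokens[: len(prefix)] == prefix for prefix in ALLOWED_PREFIXES)


# Restarting a service is an OS management task, so it is allowed.
print(is_permitted("systemctl restart hadoop-hdfs-datanode"))
# Reading data out of the cluster is a Hadoop command, so it is denied.
print(is_permitted("hdfs dfs -cat /data/customers.csv"))
```

The point of the default-deny design is that an operator's rights are enumerated explicitly, so access to the data itself never has to be blocked case by case.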

While at the show we found a few other vendors providing security solutions for Hadoop, although they are focusing on other areas such as encryption or tokenization of data within Hadoop. Centrify is the first vendor to focus on Identity Management for Hadoop running in secure mode, as a complementary offering to the many other security solutions the Hadoop vendors are bringing to market.

To learn more, visit our solution page for Big Data, where you will find a white paper and solution briefs for Cloudera, Hortonworks and MapR. You can also request a 30-day trial of the Centrify Server Suite and get started today integrating your Hadoop cluster into Active Directory.