Identity Management for Multi-Tiered Big Data Environments

Are you satisfied that your big data applications are protected against cyberthreats? Are you able to prove compliance across the big data stack? If not, read on!

Big Data environments usually consist of clusters of nodes. Each node has an operating system at the bottom layer, with the big data platform and its applications running on top of it. For example, imagine an IBM BigInsights customer using the Big SQL application on a Hortonworks-based cluster running the Linux operating system. Figuring out how to properly secure each layer of this stack can be challenging!

To help understand this better, let’s walk through an example: a Centrify Enterprise customer that uses Centrify for unified identity and access management, leveraging its existing Active Directory. A line of business told Sam, an IT Director at a large bank, that it wanted to use IBM Big SQL as its primary application and asked him to provision an IBM BigInsights Hadoop cluster.

Sam knows that he needs to secure access to the Hadoop cluster with Kerberos in order to enable Hadoop’s Secure Mode. He wants to use Active Directory for user authentication, since all users already authenticate to it. Searching online, he found that Hadoop can use Apache Knox (User guide) and Ambari (Security guide) to control user login, but these provide only LDAP-based authentication, which on its own will not integrate with his multi-domain Active Directory. Centrify’s LDAP proxy (Admin Guide) provides the LDAP interface that Knox and Ambari require, enabling user authentication against any trusted Active Directory domain. This lets Sam leverage the bank’s existing Active Directory infrastructure for unified authentication and access across the layers, and reduces the risk of managing a parallel identity infrastructure.
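To make the multi-domain problem concrete, here is a toy Python model of a single directory front end that accepts a login in user@domain form and routes it to whichever trusted domain owns that suffix. It is an illustration under stated assumptions, not how Centrify’s LDAP proxy is implemented: the TrustedDomainFrontEnd class, the domain names and the in-memory user table (which stands in for a real LDAP bind) are all hypothetical. Only the base DN derivation (emea.bank.example becomes DC=emea,DC=bank,DC=example) follows the usual Active Directory naming convention.

```python
from __future__ import annotations


def domain_to_base_dn(dns_domain: str) -> str:
    """Turn an AD DNS domain name into its conventional base DN."""
    return ",".join(f"DC={label}" for label in dns_domain.lower().split("."))


class TrustedDomainFrontEnd:
    """Hypothetical single front end that authenticates users from several trusted domains."""

    def __init__(self, trusted_domains: dict[str, dict[str, str]]) -> None:
        # Maps DNS domain -> {username: password}. The in-memory table stands in
        # for a real LDAP bind against that domain; never store passwords like this.
        self._domains = {
            domain.lower(): {user.lower(): pw for user, pw in users.items()}
            for domain, users in trusted_domains.items()
        }

    def authenticate(self, upn: str, password: str) -> str | None:
        """Return the base DN of the domain that accepted the login, or None."""
        user, sep, domain = upn.partition("@")
        if not sep:
            return None  # no domain suffix, so the login cannot be routed
        users = self._domains.get(domain.lower())
        if users is None:
            return None  # the domain is not trusted
        if users.get(user.lower()) != password:
            return None  # unknown user or wrong password
        return domain_to_base_dn(domain)


front_end = TrustedDomainFrontEnd({
    "emea.bank.example": {"sam": "s3cret"},
    "amer.bank.example": {"ana": "pa55"},
})
print(front_end.authenticate("sam@EMEA.bank.example", "s3cret"))  # DC=emea,DC=bank,DC=example
```

A Knox or Ambari LDAP setup typically points at one URL and one search base, which is why a front end that already knows every trusted domain is simpler to wire in than one configuration per domain.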

His next challenge is protecting PII by managing user access rights. He found that BigInsights supports role-based access for specific functions, and with Centrify he can make his Active Directory groups members of these BigInsights roles using the “PAM with LDAP” model.

Role-based access can also be applied at the Hadoop layer, using Apache Ranger to define roles with associated rights and, again, assigning AD users or groups to these roles via Centrify.
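The group-to-role pattern Sam applies in BigInsights and again in Ranger can be sketched as a small, generic RBAC model: AD groups are made members of roles, roles carry rights, and a user’s effective rights are the union of the rights of every role one of their groups belongs to. This is a conceptual sketch, not the BigInsights or Apache Ranger API; the RoleModel class, the role and group names, and right strings such as select:customers.ssn are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RoleModel:
    """Hypothetical RBAC model: AD groups join roles, and roles carry rights."""

    role_rights: dict[str, set[str]] = field(default_factory=dict)
    role_groups: dict[str, set[str]] = field(default_factory=dict)

    def grant(self, role: str, *rights: str) -> None:
        """Attach one or more rights to a role."""
        self.role_rights.setdefault(role, set()).update(rights)

    def add_group(self, role: str, ad_group: str) -> None:
        """Make an AD group a member of a role (group names compared case-insensitively)."""
        self.role_groups.setdefault(role, set()).add(ad_group.lower())

    def effective_rights(self, user_groups: set[str]) -> set[str]:
        """Union of the rights of every role that any of the user's groups belongs to."""
        groups = {g.lower() for g in user_groups}
        rights: set[str] = set()
        for role, members in self.role_groups.items():
            if members & groups:
                rights |= self.role_rights.get(role, set())
        return rights

    def is_allowed(self, user_groups: set[str], right: str) -> bool:
        """Deny unless some role reached through the user's groups grants the right."""
        return right in self.effective_rights(user_groups)


hadoop_roles = RoleModel()
hadoop_roles.grant("analysts", "select:transactions")
hadoop_roles.grant("pii_readers", "select:customers.ssn")
hadoop_roles.add_group("analysts", "BigData-Analysts")
hadoop_roles.add_group("pii_readers", "BigData-PII")

print(hadoop_roles.is_allowed({"BigData-Analysts"}, "select:customers.ssn"))  # False
```

With this shape, onboarding a new analyst is a single AD group change; no role or right in BigInsights or Hadoop has to be edited, which is the main administrative benefit of mapping AD groups rather than individual users.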

Since many commands can be executed from the command line at the operating system layer, Centrify Access Manager lets Sam define roles with associated access rights and privileges that enforce a least-privilege access model. With role-based access control in place at every layer, Sam can implement least privilege across the whole stack.
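Least privilege at the OS layer comes down to deny by default: a command runs only if one of the user’s roles explicitly grants it. The sketch below models that rule with argument-list prefixes, in the spirit of sudo-style command rules. It is not Centrify Access Manager’s configuration format; the ROLE_COMMANDS table and role names are hypothetical, although the hdfs and yarn subcommands themselves are standard Hadoop CLI commands.

```python
from __future__ import annotations

import shlex

# Hypothetical role definitions: each role lists the argv prefixes its members may run.
ROLE_COMMANDS: dict[str, list[tuple[str, ...]]] = {
    "hadoop_operator": [
        ("/usr/bin/hdfs", "dfsadmin", "-report"),
        ("/usr/bin/yarn", "application", "-list"),
    ],
    "hadoop_admin": [
        ("/usr/bin/hdfs",),  # any hdfs subcommand
    ],
}


def is_permitted(roles: set[str], command_line: str) -> bool:
    """Deny by default; allow only if the argv starts with a prefix granted to one of the roles."""
    try:
        argv = tuple(shlex.split(command_line))
    except ValueError:
        return False  # unparseable input (e.g. an unclosed quote) is denied
    for role in roles:
        for prefix in ROLE_COMMANDS.get(role, []):
            if argv[: len(prefix)] == prefix:
                return True
    return False


print(is_permitted({"hadoop_operator"}, "/usr/bin/hdfs dfsadmin -report"))  # True
print(is_permitted({"hadoop_operator"}, "/usr/bin/hdfs dfs -rm -r /data"))  # False
```

The check assumes the permitted argument list is executed directly (for example via execve) and never re-joined and handed to a shell; otherwise an extra argument such as ; could smuggle in a second command.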

Sam’s auditors have already informed him that this BigInsights application must comply with both SOX and PCI-DSS, since it will process financial transactions. To prove compliance and satisfy the security and audit teams, Sam must be able to produce audit logs across all layers of the environment. He enables auditing on BigInsights by following the BigInsights documentation, and then does the same for Hadoop by enabling audit logging with Apache Ranger.

And, since Sam is using Centrify Server Suite Enterprise Edition, he’s able to record all sessions at the OS level.

Sam can now give his auditors audit logs from every layer, improving accountability, forensics and compliance.
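Correlating the three audit sources is much easier when every layer logs the same Active Directory identity. The following sketch merges per-layer audit streams into one time-ordered timeline and optionally filters it to a single user, normalizing forms such as sam@BANK.EXAMPLE and BANK\sam to one name. The AuditEvent record and the normalization rule are hypothetical; real BigInsights, Ranger and Centrify session-audit records have their own formats and would first need to be parsed into this shape.

```python
from __future__ import annotations

import heapq
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple


class AuditEvent(NamedTuple):
    """Hypothetical normalized audit record shared by all layers."""

    when: datetime
    layer: str
    user: str
    action: str


def canonical_user(name: str) -> str:
    """Reduce 'sam@BANK.EXAMPLE' or 'BANK\\sam' to 'sam' so one AD identity lines up across layers."""
    name = name.split("@", 1)[0]
    name = name.rsplit("\\", 1)[-1]
    return name.lower()


def unified_timeline(
    *layer_logs: Iterable[AuditEvent], user: str | None = None
) -> Iterator[AuditEvent]:
    """Merge per-layer logs, each already sorted by time, into one time-ordered stream."""
    wanted = canonical_user(user) if user else None
    for event in heapq.merge(*layer_logs, key=lambda e: e.when):
        if wanted is None or canonical_user(event.user) == wanted:
            yield event


os_sessions = [AuditEvent(datetime(2016, 5, 2, 8, 58), "OS session", "BANK\\sam", "ssh login")]
ranger_hdfs = [
    AuditEvent(datetime(2016, 5, 2, 8, 59), "Ranger/HDFS", "sam", "read /data/txn"),
    AuditEvent(datetime(2016, 5, 2, 9, 5), "Ranger/HDFS", "ana", "read /data/txn"),
]
big_sql = [AuditEvent(datetime(2016, 5, 2, 9, 0), "BigInsights", "sam@BANK.EXAMPLE", "Big SQL query")]

for event in unified_timeline(os_sessions, ranger_hdfs, big_sql, user="sam"):
    print(event.when.time(), event.layer, event.action)
```

heapq.merge consumes the inputs lazily, so each layer’s log only needs to be sorted by time on its own; the layers never have to be loaded and sorted together.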

In summary, IT and Big Data teams need to think about administration, authentication, authorization and audit at all layers of the Big Data stack – Application, Hadoop and Operating System – to protect the data against cyberthreats. Centrify’s Privileged Identity Management solutions help secure and simplify complex Big Data environments for better access control, privilege management and user-level auditing.


