| By David Tishgart | Article Rating: |
|
| October 27, 2012 09:00 AM EDT | Reads: |
3,360 |
Big Data takes center stage today at the Strata Conference & Hadoop World in New York, the world’s largest gathering of the Apache Hadoop™ community. A key conversation topic will be how organizations can improve data security for Hadoop and the applications that run on the platform. As you know, Hadoop and similar data stores hold a lot of promise for organizations to finally gain some value out of the immense amount of data they're capturing. But HDFS, Hive and other nascent NoSQL technologies were not necessarily designed with comprehensive security in mind. Often what happens as big data projects grow is sensitive data like HIPAA data, PII and financial records get captured and stored. It's important this data remains secure at rest.

I polled my fellow co-workers at Gazzang last week, and asked them to come up with a top ten list for securing Apache Hadoop. Here's what they delivered. Enjoy:
Think about security before getting started – You don’t wait until after a burglary to put locks on your doors, and you should not wait until after a breach to secure your data. Make sure a serious data security discussion takes place before installing and feeding data into your Hadoop cluster.
Consider what data may get stored – If you are using Hadoop to store and run analytics against regulatory data, you likely need to comply with specific security requirements. If the stored data does not fall under regulatory jurisdiction, keep in mind the risks to your public reputation and potential loss of revenue if data such as personally identifiable information (PII) were breached.
Encrypt data at rest and in motion – Add transparent data encryption at the file layer as a first step toward enhancing the security of a big data project. SSL encryption can protect big data as it moves between nodes and applications.
As Securosis analyst Adrian Lane wrote in a recent blog, “File encryption addresses two attacker methods for circumventing normal application security controls. Encryption protects in case malicious users or administrators gain access to data nodes and directly inspect files, and it also renders stolen files or disk images unreadable. It is transparent to both Hadoop and calling applications and scales out as the cluster grows. This is a cost-effective way to address several data security threats.”
Store the keys away from the encrypted data – Storing encryption keys on the same server as the encrypted data is akin to locking your house and leaving the key in your front door. Instead, use a key management system that separates the key from the encrypted data.
Institute access controls – Establishing and enforcing policies that govern which people and processes can access data stored within Hadoop is essential for keeping rogue users and applications off your cluster.
Require multi-factor authentication - Multi-factor authentication can significantly reduce the likelihood of an account being compromised or access to Hadoop data being granted to an unauthorized party.
Use secure automation – Beyond data encryption, organizations should look to DevOps tools such as Chef or Puppet for automated patch and configuration management.
Frequently audit your environment – Project needs, data sets, cloud requirements and security risks are constantly changing. It’s important to make sure you are closely monitoring your Hadoop environment and performing frequent checks to ensure performance and security goals are being met.
Ask tough questions of your cloud provider – Be sure you know what your cloud provider is responsible for. Will they encrypt your data? Who will store and have access to your keys? How is your data retired when you no longer need it? How do they prevent data leakage?
Centralize accountability – Centralizing the accountability for data security ensures consistent policy enforcement and access control across diverse organizational silos and data sets.
Did we miss anything? If so, please comment below, and enjoy Strata +HadoopWorld.
Published October 27, 2012 Reads 3,360
Copyright © 2012 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
More Stories By David Tishgart
After spending years at large corporations including Dell, AMD and BMC, David Tishgart joined the startup ranks leading product marketing for Gazzang. Focused on security for big data, he helps communicate the benefits and challenges that big data can present, offering practical solutions. When not ranting about encryption and key management, you can find David clamoring for a big data application that can fine tune his fantasy football team.
- Cloud People: A Who's Who of Cloud Computing
- New Relic Q1 2013 Blazes Past Growth Targets and Reaches 40,000 Active Customer Accounts
- Streamline Health® Engages KPMG as Its New Independent Registered Public Accountants
- Session Topics: 12th Cloud Expo / Cloud Expo New York
- Cloud Expo New York: Developing the World’s First IaaS Marketplace
- Cloud Expo New York: Aligning Your Cloud Security with the Business
- Commander of U.S. Cyber Command and National Security Agency Director, General Keith Alexander, To Keynote Day One of Black Hat USA 2013
- Five Big Data Features in SQL Server
- According to Nick Gholkar, Accounting Apps Make Conducting Business Easier
- NIST to Sponsor FFRDC Widespread Adoption of Integrated CyberSecurity
- Cloud Business Solutions, Social Media, and Platform Systems of Engagement Market Shares, Strategies, and Forecasts, Worldwide, 2013 to 2019
- Lunch Keynote at Cloud Expo | Strategies for App Delivery in the Cloud Era
- Cloud People: A Who's Who of Cloud Computing
- Windows Azure IaaS Reaches General Availability
- AMD and Adobe Collaborate on Upcoming Version of Adobe Premiere Pro Software to Enable Breakthrough Video Editing Performance Through Open Standards
- New Relic Q1 2013 Blazes Past Growth Targets and Reaches 40,000 Active Customer Accounts
- State and Local Governments Adopt Microsoft Dynamics CRM to Improve Citizen Service Delivery
- Cloud Expo New York: Deploying Hybrid Cloud for Performance and Uptime
- Predixion Software Announces General Availability of the Latest Version of its Predictive Analytics Platform
- Streamline Health® Engages KPMG as Its New Independent Registered Public Accountants
- Session Topics: 12th Cloud Expo / Cloud Expo New York
- MEI Pharma Announces $15.2 Million Registered Offering Of Common Stock
- Cloud Expo New York: Developing the World’s First IaaS Marketplace
- Cloud Computing Is Simplifying Things
- Google Maps and ASP.NET
- Converting VB6 to VB.NET, Part I
- How to Write High-Performance C# Code
- Where Are RIA Technologies Headed in 2008?
- Crystal Reports XI & How It Has Changed
- Creating Controls for.NET Compact Framework in Visual Studio 2005
- Programmatically Posting Data to ASP .NET Web Applications
- Implementing Tab Navigation with ASP.NET 2.0
- AJAX World RIA Conference & Expo Kicks Off in New York City
- i-Technology Viewpoint: "SOA Sucks"
- .NET Archives: Getting Reacquainted with the Father of C#
- i-Technology Photo Exclusive: Bill Gates & Steve Jobs In "Nerds"



















