Tuesday, 18 November 2014

Big Data: A big security challenge



By Debra Littlejohn Shinder

Big Data – the collection of large and complex sets of data that include both structured and unstructured information – is widely touted as one of the most important current trends in computing, along with Bring Your Own Device/mobility and, of course, the cloud. In fact, the convergence of these technologies is seen by many as one of the top IT challenges of this decade. 

Much has been said and written about the security implications of BYOD, mobile devices and cloud services, but the security aspects of big data don’t seem to get quite as much attention. This is true even though companies are accumulating and analyzing huge amounts of information – not just terabytes, but petabytes – and some of it could cause big problems if it fell into the wrong hands. 

After all, the real point of collecting such massive amounts of data is not just to be a data hoarder; the objective is to subject it to analytics that can provide the company’s decision-makers with insights into aspects of their business that can have an impact on the organization’s efficiency, reputation and bottom line. But we all know that information that can be used for good can also be used for nefarious purposes, and if those business insights became public and/or were revealed to competitors, the impact on the company could be very negative indeed.

The security challenge of big data is complicated by another of those hot trends we mentioned above: many companies don’t have the storage capacity on premises to handle the amounts of data involved, so they store all that data in the cloud. Some do so in the mistaken belief that turning their data over to a cloud storage provider means they also get to hand off all of the responsibility for securing that data. 

For some companies, this might even be a reason for the decision to store the data in the cloud in the first place. You could argue that large cloud providers have far more resources to put into securing the data than your organization does. Cloud data centers are heavily guarded fortresses that employ high-dollar physical and technological security mechanisms. 

This line of reasoning makes sense – but the cloud shouldn’t be an excuse to abdicate your ultimate responsibility for the protection of your sensitive information. If there is a breach, your customers will blame you, not the cloud provider, because you are the one to whom they entrusted their information. This goes double if you’re doing business in a regulated industry or setting – financial services, healthcare, publicly traded corporations, retail businesses that process payment cards, etc. You won’t be able to pass the buck if you’re found to be out of compliance or in violation of standards. 

As with information security in general, the key to securing big data is to take a multi-layered approach. One important element in protecting the huge quantity of data that often contains bits and pieces of personal information about many individuals is de-identification – the separation of identifying information from the rest of the information pertaining to a person. Unfortunately, the counterpart to de-identification is re-identification, the art and science of putting all those pieces back together to discern identities from the de-identified data. 
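To make the idea of de-identification concrete, here is a minimal sketch in Python. It is an illustration only, not a complete or recommended implementation: the field names in IDENTIFYING_FIELDS, the function name deidentify and the subject_id key are all hypothetical, and the keyed hash (HMAC-SHA-256) is just one possible way to link the two halves of a record.

```python
import hashlib
import hmac

# Hypothetical field names; adjust these to your own schema.
IDENTIFYING_FIELDS = {"name", "email", "phone"}


def deidentify(record, secret_key):
    """Split one record into (pseudonym, identity part, analytics part)."""
    identity = {k: v for k, v in record.items() if k in IDENTIFYING_FIELDS}
    analytics = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}

    # A keyed hash, so the pseudonym cannot be recomputed without the key.
    canonical = "|".join(f"{k}={identity[k]}" for k in sorted(identity))
    pseudonym = hmac.new(
        secret_key, canonical.encode("utf-8"), hashlib.sha256
    ).hexdigest()

    # The analytics record carries only the pseudonym, not the identity.
    analytics["subject_id"] = pseudonym
    return pseudonym, identity, analytics


record = {
    "name": "Alice Example",
    "email": "alice@example.com",
    "zip": "75001",
    "purchase_total": 42.50,
}
pseudonym, identity, analytics = deidentify(record, b"keep-this-key-secret")
```

In this sketch the identity part and the key would be stored separately, under tighter access controls, from the analytics part. Note that fields left in the analytics record, such as a ZIP code, can still contribute to re-identification when combined with other data sets.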

In a report last summer, Gartner concluded that over 80 percent of organizations don’t have a consolidated data security policy across silos, and that in order to prevent breaches, they need to take a more data-centric approach to security. 

Of course, many of the security concerns and solutions that apply to big data are the same ones that apply to protecting any sensitive data. However, one thing that makes big data especially challenging is that it often passes through many more systems and applications in the process of turning all that unstructured mess into useful information. 

Companies may use applications and storage methods for which security was not a design priority, so that they have to tack on security solutions after the fact. Since much of big data is unstructured, it’s often stored in non-relational (NoSQL) databases, many of which were not built with security in mind. Traditional firewalls and other security solutions weren’t designed to handle the distributed computing that is at the heart of big data. Automated moving of data between tiers in a multi-tiered storage system can make it difficult to keep track of where the data is physically located, which poses a security issue.

Close attention to “middleware” security mechanisms, extensive and accurate logging that tracks where data goes, and real-time monitoring are essential components of a security strategy that encompasses the challenges of big data.
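As a rough illustration of what logging that tracks data movement could look like, the Python sketch below records every move of a data set between storage tiers and can answer where a data set currently lives. The class name LocationLog, its methods, and the tier and data set names are all hypothetical; a real system would write these events to durable, access-controlled storage rather than keep them in memory.

```python
from datetime import datetime, timezone


class LocationLog:
    """In-memory sketch of an append-only log of data set moves."""

    def __init__(self):
        self.events = []   # every move, in the order it happened
        self.current = {}  # data set id -> tier it currently lives on

    def record_move(self, dataset_id, to_tier, actor):
        event = {
            "time": datetime.now(timezone.utc).isoformat(),
            "dataset": dataset_id,
            "from": self.current.get(dataset_id),  # None on first placement
            "to": to_tier,
            "actor": actor,
        }
        self.events.append(event)
        self.current[dataset_id] = to_tier
        return event

    def where(self, dataset_id):
        """Return the tier the data set is on now, or None if unknown."""
        return self.current.get(dataset_id)

    def history(self, dataset_id):
        """Return all recorded moves for one data set, oldest first."""
        return [e for e in self.events if e["dataset"] == dataset_id]


log = LocationLog()
log.record_move("sales-2014", "on-prem-ssd", actor="tiering-job")
log.record_move("sales-2014", "cloud-archive", actor="tiering-job")
```

With a record like this in place, the question "where is this data physically located right now, and how did it get there?" becomes a lookup rather than an investigation.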

You can find more information about securing data in the cloud here.  

Author Profile

Debra Littlejohn Shinder, MCSE, MVP (Security) is a technology consultant, trainer and writer who has authored a number of books on computer operating systems, networking, and security.

She is also a tech editor, developmental editor and contributor to over 20 additional books. Her articles are regularly published on TechRepublic's TechProGuild Web site and WindowSecurity.com, and have appeared in print magazines such as Windows IT Pro (formerly Windows & .NET) Magazine.