Information Security in the Age of Big Data

The Age of Data

The 21st century is, without a doubt, the Age of Data. Some would even call it the Age of Big Data. Data is everywhere, and everyone is working to capitalize on its power. Individuals, commercial organizations and governments seek to leverage the tremendous potential of data analytics. And what is possible with today’s Big Data technology is just the “tip of the iceberg” of what will become possible. As a result, perceptions of data are in a state of transformation. The view of what data is and how data is obtained is continually being altered, and these changes are affecting the way organizations view Information Security.

Of course, the quantity (Volume) of data has dramatically increased. In addition, data is arriving at ever-increasing speed (Velocity) and in a wide range of forms (Variety). On one hand, data owners need to cope with the increasing complexity that these factors introduce. On the other hand, data consumers have become increasingly concerned with the trustworthiness of the data (Veracity). Finally, data consumers are focused on maximizing the potential of the data (Value).

Volume is not only a question of storage. As the quantity of data increases, the processing load increases dramatically, driving us toward new processing models. Fortunately, academic research already offers many algorithms for parallel processing, distributed processing and diffusing processing. Yet most of these are new to information technology professionals.

Variety is usually associated with a wider range of data types, such as media (images, videos, etc.) or geolocations, but variety is not limited to data types. The diversity of entities that organizations deal with has significantly increased, as has the diversity of data sources, causing data entities to take multiple forms. This property, called polymorphism, has become an inherent property of data. At the same time, the diversity of data consumers, or more exactly the diversity of their requirements, requires data to be retrieved in multiple forms, which implies transforming the data. This introduces a notion of metamorphism. Polymorphism and metamorphism are not new in themselves. What is new is our perception of them as inherent to the data, and the disruption this causes to the structured way in which we approach data.
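To make the two notions concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Customer` model, the two source formats and the two consumer views are hypothetical and do not come from any particular product. Polymorphism appears as one entity arriving in two shapes; metamorphism appears as one entity being rendered differently for two consumers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Canonical form of a customer entity (hypothetical model)."""
    name: str
    country: str


# Polymorphism: the same entity arrives in different forms from different sources.
def from_crm(record: dict) -> Customer:
    # Assumed CRM shape: separate first/last name fields, lowercase country code.
    return Customer(
        name=f"{record['first']} {record['last']}",
        country=record["country"].upper(),
    )


def from_csv_row(row: str) -> Customer:
    # Assumed legacy export shape: "LAST, FIRST;country".
    name_part, country = row.split(";")
    last, first = [part.strip() for part in name_part.split(",")]
    return Customer(
        name=f"{first.title()} {last.title()}",
        country=country.strip().upper(),
    )


# Metamorphism: different consumers require the same entity in different forms.
def to_marketing_view(customer: Customer) -> dict:
    return {"greeting": f"Dear {customer.name}"}


def to_reporting_view(customer: Customer) -> dict:
    return {"customer_name": customer.name.upper(), "jurisdiction": customer.country}


a = from_crm({"first": "Ada", "last": "Lovelace", "country": "gb"})
b = from_csv_row("LOVELACE, ADA;gb")
print(a == b)                 # both sources resolve to the same canonical entity
print(to_marketing_view(a))
print(to_reporting_view(a))
```

In this sketch, neither the sources nor the consumers need to know about each other; only the canonical form sits between them, which is the role the text attributes to a data layer.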

Velocity is the speed at which data evolves. The most obvious evolution is in volume: the rate at which data accumulates is constantly increasing. However, velocity is not only a question of volume. Not only is the volume growing quickly, but the variety of data we need to manipulate is also growing at an increasing rate. This is the main reason why application and data analytics developers no longer want to rely on a frozen data model and will progressively rely on external data digestion and integration tools that introduce dynamism into the data model to cope with the velocity of data polymorphism.

These characteristics lead to the emergence of intelligent, well-rounded data layers that manage data in all its forms independently of the applications and consumers. Such layers can seamlessly handle the velocity of polymorphism as well as the data transformations that metamorphic requirements demand.


Legislation is not indifferent to the data revolution. As data becomes involved in so many aspects of our lives, laws adapt to the new reality, and the use of data becomes more and more regulated, to prevent abuses on one hand and to leverage the accumulating data on the other. Regulation encompasses many aspects of data ownership, from privacy to anti-terrorism and even tax-related regulation.

As regulation proliferates, data owners face unprecedented challenges. The most obvious one is keeping up with upcoming regulation. Since regulation often means liability, an organization’s responsiveness to regulation becomes a critical factor. This also means that, when in doubt, one will tend to over-comply rather than under-comply.

This in turn creates a totally new challenge. Privacy regulations usually create a duty to protect information, while other regulations such as FATCA (Foreign Account Tax Compliance Act) or CRS (Common Reporting Standard) create a duty to report. What happens when over-complying with one regulation means under-complying with another? For instance, would over-reporting under FATCA become a breach of duty under a privacy regulation such as GDPR (General Data Protection Regulation)? In the medium term, the fast-evolving regulatory environment will change the way we deal with regulation: the railing approach of erring on the side of over-compliance should be abandoned in favor of a more precise approach.

Information Security

There are three well-recognized information security requirements: Confidentiality, Integrity and Availability. To fulfill these requirements, two coexisting approaches are traditionally deployed.

Network Centric Security regulates access to perimeters. A perimeter can be many things, such as a computer, a data source or an application. In many ways, this is an all-or-nothing approach: you can either access a service or not.

Since network centric security tools cannot track the complexity of all the underlying information models, more granular access control is left to the service through which the data is accessed. Application Centric Security is based on the idea that, since the application knows the structure of the data, it is best suited to control granular access. This assumption no longer holds in an age where data polymorphism is a critical success factor. This model also fails to cope with the new challenges brought by the quickly evolving regulatory environment.

A new paradigm is emerging, not to replace the older ones but to extend their reach. Data Centric Security requires information security to be managed independently of the data’s actual structure or use. Data should be secure wherever it is: at rest or in motion, on premises or in the cloud and, not least, inside a perimeter or outside.

For example, sensitive data such as personally identifiable information, like a customer name, can be stored as a field in a database, in a file containing a scanned form, in a cloud customer relationship management application or even in an email stored on an employee’s mobile phone. In every case, privacy regulation may require this data to be secured.
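As a rough illustration of protecting the same data element wherever it lives, the Python sketch below applies one masking rule to a database row, an email body and text extracted from a scanned form. The `known_pii` list, the sample containers and the masking rule are all hypothetical. A real deployment would more likely discover sensitive values through classification than through a fixed list.

```python
import re


def mask(value: str) -> str:
    """Keep the first character and replace the rest with asterisks."""
    return value[:1] + "*" * (len(value) - 1)


def protect_text(text: str, sensitive_values: list) -> str:
    """Mask every occurrence of each sensitive value inside free text."""
    for value in sensitive_values:
        pattern = re.compile(re.escape(value), flags=re.IGNORECASE)
        text = pattern.sub(lambda m: mask(m.group(0)), text)
    return text


def protect_record(record: dict, sensitive_values: list) -> dict:
    """Apply the same masking to every string field of a structured record."""
    return {
        key: protect_text(val, sensitive_values) if isinstance(val, str) else val
        for key, val in record.items()
    }


known_pii = ["Jane Doe"]  # hypothetical catalogue of sensitive values

# The same customer name held in three different containers (all made up).
db_row = {"id": 17, "name": "Jane Doe"}
email_body = "Hi, Jane Doe called about her invoice."
ocr_text = "Applicant: JANE DOE  Date: 2016-03-01"

masked_row = protect_record(db_row, known_pii)
masked_email = protect_text(email_body, known_pii)
masked_ocr = protect_text(ocr_text, known_pii)

print(masked_row)
print(masked_email)
print(masked_ocr)
```

The point of the sketch is that the protection rule is attached to the data element (the name), not to the database, the mailbox or the file that happens to contain it.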

This is a major paradigm shift: it already requires security experts to become data scientists and will soon require data scientists to become security experts as well. To protect data at the right level of granularity, one needs a deep understanding of the data itself, while the actual medium through which the data travels is less relevant. This by itself turns security expertise upside down.

As data layers increasingly abstract the information we use, they will be the logical place to handle data centric security requirements. However, since we are still at the dawn of Data Centric Security, these requirements are yet to be fully defined. Gartner’s recently created DCAP (Data-Centric Audit and Protection) category covers some aspects of these requirements while leaving others in the dark, mainly because, despite its attempt to generalize the perception of access across all data silos, it is still trapped in a perimeter-oriented approach.

Just as emerging data layers will deal with data in all its forms on any medium, data centric security requires data to be secured in all its forms and on all media. To cope with fast-evolving regulation as well as data polymorphism, Confidentiality requires data access and protection to be managed at the finest level of granularity: the data element.
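Managing confidentiality at the level of the data element could look something like the following Python sketch. It assumes that each element carries a semantic tag and that a policy maps roles to the tags they may read in clear. The `Element` type, the `POLICY` table, the role names and the tags are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A single data element together with its semantic tag."""
    value: str
    tag: str  # e.g. "name", "tax_id", "balance" (illustrative tags)


# Hypothetical policy: the semantic tags each role may read in clear.
POLICY = {
    "tax_reporting": {"name", "tax_id", "balance"},
    "marketing": {"name"},
}


def read(record: dict, role: str) -> dict:
    """Return the record as seen by a role, redacting elements it may not read."""
    allowed = POLICY.get(role, set())  # unknown roles see nothing in clear
    return {
        field: (element.value if element.tag in allowed else "<redacted>")
        for field, element in record.items()
    }


account = {
    "full_name": Element("Jane Doe", "name"),
    "tin": Element("123-45-6789", "tax_id"),
    "balance": Element("10400.00", "balance"),
}

print(read(account, "tax_reporting"))
print(read(account, "marketing"))
print(read(account, "intern"))
```

Because the decision is keyed on the semantic tag rather than on the field name or the storage location, the same policy would apply however the record is structured, which is how an element-level approach could accommodate both polymorphism and a regulation that targets a specific kind of data.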

As processing models evolve to handle volume, their inherently distributed nature should provide resilience and fulfill the Availability security requirement.

An interesting point is the expansion of the Integrity security requirement into a much wider notion of trustworthiness. The Veracity of data must be assured in order to deliver Value to data consumers. Integrity becomes a specific case of a data quality guarantee.


With the Integrity requirement expanding to a more general Veracity requirement, data quality becomes an integral part of information security.

Proliferating regulation addresses data confidentiality at the level of specific semantics in the data, forcing the consolidation of data governance and information security.

Finally, the polymorphic and metamorphic properties of the data cause information security to incorporate data integration expertise.

With the rise of intelligent data layers, information security is likely to become part of how data and data processing are defined and encompass the entire data life cycle. We can expect data integration, data quality, data centric security and data governance expertise to converge and become multiple facets of a single expertise.