GDPR – Where to start?

Financial Services institutions are assessing where to start on the road to GDPR (General Data Protection Regulation) compliance. Here are some thoughts about the starting point.

In a recent blog article, I set out my view on the journey to addressing GDPR compliance in Financial Services. In this post, I examine the starting point for that journey, as it’s becoming clear that whilst the whole GDPR subject is taxing, working out where to start is equally challenging.

Starting GDPR is easy, just find all the customer data!

If only it were that easy! Financial Services institutions have been embarking upon Customer Centricity and Digital Transformation programs and discovering that just understanding where all ‘relevant’ Customer data is stored isn’t that straightforward.

Typically, Customer data is stored in multiple siloed systems, in different formats, with differing levels of quality, using different definitions and data conventions. Customer Centricity programs have been undertaking the task of finding all the ‘relevant’ data to determine what role it plays in the needs of the program.

One of the challenges with this approach is that the program is looking only for data that is ‘relevant’ to Customer Centricity, i.e. data that helps build a more complete picture of the Customer’s journey to help drive better service delivery and create up-sell / cross-sell opportunities.

GDPR will require all this information, as well as potentially everything else, stored about Customers

The right of consent, the ability to provide all information about a customer upon request, as well as the right to be forgotten, are all GDPR requirements that require knowledge of the locations of ALL relevant Customer data. This challenge gets extended further by the need to understand what constitutes Customer relevant data for a Financial Services institution, which is not as easy as it sounds.

Relevant Customer data can be in many different formats, many different locations, many different types (structured or unstructured) and in some cases held in forms that are tricky to deal with (e.g. voice recordings or video). The sizeable potential impact of not being compliant means there is a real need to focus on understanding where relevant Customer data is held, right from the start.

Traditional data discovery techniques will only take an institution so far. For a typical Customer Centricity program, the impact of not finding and integrating customer data is potentially lower service levels or smaller up-sell / cross-sell opportunities. It’s bad but not that bad. The impact of not being GDPR compliant is potentially huge.

So just finding all relevant Customer data isn’t, in fact, that easy.

Defining policy

One of the most important places to start this journey is in the definition of the policy.

GDPR has a set of policy attributes that need defining in a systematic way, both as a business definition (e.g. right of consent and what this actually means) and as a technical definition (e.g. name, address etc.).

This is a crucial step as this is the point at which the definition of the policy, and any rules that are used to enforce it, are documented in a manner that makes the policy enforceable. This step is also where the specific data attributes that support the policy are defined.

In a GDPR environment, this would typically include specific data attributes that either denote a location of relevant Customer data or mark areas where there may be potential data conflict. An example of data conflict could be an IP address. As a standalone data attribute this is physically related to a specific device address, but of course IP addresses can change, and at this stage there is no way to verify that this isn’t the IP address of a shared device used by both a specific customer and other customers.

The right of consent in GDPR means institutions need to be much more aware of what data they have and how it is stored and used in relation to the consent given. Where this gets even more challenging is where there may be potential conflict in the right of consent as well as conflict with other regulations. An example might be a husband and wife who are both customers of the same institution but have provided different rights of consent, meaning that cross-sell and up-sell analysis, based upon a derived household view, needs to take this into account before the data is used.
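The household example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Customer` record, the `household_id` key and the single `marketing_consent` flag are all hypothetical, and real consent is usually recorded per purpose rather than as one yes/no value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    customer_id: str
    household_id: str        # hypothetical key for the derived household view
    marketing_consent: bool  # hypothetical flag: consent to cross-sell / up-sell analysis


def consenting_members(customers, household_id):
    """Return only the household members whose data may be used for the analysis."""
    return [
        c for c in customers
        if c.household_id == household_id and c.marketing_consent
    ]


# Two customers in the same household who have given different consent.
customers = [
    Customer("C001", "H42", marketing_consent=True),
    Customer("C002", "H42", marketing_consent=False),
]

usable = consenting_members(customers, "H42")
print([c.customer_id for c in usable])
```

The point of filtering before the household view is built is that the non-consenting customer's data never reaches the analysis in the first place.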

Potential conflict with other regulations is another area where policy definition needs to be carefully thought through. An example might be a customer invoking the right to be forgotten while the institution is required to keep transaction data for many years to enable auditing.

Automating data discovery

Once the policy has been defined, any solution needs to provide automated discovery of relevant Customer data across any number of databases, sources, big data and cloud data stores. It needs to employ flexible, high-performance and scalable scanning capabilities to uncover where potentially relevant Customer data resides. Many Financial Services institutions typically employ single-pass data discovery processes, as this is usually enough for most projects. Not with GDPR.

The needs of GDPR mean any solution has to provide automated data discovery on both a first-time and a continual basis. The automated first pass of data discovery is used to find the locations of relevant Customer data. The continual passes identify when relevant Customer data gets moved from a known source location to somewhere new. This is often the case where shadow IT solutions exist to solve specific business challenges, which involve transferring relevant Customer data to new locations for further analysis. Such a move would be missed unless continual monitoring is taking place. This type of continual vigilance will be key to staying GDPR compliant.
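As a minimal sketch of the continual part, the comparison between the known inventory and the latest scan is essentially a set difference. The location names and the `find_new_locations` helper below are made up for illustration; a real discovery tool would produce the scan results itself.

```python
def find_new_locations(known_locations, scan_results):
    """Return locations found by the latest scan that are not in the known inventory."""
    return sorted(set(scan_results) - set(known_locations))


# Inventory built by the first discovery pass (hypothetical names).
known = {"crm.customers", "core_banking.accounts"}

# A later scan that also finds a copy created by a shadow IT process.
latest_scan = {
    "crm.customers",
    "core_banking.accounts",
    "shared_drive/marketing_extract.xlsx",
}

new = find_new_locations(known, latest_scan)
print(new)
```

Anything returned here is a candidate for classification rather than proof of a problem: it only says that Customer data has turned up somewhere that was not previously known.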

Potentially relevant Customer data that has been identified will then need to be classified to determine how it fits into the defined policy and the constraints based upon the right of consent. Some data attributes will fit easily into the policy (e.g. customer name), some will be derived from relevant Customer data and so need further examination (e.g. an account number that includes a date of birth) and some will require much further examination (e.g. an IP address).

Classification helps define the priorities of potential data remediation based upon how the data attributes fit into, and conform to, the policy definition. Classification needs to be driven from metadata definitions (either generated or created) as well as data / metadata patterns and rules.
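To make the idea of pattern-driven classification concrete, here is a small sketch that classifies column names with regular expressions. The rules, the three labels and the column names are assumptions for illustration only; they mirror the three tiers described above and are not taken from any particular product.

```python
import re

# Hypothetical rules, checked in order: (pattern on the column name, classification).
# The IP rule comes first so that "ip_address" is not caught by the "address" rule.
RULES = [
    (re.compile(r"(^|_)ip(_|$)", re.IGNORECASE), "needs_detailed_review"),
    (re.compile(r"account", re.IGNORECASE), "needs_review"),
    (re.compile(r"name|address|email", re.IGNORECASE), "fits_policy"),
]


def classify(column_name):
    """Return the label of the first rule that matches, or 'unclassified'."""
    for pattern, label in RULES:
        if pattern.search(column_name):
            return label
    return "unclassified"


for column in ["customer_name", "account_number", "ip_address", "zip_code"]:
    print(column, "->", classify(column))
```

Rule order matters in a scheme like this, which is one reason the rules belong in the documented policy rather than being scattered across individual scans.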

Understanding data proliferation

As mentioned previously, data proliferation is a major challenge around relevant Customer data as it’s often extracted from source systems and copied to other systems for subsequent processing. These other systems often sit outside of any formal governance processes and visibility of the use of the data quickly reduces, often to zero.

In any Financial Services institution there is always a great deal of information that gets taken from core systems and manipulated in other tool sets, including Microsoft Excel. What this highlights is that to stay GDPR compliant, Financial Services institutions are going to need to be continually vigilant about what is happening with relevant Customer data, especially once it leaves properly governed environments. Continual monitoring will need to become the norm to stop data proliferation being a major business risk.

The reason for this is that once the data leaves the control of any properly governed environment, there is a risk that any subsequent processing creates yet another source of relevant Customer data, albeit a source that few will probably know about. This is where the policy definition, automated discovery and classification all become important, as together they will help bring clarity on whether this newly generated source really does contain relevant Customer data and, if so, what the risk associated with it is.

This highlights the need to understand the risk associated with Customer relevant data as, without this, there is little clarity on whether remediating action needs to be taken or not.

Risk Scoring

Data proliferation and GDPR now require a risk score to be generated based upon the understanding and the movement of relevant Customer data. A risk score would be generated from a number of different attributes of data security including:

  • Data existence
  • Volume of data
  • Data protection availability
  • Data proliferation
  • Data accessibility

By taking all these attributes, and more, a risk score can be calculated. The score is simply a number, but it is powerful because it enables Financial Services institutions to start to prioritise the sequence in which sources of relevant Customer data need addressing. A high score would denote a data source that potentially needs urgent attention, whilst a source with a low score could wait.
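One simple way to combine such attributes is a weighted sum. The sketch below is an assumption, not a prescribed GDPR formula: the weights, the 0 to 1 factor values, the 0 to 100 scale and the source names are all invented for illustration, and "data protection availability" is expressed as a `protection_gap` so that less protection means more risk.

```python
# Hypothetical weights for the five attributes listed above; they sum to 1.
WEIGHTS = {
    "data_existence": 0.30,
    "data_volume": 0.20,
    "protection_gap": 0.20,  # 1.0 = no protection available, 0.0 = fully protected
    "proliferation": 0.15,
    "accessibility": 0.15,
}


def risk_score(factors):
    """Combine factor values (each between 0.0 and 1.0) into a score from 0 to 100."""
    total = sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)
    return round(100 * total)


# Illustrative factor values for two made-up data sources.
sources = {
    "crm.customers": {
        "data_existence": 1.0, "data_volume": 0.8, "protection_gap": 0.2,
        "proliferation": 0.3, "accessibility": 0.4,
    },
    "marketing_extract.xlsx": {
        "data_existence": 1.0, "data_volume": 0.2, "protection_gap": 1.0,
        "proliferation": 0.9, "accessibility": 0.9,
    },
}

# Highest score first: the order in which sources would be addressed.
ranked = sorted(
    ((name, risk_score(f)) for name, f in sources.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, score in ranked:
    print(name, score)
```

Keeping the individual factor values alongside the final number is what makes it possible to explain afterwards why a source scored the way it did.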


[Figure: RAG (red, amber, green) approach to evaluating GDPR risk scores]


The simple example graphic above provides much needed clarity on where Financial Services institutions need to prioritize their remediation activities, with the assumption that drill-down will be available to show the elements of how the score is calculated.

So the way the risk score gets calculated means institutions can not only see a score for prioritization but also understand how it’s calculated, bringing much needed understanding about the nature of the risk and how it might be tackled. As remediation takes place the high risk scores should start to drop, so tracking the risk score history highlights how much progress is actually being made over time.
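A RAG banding and a score history can be sketched as follows. The thresholds (40 and 70), the period labels and the score values are assumptions chosen purely for illustration.

```python
def rag_band(score, amber_from=40, red_from=70):
    """Map a 0-100 risk score to a Red / Amber / Green band (thresholds are hypothetical)."""
    if score >= red_from:
        return "Red"
    if score >= amber_from:
        return "Amber"
    return "Green"


def score_change(history):
    """Difference between the latest and the first recorded score (negative = improvement)."""
    return history[-1][1] - history[0][1]


# Made-up score history for one data source as remediation progresses.
history = [("period 1", 82), ("period 2", 75), ("period 3", 55)]

for period, score in history:
    print(period, score, rag_band(score))
print("change:", score_change(history))
```

A falling score that also crosses from one band to the next gives a simple, reportable signal that remediation is working.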

‘Hope for the best and plan for the worst’

My experience so far, given that GDPR is still quite new and many are still trying to work out what it means, is that institutions are adopting a ‘hope for the best and plan for the worst’ approach. Planning is quickly getting underway to try to get to grips with what GDPR means and how best to tackle it. With such short time frames, institutions are working hard at understanding their current state to determine what action plans are needed to meet the deadlines.

This is one area where good software tools can add value quickly. Informatica brought to market data security intelligence capabilities that help tackle most of the highlighted problem areas. Using the right software tools can help organisations get on top of the GDPR issue and enable them to put the right plans in place to address this most challenging Customer data regulation.


  • Jean Burnay

    Hello Andrew. I appreciate your articles very much.
    At P&V insurance company (Belgium) we started a data management program that will sustain customer engagement, regulatory compliance and many other business goals.
    Today we have a reference conceptual information model (enterprise wide) that is the foundation to start data asset management, data governance and data quality. Currently the reference model is managed in our IT architecture tool.
    We feel that it is not the best place for several reasons you can easily imagine. We need a true collaboration in the community of data producers and data consumers.
    We need an information catalog and Informatica has one. What is your comment about linking GDPR and the Informatica information catalog? Would you have some documented and available use cases or any other related documentation?

  • Noor Basha Shaik

    Data discovery results are metadata. How can we link Secure@Source results with underlying data so we link the subject access request raised by a particular customer?