Data Proliferation Exposes Higher Risk of a Data Breach
Data proliferation has traditionally been measured based on the number of copies data reside on different media. For example, if data residing on an enterprise storage device was backed up to tape, the proliferation was measured by the number of tapes the same piece of data would reside. Now that backups are no longer restricted to the data center and data is no longer constrained by the originating application, this definition is due for an update.
Data proliferation should be measured based on the number of users who have access to or can view the data and that data proliferation is a primary factor in measuring the risk of a data breach. My argument here is that as sensitive, confidential or private data proliferates beyond the original copy, it increases its surface area and proportionally increases its risk of a data breach.
Using the original definition of data proliferation and an example of data storage shown below, data proliferation would include production, production copies used for disaster recovery purposes and all physical backup copies. But as you can see, data is also copied to test environments for development purposes. When factoring in the number of privileged users with access to those copies, you have a different view of proliferation and potential risk.
In the example, there are potentially thousands of copies of sensitive data but only a small number of users who are authorized to access the data.
In the case of test and development, this image highlights a potentially high area of risk because the number of users who could see the sensitive data is high.
Similarly with online advertising, the measure of how many people see an online ad is called an impression. If an ad was seen by 100 online users, it would have 100 impressions.
When you apply that same principal to data security, you could say that data proliferation is a calculation of the number of copies of a data element multiplied by the potential number of users who could physically view the data, or in other words ‘impressions’. In this second image below, rather than considering the total number of copies, what if we measured risk based on the total number of impressions?
In this case, the measure of risk is independent of the physical media the data reside on. You could take this a few steps further and add a factor based on security controls in place to prevent unauthorized access.