I was recently talking to an executive responsible for IT infrastructure at a mid-sized bank. He mentioned that he has 17 core applications critical to the business, and over 200 ancillary applications. Each core application historically sees one significant change per quarter and two small changes every week. That is about 70 big changes and more than 1,700 small changes every year, just for the core applications that are critical to the business!
How does the business ensure that these perturbations to existing systems do not break existing capabilities within them? Worse, how does a business ensure that changes to these systems do not break other downstream applications that rely on the data captured in them? How do testing teams pick the right test data to cover all the permutations and combinations that these changes introduce?
Test data management solutions solve these problems by subsetting a small set of production data to be used for testing, and masking that data so that sensitive values are not exposed to a large set of users. Test data generation allows the data to be augmented for features that are not yet in production, as well as to introduce bad data into the environment.
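The two steps described above, subsetting and masking, can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation; the table, field names, and hash-based masking scheme are assumptions chosen for the example. Deterministic hashing is used so that the same production value always masks to the same token, which preserves joins across tables.

```python
import hashlib
import random

def subset(rows, predicate, sample_size, seed=42):
    """Keep only rows matching a test scenario, then sample a small set."""
    matching = [r for r in rows if predicate(r)]
    random.Random(seed).shuffle(matching)
    return matching[:sample_size]

def mask(row, sensitive_fields):
    """Replace sensitive values with deterministic, irreversible tokens."""
    masked = dict(row)
    for field in sensitive_fields:
        digest = hashlib.sha256(str(row[field]).encode()).hexdigest()[:8]
        masked[field] = f"{field}_{digest}"
    return masked

# Illustrative "production" data (entirely made up).
production = [
    {"id": 1, "name": "Alice", "ssn": "111-22-3333", "balance": 1200},
    {"id": 2, "name": "Bob",   "ssn": "444-55-6666", "balance": -50},
    {"id": 3, "name": "Carol", "ssn": "777-88-9999", "balance": 0},
]

# Subset: only the edge-case accounts; mask: hide names and SSNs.
test_set = [mask(r, ["name", "ssn"])
            for r in subset(production, lambda r: r["balance"] <= 0, 2)]
```

After this runs, `test_set` contains only non-positive-balance rows, and no raw name or SSN survives into the test environment.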
As the number of applications increases manyfold and development strategies move from the waterfall model to agile and continuous integration, there is a need not only to provision test data, but to provision it in such a way that it can be automated and used repeatedly. That requires a warehouse of test data that is categorized and tagged by test case and test area. This test data warehouse should:
- Allow testers to review test data in the warehouse, tag interesting data and data sets, and search on those tags
- Augment the test data when required data is missing
- Visualize the test data, identify white spaces in it, and generate test data to fill those white spaces
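The tagging and search capabilities above can be sketched as a small index over named datasets. This is a hypothetical illustration; the class and tag names are assumptions, and a real warehouse would back this with a database rather than in-memory dictionaries.

```python
from collections import defaultdict

class TestDataWarehouse:
    """Toy index of test datasets, searchable by tags such as test area."""

    def __init__(self):
        self.datasets = {}               # dataset_id -> rows
        self.tags = defaultdict(set)     # tag -> set of dataset_ids

    def add(self, dataset_id, rows, tags):
        self.datasets[dataset_id] = rows
        for tag in tags:
            self.tags[tag].add(dataset_id)

    def search(self, *tags):
        """Return the ids of datasets carrying every requested tag."""
        if not tags:
            return set()
        return set.intersection(*(self.tags[t] for t in tags))

wh = TestDataWarehouse()
wh.add("negative-balance", [{"id": 2}], ["payments", "edge-case"])
wh.add("new-customer", [{"id": 9}], ["onboarding"])
print(wh.search("payments", "edge-case"))  # {'negative-balance'}
```

A tester looking for payments edge cases finds the right dataset by tag intersection, without needing to know how or when it was captured.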
Once this warehouse of test data sets is available, testers should be able to reserve the test records they are using for their testing. These records can then be restored to the test environments from the test data warehouse, and testers can run their test cases without impacting other testers who are using the same test environment.
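The reservation idea can be illustrated with a simple ownership map: a record can be held by only one tester at a time, so concurrent test runs on a shared environment do not trample each other's data. This is a sketch under assumed names; a real system would persist reservations and handle expiry.

```python
class Reservations:
    """Toy record-reservation registry for a shared test environment."""

    def __init__(self):
        self._owner = {}   # record_id -> tester holding it

    def reserve(self, record_id, tester):
        """Reserve a record; raises if another tester already holds it."""
        current = self._owner.get(record_id)
        if current not in (None, tester):
            raise RuntimeError(f"{record_id} is reserved by {current}")
        self._owner[record_id] = tester

    def release(self, record_id, tester):
        """Release a record, but only if this tester holds it."""
        if self._owner.get(record_id) == tester:
            del self._owner[record_id]

r = Reservations()
r.reserve("cust-42", "alice")
try:
    r.reserve("cust-42", "bob")   # second tester is refused
except RuntimeError:
    print("cust-42 is already in use")
```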
This still leaves open the question of how organizations test a business process that spans multiple systems. How can organizations create test data in these systems that allows a business process to be executed from one end to the other? Once organizations master this capability, they will reach their full potential in automating their test processes and reduce the risk that each perturbation in the environment inevitably brings.
In response to this growth of data, organizations seek new ways to unlock its value. Traditionally, data has been analyzed for a few key reasons. First, data was analyzed to identify ways to improve operational efficiency. Second, data was analyzed to identify opportunities to increase revenue.
As data expands, companies have found new uses for these growing data sets. Of late, organizations have started providing data to partners, who then sell the intelligence they glean from it. Consider a coffee shop owner whose store doesn't open until 8 AM. This owner would be interested in learning how many target customers (perhaps people aged 25 to 45) walk past the closed shop between 6 AM and 8 AM. If this number is high enough, it may make sense to open the store earlier.
As much as organizations prioritize the value of data, customers prioritize the privacy of data. If an organization loses a customer's data, it incurs several costs. These costs include:
- Damage to the company’s reputation
- A reduction of customer trust
- Financial costs associated with the investigation of the loss
- Possible governmental fines
- Possible restitution costs
To guard against these risks, data that organizations provide to their partners must be obfuscated. This protects customer privacy. However, obfuscated data is often of lower value to the partner. For example, if the dates of birth of those passing the coffee shop have been obfuscated, the store owner may not be able to determine whether the passersby are potential customers. When data is obfuscated without consideration of the analysis to be done, the results of that analysis may be wrong.
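One common middle ground is generalization: rather than removing the date of birth entirely, it is coarsened to an age band. The sketch below is a hypothetical Python illustration (the band width and reference date are assumptions); the coffee shop owner can still count passersby in a target age range, but no individual birth date is exposed.

```python
from datetime import date

def age_band(dob, today=date(2024, 1, 1), width=10):
    """Generalize a date of birth to a coarse age band like '30-39'."""
    age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

# Dates of birth of passersby (made-up data), shared only as age bands.
passersby = [date(1990, 6, 1), date(1975, 2, 14), date(2001, 11, 30)]
print([age_band(d) for d in passersby])  # ['30-39', '40-49', '20-29']
```

Note the trade-off the paragraph describes: if the bands are chosen without regard to the analysis (here, boundaries at 30 and 40 rather than at 25 and 45), the partner cannot recover the exact count they need.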
There is a way to provide data privacy for the customer while simultaneously monetizing enterprise data. To do so, organizations must allow trusted partners to define masking generalizations. With sufficient data masking governance, it is indeed possible for data obfuscation and data value to coexist.
Currently, there is a great deal of research around ensuring that obfuscated data is both protected and useful. Techniques and algorithms like 'k-Anonymity' and 'l-Diversity' ensure that sensitive data is safe and secure. However, these techniques have not yet become mainstream. Once they do, the value of big data will be unlocked.
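The core ideas behind these two techniques are simple to state. A dataset is k-anonymous with respect to a set of quasi-identifiers (such as age band and partial zip code) if every combination of those values appears in at least k rows; l-diversity additionally requires each such group to contain at least l distinct sensitive values. A minimal sketch, with made-up rows:

```python
from collections import Counter

def k_anonymity(rows, quasi_identifiers):
    """Smallest group size over quasi-identifier combinations (the k)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    return min(groups.values())

def l_diversity(rows, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values in any group (the l)."""
    groups = {}
    for r in rows:
        key = tuple(r[q] for q in quasi_identifiers)
        groups.setdefault(key, set()).add(r[sensitive])
    return min(len(values) for values in groups.values())

rows = [
    {"age_band": "20-29", "zip": "100**", "diagnosis": "flu"},
    {"age_band": "20-29", "zip": "100**", "diagnosis": "cold"},
    {"age_band": "30-39", "zip": "100**", "diagnosis": "flu"},
    {"age_band": "30-39", "zip": "100**", "diagnosis": "flu"},
]
print(k_anonymity(rows, ["age_band", "zip"]))                  # 2
print(l_diversity(rows, ["age_band", "zip"], "diagnosis"))     # 1
```

This toy dataset shows why both measures exist: it is 2-anonymous, yet the 30-39 group has only one diagnosis, so anyone known to be in that group is revealed to have the flu. l-Diversity flags exactly this leak that k-anonymity misses.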