A data integration hub is a proven vehicle to provide a self service model for publishing and subscribing data to be made available to a variety of users. For those who deploy these environments for regulated and sensitive data need to think of data privacy and data governance during the design phase of the project.
In the data integration hub architecture, think about how sensitive data will be coming from different locations, from a variety of technology platforms, and certainly from systems being managed by teams with a wide range of data security skills. How can you ensure data will be protected across such a heterogeneous environment? Not to mention if data traverses across national boundaries.
Then think about testing connectivity. If data needs to be validated in a data quality rules engine, in order to truly test this connectivity, there needs to be a capability to test using valid data. However testers should not have access or visibility into the actual data itself if it is classified as sensitive or confidential.
With a hub and spoke model, the rules are difficult to enforce if data is being requested from one country and received in another. The opportunity for exposing human error and potential data leakage increases exponentially. Rather than reading about a breach in the headlines, it may make sense to look at building preventative measures or spending the time and money to do the right thing from the onset of the project.
There are technologies that exist in the market that are easy to implement that are designed to prevent this very type of exposure. This technology is called data masking which includes data obfuscation, encryption and tokenization. Informatica’s Data Privacy solution based on persistent and dynamic data masking options can be easily and quickly deployed without the need to develop code or modify the source or target application.
When developing your reference architecture for a data integration hub, incorporate sound data governance policies and build data privacy into the application upfront. Don’t wait for the headlines to include your company and someone’s personal data.
Informatica announced yesterday the Informatica ILM Nearline product is SAP-certified. ILM Nearline helps IT organizations reduce costs of managing data growth in existing implementations of the SAP NetWeaver Business Warehouse (SAP NetWeaver BW) and SAP HANA. By doing so, customers can leverage freed budgets and resources to invest in its application landscape and data center modernization initiatives. Informatica ILM Nearline v6.1A for use with SAP NetWeaver BW and SAP HANA, available today, is purpose-built for SAP environments leveraging native SAP interfaces.
Data volumes are growing the fastest in data warehouse and reporting applications, yet a significant amount of it is rarely used or infrequently accessed. In deployments of SAP NetWeaver BW, standard SAP archiving can reduce the size of a production data warehouse database to help preserve its performance, but if users ever want to query or manipulate the archived data, the data needs to be loaded back into the production system disrupting data analytics processes and extending time to insight. The same holds true for SAP HANA.
To address this, ILM Nearline enables IT to migrate large volumes of largely inactive SAP NetWeaver BW or SAP HANA data from the production database or in memory store to online, secure, highly compressed, immutable files in a near-line system while maintaining end-user access. The result is a controlled environment running SAP NetWeaver BW or SAP HANA with predictable, ongoing hardware, software and maintenance costs. This helps ensure service-level agreements (SLAs) can be met while freeing up ongoing budget and resources so IT can focus on innovation.
Informatic ILM Nearline for use with SAP NetWeaver BW and SAP HANA has been certified with the following interfaces:
- NW-BW-NLS Nearline Storage SAP NetWeaver BW 7.30 on SAP HANA for Informatica Data Archive 6.1A
- NW-BW-NLS 7.30 – Nearline Storage – SAP NetWeaver BW 7.30 for Informatica Data Archive 6.1A
- BC-HCS 6.20 – HTTP Content Server 6.20 for Interface for Informatica Data Archive 6.1
“Informatica ILM Nearline for use with SAP NetWeaver BW and SAP HANA is all about reducing the costs of data while keeping the data easily accessible and thus valuable,” said Adam Wilson, general manager, ILM, Informatica. “As data volumes continue to soar, the solution is especially game-changing for organizations implementing SAP HANA as they can use the Informatica-enabled savings to help offset and control the costs of their SAP HANA licenses without disrupting the current SAP NetWeaver BW users’ access to the data.”
Specific advantages of Informatica ILM Nearline include:
- Industry-leading compression rates – Informatica ILM Nearline’s compression rates exceed standard database compression rates by a sizable margin. Customers typically achieve rates in excess of 90 percent, and some have reported rates as high as 98 percent.
- Easy administration and data access – No database administration is required for data archived by Informatica ILM Nearline. Data is accessible from the user’s standard SAP application screen without any IT interventions and is efficiently stored to simplify backup, restore and data replication processes.
- Limitless capacity – Highly scalable, the solution is designed to store limitless amounts of data without affecting data access performance.
- Easy storage tiering – As data is stored in a highly compressed format, the nearline archive can be easily migrated from one storage location to another in support of a tiered storage strategy.
Available now, Informatica ILM Nearline for use with SAP NetWeaver BW and SAP HANA is based on intellectual property acquired from Sand Technology in Q4 2011 and enhanced by Informatica.
 Informatica Survey Results, January 23, 2013 (citation from Enterprise Data Archive for Hybrid IT Webinar)
The Oracle Application User Group (OAUG) Archive and Purge Special Interest Group (SIG) held its semi-annual session first thing in the morning, Sunday September 22, 2013 – 8:00am. The chairman of the SIG, Brian Bent, must have lost in the drawing straws contest for session times. Regardless, attendance was incredibly strong and the topic, ‘Cleaning up your Oracle E-Business Suite Mess’, was well received.
From the initial audience survey, most attendees have made the jump to OEBS R12 and very few have implemented an Information Lifecycle Management (ILM) strategy. As organizations migrate to the latest version, the rate of data growth increases significantly such that performance takes a plunge, costs for infrastructure and storage spike, and DBAs are squeezed with trying to make due.
The bulk of the discussion was on what Oracle offers for purging Concurrent Programs. The focus was on system tables – not functional archive and purge routines, like General Ledger or Accounts Receivable. That will be a topic of another SIG day.
For starters, Oracle provides Concurrent Programs to purge administrative data. Look for ‘Big Tables’ owned by APPLSYS for more candidates and search for the biggest tables / indexes. Search for ‘PURGE’ on MyOracleSupport (MOS) – do your homework to decide if the Purge programs apply to you. If you are concerned about deleting data, you can create an archive table, add an ‘on delete’ trigger to the original table, run the purge and automatically save the data in the archive table (Guess what? This is a CUSTOMIZATION).
Some areas to look at include FND_Concurrent Requests and FND_LOBS.
- Most customers purge data older than 7-30 days
- Oracle recommends keeping this table under 25,000 rows
- Consider additional Purges that delete data about concurrent requests that run frequently
- DBAs do not delete from FND_LOBS; the only way to get rid of them is for Oracle to provide a concurrent Program for the module that users used to load them up
- Can take an enormous amount of space and make exporting and importing your database take a long time
- You can also look to store FND_LOBS as secure files, but requires advanced compression licenses
- Log enhancement requests for more concurrent programs to clean up FND_LOBS
- Look to third party solutions, such as Informatica
Other suggestions include WORKFLOW, but this requires more research.
For more information, join the Oracle Application User Group and sign up for the Archive and Purge Special Interest Group.
Testing in any domain involves dealing with an incredible amount of test data. As application environments get bigger and more complex, testing teams become susceptible to significant inefficiencies and project delays. Incomplete, missing, or irrelevant test data sets can wreak havoc on any QA cycle. If production data is copied for testing purposes, testers need to ensure sensitive data is protected – adding time to test data creation tasks.
Informatica recently polled its Test Data Management customers through a third party validation service, TechValidate. Here are three cases studies and testimonials about how Informatica customers are addressing these challenges.
Many Salesforce developers that use sandbox environments for test and development suffer from the following challenges:
- Lack of relevant data for proper testing and development (empty sandboxes)
- To fix that problem, they manually copy data from production
- Which results in exposing sensitive data to unauthorized users
- And potentially consuming more storage than allocated for sandbox environments (resulting in unexpected costs)
To address these challenges, Informatica just released Cloud Test Data Management for Salesforce. This solution is designed to give Salesforce admins and developers the ability to provision secure test data subsets to developers through an easy to use, wizard driven approach. The application is delivered as a service through a subscription-based pricing model.
The Informatica IT team uses Salesforce internally and validated an ROI based on reducing the amount of developer time used to manually script copying data from production to a sandbox, reducing the amount of time fixing defects due to not having the right test data, and eliminating the risk of a data breach by masking sensitive data.
To learn more about this new offering, watch a demonstration that shows how to create secure test data subsets for Salesforce. Also, available now, try the free Cloud Data Masking app or take a 30-day Cloud Test Data Management trial.
Informatica recently hosted a webinar with Cognizant who shared how they streamline test data management processes internally with Informatica Test Data Management and pass on the benefits to their customers. Proclaimed as the world’s largest Quality Engineering and Assurance (QE&A) service provider, they have over 400 customers and thousands of testers and are considered a thought leader in the testing practice.
We polled over 100 attendees on what their top challenges were with test data management considering the data and system complexities and the need to protect their client’s sensitive data. Here are the results from that poll:
It was not surprising to see that generating test data sets and securing sensitive data in non-production environments were tied as the top two biggest challenges. Data integrity/synchronization was a very close 3rd .
Cognizant with Informatica has been evolving its test data management offering to truly focus on not only securing sensitive data – but also improving testing efficiencies with identifying, provisioning and resetting test data – tasks that consume as much as 40% of testing cycle times. As part of the next generation test data management platform, key components of that solution include:
Sensitive Data Discovery – an integrated and automated process that searches data sets looking for exposed sensitive data. Many times, sensitive data resides in test copies unbeknownst to auditors. Once data has been located, data can be masked in non-production copies.
Persistent Data Masking – masks sensitive data in-flight while cloning data from production or in-place on a gold copy. Data formats are preserved while original values are completely protected.
Data Privacy Compliance Validation – auditors want to know that data has in fact been protected, the ability to validate and report on data privacy compliance becomes critical.
Test Data Management – in addition to creating test data subsets, clients require the ability to synthetically generate test data sets to eliminate defects by having data sets aligned to optimize each test case. Also, in many cases, multiple testers work on the same environment and may clobber each other’s test data sets. Having the ability to reset test data becomes a key requirement to improve efficiencies.
Figure 2 Next Generation Test Data Management
When asked what tools or services that have been deployed, 78% said in-house developed scripts/utilities. This is an incredibly time-consuming approach and one that has limited repeatability. Data masking was deployed in almost half of the respondents.
Informatica with Cognizant are leading the way to establishing a new standard for Test Data Management by incorporating both test data generation, data masking, and the ability to refresh or reset test data sets. For more information, check out Cognizant’s offering based on Informatica: TDMaxim and White Paper: Transforming Test Data Management for Increased Business Value.
Healthcare organizations are currently engaged in major transformative initiatives. The American Recovery and Reinvestment Act of 2009 (ARRA) provided the healthcare industry incentives for the adoption and modernization of point-of-care computing solutions including electronic medical and health records (EMRs/EHRs). Funds have been allocated, and these projects are well on their way. In fact, the majority of hospitals in the US are engaged in implementing EPIC, a software platform that is essentially the ERP for healthcare.
These Cadillac systems are being deployed from scratch with very little data being ported from the old systems into the new. The result is a dearth of legacy applications running in aging hospital data centers, consuming every last penny of HIS budgets. Because the data still resides on those systems, hospital staff continues to use them making it difficult to shut down or retire.
Most of these legacy systems are not running on modern technology platforms – they run on systems such as HP Turbo Image, Intercache Mumps, and embedded proprietary databases. Finding people who know how to manage and maintain these systems is costly and risky – risky in that if data residing in those applications is subject to data retention requirements (patient records, etc.) and the data becomes inaccessible.
A different challenge for CFOs of these hospitals is the ROI on these EPIC implementations. Because these projects are multi-phased, multi-year, boards of directors are asking about the value realized from these investments. Many are coming up short because they are maintaining both applications in parallel. Relief will come when systems can be retired – but getting hospital staff and regulators to approve a retirement project requires evidence that they can still access data while adhering to compliance needs.
Many providers have overcome these hurdles by successfully implementing an application retirement strategy based on the Informatica Data Archive platform. Several of the largest pediatrics’ children’s hospitals in the US are either already saving or expecting to save $2 Million or more annually from retiring legacy applications. The savings come from:
- Eliminating software maintenance and license costs
- Eliminate hardware dependencies and costs
- Reduced storage requirements by 95% (data archived is stored in a highly compressed, accessible format)
- Improved efficiencies in IT by eliminating specialized processes or skills associated with legacy systems
- Freed IT resources – teams can spend more of their time working on innovations and new projects
Informatica Application Retirement Solutions for Healthcare provide hospitals with the ability to completely retire legacy applications, retire and maintain access to archive data for hospital staff. And with built in security and retention management, records managers and legal teams are satisfying compliance requirements. Contact your Informatica Healthcare team for more information on how you can get that EPIC ROI the board of directors is asking for.
In recent conversations regarding solutions to implement for data privacy, our Dynamic Data Masking team put together the following table to highlight the differences between encryption / tokenization and Dynamic Data Masking (DDM). Best practices dictate that both should be implemented in an enterprise for the most comprehensive and complete data security strategy. For the purpose of this blog, here are a few definitions:
Dynamic Data Masking (DDM) protects sensitive data when it is retrieved based on policy without requiring the data to be altered when it is stored persistently. Authorized users will see true data, unauthorized users will see masked values in the application. No coding is required in the source application.
Encryption / tokenization protects sensitive data by altering its values when stored persistently while being able to decrypt and present the original values when requested by authorized users. The user is validated by a separate service which then provides a decryption key. Unauthorized users will only see the encrypted values. In many cases, applications need to be altered requiring development work.
|Business users access PII||Business users work with actual SSN and personal values in the clear (not with tokenized values). As the data is tokenized in the database, it needs to be de-tokenized every time it is accessed by users – which is done be changing the application source-code (imposing costs and risks), and causing performance penalty.For example, if a user needs to retrieve information on a client with SSN = ‘987-65-4329’, the application needs to de-tokenize the entire tokenized SSN column to identify the correct client info – a costly operation. This is why implementation scope is limited.||As DDM does not change the data in the database, but only masks it when accessed by unauthorized users, authorized users do not experience any performance hit nor require application source-code changes.For example, if an authorized user needs to retrieve information on a client with SSN = ‘987-65-4329’, his request is untouched by DDM. As the SSN stored in the database is not changed, there is no performance penalty involved.In case an unauthorized user retrieves the same SSN, DDM masks the SQL request, causing the sensitive data result (e.g., name, address, CC and age) to be masked, hidden or completely blocked.|
|Privileged Infrastructure DBA have access to the database server files||Personal Identifiable Information (PII) stored in the database files is tokenized, ensuring that the few administrators that have uncontrolled access to the database servers cannot see it||PII stored in the database files remains in the clear. The few administrators that have uncontrolled access to the database servers can potentially access it.|
|Production support, application developers, DBAs, consultants, outsource and offshore teams||These groups of users have application super-user privileges, seen by the tokenization solution as authorized, and as such access PII in the clear!!!||These users are identified by DDM as unauthorized, and as such are masked, hidden or blocked, protecting the PII.|
|Data warehouse protection||Implementing tokenization on Data warehouses requires tedious database changes and causes performance penalty:1.Loading or reporting upon millions of PII records requires to tokenize/de-tokenize each record.2.Running a report with a condition on a tokenized value (e.g., when having a condition: SSN like (‘%333’) causes the de-tokenization of the entire column).
Massive database configuration changes are required to use the tokenization API, creating and maintaining hundreds of views.
|No performance penalty.No need to change reports, databases or to create views.|
Combining both DDM and encryption/tokenization presents an opportunity to deliver complete data privacy without the need to alter the application or write any code.
Informatica works with its encryption and tokenization partners to deliver comprehensive data privacy protection in packaged applications, data warehouses and Big Data platforms such as Hadoop.
The OAUG hosted its annual convention, Collaborate13, this week in Denver, Colorado. The week started out with beautiful spring weather and turned quickly into frigid temperatures with a snow flurry bonus. The rapid change in weather didn’t stop 4,000 attendees from elevating their application knowledge in the mile high city. One topic that was very well attended from our perspective was the evolution of database archiving. (more…)
Businesses retain information in an Enterprise data archiving either for compliance – adhere to data retention regulations – or because business users are afraid to let go of data they are used to having access to. Many IT have told us they retain data in archives because they are looking to cut infrastructure costs and do not have retention requirements clearly articulated from the business. As a result, enterprise data archiving has morphed into serving multiple purposes for IT –they can eliminate costs associated with maintaining aging data in production applications, allow business users to access the information on demand, all while adhering to some – if any known or defined – retention policies. (more…)