Category Archives: Application ILM
This magic quadrant focuses on what Gartner calls Structured Data Archiving. Data Archiving is used to index, migrate, preserve and protect application data in secondary databases or flat files. These are typically located on lower-cost storage, for policy-based retention. Data Archiving makes data available in context of the originating business process or application. This is especially useful in the event of litigation or of an audit.
The Magic Quadrant calls out two use cases. These use cases are “live archiving of production applications” and “application retirement of legacy systems.” Informatica refers to both use cases, together, as “Enterprise Data Archiving.” We consider this to be a foundational component of a comprehensive Information Lifecycle Management strategy.
The application landscape is constantly evolving. For this reason, data archiving is a strategic component of a data growth management strategy. Application owners need a plan to manage data as applications are upgraded, replaced, consolidated, moved to the cloud and/or retired.
When you don’t have a plan in production, data accumulates in the business application. When this happens, performance bothers the business. In addition, data bloat bothers IT operations. When you don’t have a plan for legacy systems, applications accumulate in the data center. As a result, increasing budgets bother the CFO.
A data growth management plan must include the following:
- How to cycle through applications and retire them
- How to smartly store the application data
- How to ultimately dispose of data while staying compliant
Structured data archiving and application retirement technologies help automate and streamline these tasks.
Informatica Data Archive delivers unparalleled connectivity, scalability and a broad range of innovative options (such as Smart Partitioning, Live Archiving, and retiring aging and legacy data to the Informatica Data Vault), along with comprehensive retention management and data reporting and visualization. We believe our strengths in this space are the key ingredients for deploying a successful enterprise data archive.
For more information, read the Gartner Magic Quadrant for Structured Data Archiving and Application Retirement.
At the Informatica World 2014 pre-conference, the “ILM Day” sessions were packed, with over 100 people in attendance. This attendance reflects the strong interest in data archive, test data management and data security. Customers were the focus of the panel sessions today, taking center stage to share their experiences, best practices and lessons learned from successful deployments.
Both the test data management and data archive panels had strong audience interest and interaction. For Test Data Management, the panel topic was “Agile Development by Streamlining Test Data Management”; for data archive, the session tackled “Managing Data Growth in the Era of Application Consolidation and Modernization”. The panels provided practical tactics and strategies to address the challenges and issues in managing data growth, and how to efficiently and safely provision test data. Thank you to the customers, partners and analysts who served on the panels; participants included EMC, Visteon, Comcast, Lowes, Tata Consultancy Services and Neuralytix.
The day concluded with an excellent presentation from the ILM General Manager, Amit Walia, and the CTO of the International Association of Privacy Professionals, Jeff Northrop. Amit provided an executive summary preview of Tuesday’s Secure@Source(TM) announcement, while Jeff Northrop provided a thought-provoking market backdrop on the issues and challenges for data privacy and security, and how the focus on information security needs to shift to a ‘data-centric’ approach.
A very successful event for all involved!
Oracle DBAs are challenged with keeping mission-critical databases up and running with predictable performance as data volumes grow. Our customers are changing their approach to proactively managing Oracle performance while simplifying IT by leveraging our innovative Data Archive Smart Partitioning features. Smart Partitioning leverages Oracle Database Partitioning, simplifying deploying and managing partitioning strategies. DBAs have been able to respond to requests to improve business process performance without having to write any custom code or SQL scripts.
With Smart Partitioning, DBAs have a new dialogue with business analysts. Rather than wading into the technology weeds, they ask how many months, quarters or years of data are required to get the job done. They then show, within a few clicks, how users can self-select how much data gets processed when they run queries, reports or programs – in effect, how users can control their own performance by controlling the volume of data they pull from the database.
Smart Partitioning is configured using easily understood business dimensions such as time, company, business unit etc. These dimensions make it easy to ‘slice’ data to meet the job at hand. Performance becomes manageable and under business control. Another benefit is in your non-production environments. Creating smaller sized, subset databases that are fully functional now fits easily into your cloning operations.
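To make the idea of slicing by business dimensions concrete, here is a minimal Python sketch. It is only an illustration of the concept, not how Smart Partitioning or Oracle Database Partitioning is implemented; the function names, the `year` and `business_unit` dimensions and the sample rows are all hypothetical.

```python
from collections import defaultdict

def partition_rows(rows, dimensions):
    """Group rows by the values of the given business dimensions."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[tuple(row[d] for d in dimensions)].append(row)
    return partitions

def rows_for(partitions, years, business_unit):
    """Collect rows from only the partitions the business user asked for."""
    selected = []
    for year in years:
        selected.extend(partitions.get((year, business_unit), []))
    return selected

# Hypothetical ledger rows; real tables would have many more columns.
ledger = [
    {"year": 2012, "business_unit": "EMEA", "amount": 120.0},
    {"year": 2013, "business_unit": "EMEA", "amount": 80.0},
    {"year": 2014, "business_unit": "EMEA", "amount": 45.0},
    {"year": 2014, "business_unit": "APAC", "amount": 60.0},
]

partitions = partition_rows(ledger, ("year", "business_unit"))
# A user who only needs the last two years for EMEA never touches the rest.
recent_emea = rows_for(partitions, years=(2013, 2014), business_unit="EMEA")
```

The point of the sketch is that the amount of data read depends on the slice requested, not on the total size of the table. The same grouping could also drive a smaller subset copy for a non-production environment.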
Finally, Informatica has been working closely with the Oracle Enterprise Solutions Group to align Informatica Data Archive Smart Partitioning with the Oracle ZS3 Appliance to maximize performance and savings while minimizing the complexity of implementing an Information Lifecycle Management strategy.
What springs to mind when you think about old applications? What happens to them when they have outlived their usefulness? Do they finally get to retire and have their day in the sun, or do they tenaciously hang on to life?
Think for a moment about your situation and of those around you. From the time you started work you have been encouraged and sometimes forced to think about, plan for and fund your own retirement. Now consider the portfolio your organization has built up over the years: hundreds or maybe thousands of apps, spread across numerous platforms and locations – a mix of home-grown applications, best-of-breed tools and packages acquired from the leading application vendors.
Evaluating Your Current Situation
- Do you know how many of those “legacy” systems are still running?
- Do you know how much these apps are costing?
- Is there a plan to retire them?
- How is the execution tracking to plan?
Truth is, even if you have a plan, it probably isn’t going well.
Providing better citizen service at a lower cost
This is something every state and local organization aspires to do by reducing costs. Many organizations are spending 75% or more of their budgets on just keeping the lights on – maintaining existing applications and infrastructure. Being able to fully retire some, or many, of these applications saves significant money. Do you know how much these applications are costing your organization? Don’t forget to include the whole range of costs that applications incur – including the physical infrastructure costs such as mainframes, networks and storage, as well as the required software licenses and of course the time of the people that actually keep them running. What happens when those with COBOL and CICS experience retire? Usually the answer is not good news. There is a lot to consider and many benefits to be gained through an effective application retirement strategy.
An August 2011 report by ESG Global shows that some 68% of organizations had six or more legacy applications running and that 50% planned to retire at least one of those over the following 12-18 months. It would be interesting to see today’s situation and be able to evaluate how successful these application retirement plans have been.
A common problem is knowing where to start. You know there are applications that you should be able to retire, but planning, building and executing an effective and successful plan can be tough. To help this process we have developed a strategy, framework and solution for effective and efficient application retirement. This is a good starting point on your application retirement journey.
To get a speedy overview, take six minutes to watch this video on application retirement.
We have created a community specifically for application managers in our ‘Potential At Work’ site. If you haven’t already signed up, take a moment and join this group of like-minded individuals from across the globe.
Are you interested in Oracle Data Migration Best Practices? Are you upgrading, consolidating or migrating to or from an Oracle application? Moving to the cloud or a hosted service? Research and experience confirm that the tasks associated with migrating application data during these initiatives have the biggest impact on whether the project is considered a failure or success. So how do your peers ensure data migration success?
Informatica will be offering a full-day Oracle Migrations Best Practices workshop at Oracle Application User Group’s annual conference, Collaborate 14, this year on April 7th in Las Vegas, NV. During this workshop, peers and experts will share best practices for how to avoid the pitfalls and ensure successful projects, lowering migration cost and risk. Our packed agenda includes:
- Free use and trials of data migration tools and software
- Full training sessions on how to integrate cloud-based applications
- How to provision test data using different data masking techniques
- How to ensure consistent application performance during and after a migration
- A review of Oracle Migration Best Practices and case studies
Case Study: EMC
One of the key case studies that will be highlighted is EMC’s Oracle migration journey. EMC Corporation migrated to Oracle E-Business Suite, acquired more than 40 companies in 4 years, consolidated and retired environments, and is now on its path to migrating to SAP. Not only did they migrate applications, but they also migrated their entire technology platform from physical to virtual on their journey to the cloud. They needed to control the impact of data growth along the way and manage the size of their test environments, while reducing the risk of exposing sensitive data to unauthorized users during development cycles. With best practices, and help from Informatica, they estimate that they have saved approximately $45M in IT costs throughout their migrations. Now that they are deploying a new analytics platform based on Hadoop, they are leveraging existing skill sets and Informatica tools to ensure data is loaded into Hadoop without missing a beat.
Case Study: Verizon
Verizon is the second case study we will be discussing. They recently migrated to Salesforce.com and needed to ensure that more than 100 data objects were integrated with on-premises, back end applications. In addition, they needed to ensure that data was synchronized and kept secure in non-production environments in the cloud. They were able to leverage a cloud-based integration solution from Informatica to simplify their complex IT application architecture and maintain data availability and security – all while migrating a major business application to the cloud.
Case Study: OEM Heavy Equipment Manufacturer
The third case study we will review involves a well-known heavy equipment manufacturer who was facing a couple of challenges. The first was a need to separate data in an Oracle E-Business Suite application as a result of a divestiture. Second, they needed to control the impact of data growth on their production application environments, which were going through various upgrades. Using an innovative approach based on Smart Partitioning, this enterprise estimates it will save $23M over a 5-year period while achieving 40% performance improvements across the board.
To learn more about what Informatica will be sharing at Collaborate 14, watch this video. If you are planning to attend Collaborate 14 this year and you are interested in joining us, you can register for the Oracle Migrations Best Practices Workshop here.
In the first two issues I spent time looking at the need for states to pay attention to the digital health and safety of their citizens, followed by the oft-forgotten need to understand and protect non-production data. This is data that has often proliferated and then been ignored or forgotten about.
In many ways, non-production data is simpler to protect. Development and test systems can usually work effectively with realistic but not real PII data and realistic but not real volumes of data. On the other hand, production systems need the real production data complete with the wealth of information that enables individuals to be identified – and this presents a huge risk. If and when that data is compromised, either deliberately or accidentally, the consequences can be enormous, both in the impact on individual citizens and in the cost of remediation to the state. Many will remember the massive South Carolina data breach of late 2012, when over the course of 2 days a 74 GB database was downloaded and stolen: around 3.8 million taxpayers and 1.9 million dependents had their social security information stolen, and 3.3 million “lost” bank account details. The citizens’ pain didn’t end there, as the company South Carolina picked to help its citizens seems to have tried to exploit the situation.
The biggest problem with securing production data is that there are numerous legitimate users and uses of that data, and most often just a small number of potentially malicious or accidental attempts at inappropriate or dangerous access. So the question is… how does a state agency protect its citizens’ sensitive data while at the same time ensuring that legitimate uses and users continue – without performance impacts or any disruption of access? Obviously each state needs to make its own determination as to what approach works best for them.
This video does a good job of explaining the scope of the overall data privacy/security problems and also reviews a number of successful approaches to protecting sensitive data in both production and non-production environments. What you’ll find is that database encryption is just the start and is fine if the database is “stolen” (unless of course the key is stolen along with the data!). Encryption locks the data away in the same way that a safe protects physical assets – but the same problem exists. If the key is stolen with the safe then all bets are off. Legitimate users are usually easily able to deliberately breach and steal the sensitive contents, and it’s these latter occasions we need to understand and protect against. Given that the majority of data breaches are “inside jobs” we need to ensure that authorized users (end-users, DBAs, system administrators and so on) that have legitimate access only have access to the data they absolutely need, no more and no less.
So we have reached the end of the first series. In the first blog we looked at the need for states to place the same emphasis on the digital health and welfare of their citizens as they do on their physical and mental health. In the second we looked at the oft-forgotten area of non-production (development, testing, QA etc.) data. In this third and final piece we looked at the need for complete protection of production data, and at some options for providing it.
In my first article on the topic of citizens’ digital health and safety we looked at the states’ desire to keep their citizens healthy and safe and also at the various laws and regulations they have in place around data breaches and losses. The size and scale of the problem together with some ideas for effective risk mitigation are in this whitepaper.
Let’s now start delving a little deeper into the situation states are faced with. It’s pretty obvious that citizen data that enables an individual to be identified (PII) needs to be protected. We immediately think of the production data: data that is used in integrated eligibility systems; in health insurance exchanges; in data warehouses and so on. In some ways the production data is the least of our problems; our research shows that the average state has around 10 to 12 full copies of data for non-production (development, test, user acceptance and so on) purposes. This data tends to be much more vulnerable because it is widespread and used by a wide variety of people – often subcontractors or outsourcers, and often the content of the data is not well understood.
Obviously production systems need access to real production data (I’ll cover how best to protect that in the next issue). Non-production systems of every sort, on the other hand, do not. Non-production systems most often need realistic, but not real, data and realistic, but not real, data volumes (except maybe for the performance/stress/throughput testing system). What needs to be done? A three-point risk remediation plan would be a good place to start.
1. Understand the non-production data. Sophisticated data and schema profiling, combined with NLP (Natural Language Processing) techniques, helps to identify previously unrealized PII that needs protecting.
2. Permanently mask the PII so that it is no longer the real data but is realistic enough for non-production uses, and make sure that the same masking is applied to the attribute values wherever they appear in multiple tables/files.
3. Subset the data to reduce data volumes. This limits the size of the risk and also has positive effects on performance, run-times, backups etc.
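Steps 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any vendor's masking product: the keyed-hash approach, the `MASKING_KEY` value, the table layouts and the every-n-th-row subsetting rule are all hypothetical choices made for the example.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be managed outside the code base.
MASKING_KEY = b"example-masking-key"

def mask_ssn(ssn: str, key: bytes = MASKING_KEY) -> str:
    """Map a real SSN to a fake one deterministically, keeping the NNN-NN-NNNN shape."""
    digest = hmac.new(key, ssn.encode("utf-8"), hashlib.sha256).hexdigest()
    digits = f"{int(digest, 16) % 10**9:09d}"
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"

def mask_rows(rows, column):
    """Return copies of the rows with one column masked."""
    return [{**row, column: mask_ssn(row[column])} for row in rows]

def subset_rows(rows, keep_every=10):
    """Keep every n-th row to reduce volume (a deliberately naive rule that
    ignores relationships between tables)."""
    return rows[::keep_every]

# Two hypothetical tables that share the same SSN attribute.
citizens = [{"id": i, "ssn": f"123-45-{i:04d}"} for i in range(100)]
claims = [{"claim_id": i, "ssn": f"123-45-{i % 100:04d}"} for i in range(300)]

masked_citizens = mask_rows(citizens, "ssn")
masked_claims = mask_rows(claims, "ssn")
small_citizens = subset_rows(masked_citizens)
```

Because the mask is a keyed function of the input, the same real value always maps to the same masked value, so joins between `masked_citizens` and `masked_claims` still line up. A real subsetting step would also need to follow foreign keys so that the smaller tables remain consistent with each other.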
Gartner has just published its 2013 Magic Quadrant for data masking. This covers both what it calls static (i.e. permanent or persistent) masking and dynamic masking (more on this in the next issue). As usual the MQ gives a good overview of the issues behind the technology as well as a review of the position, strengths and weaknesses of the leading vendors.
It is (or at least should be) an imperative that, from the top down, state governments realize the importance and vulnerability of their citizens’ data and put in place a non-partisan plan to prevent any future breaches. As the reader might imagine, for any such plan to succeed it needs a combination of cultural and organizational change (getting people to care) and putting the right technology in place – together these will greatly reduce the risk. In the next and final issue on this topic we will look at the vulnerabilities of production data, and what can be done to dramatically increase its privacy and security.
Informatica announced, once again, that it is listed as a leader in the industry’s second Gartner Magic Quadrant for Data Masking Technology. With data security continuing to grow as one of the fastest segments in the enterprise software market, technologies such as data masking are becoming the solution of choice for data-centric security.
Increased fear of cyber-attacks and internal data breaches has led to predictions that 2014 will be the year of preventative and tactical measures to ensure corporate data assets are safe. Data masking should be included in those measures. According to Gartner,
“Security program managers need to take a strategic approach with tactical best-practice technology configurations in order to properly address the most common advanced targeted attack scenarios to increase both detection and prevention capabilities.”
Without these measures, the cost of an attack or breach is growing every year. The Ponemon Institute posted in a recent study:
“The 2013 Cost of Cyber Crime Study states that the average annualized cost of cybercrime incurred by a benchmark sample of US organizations was $11.56 million, nearly 78% more than the cost estimated in the first analysis conducted 4 years ago.”
Informatica believes that the best preventative measures include a layered approach for data security but without sacrificing agility or adding unnecessary costs. Data Masking delivers data-centric security with improved productivity and reduced overall costs.
Data Masking prevents internal data theft and abuse of sensitive data by hiding it from users. Data masking techniques include replacing some fields with similar-looking characters, masking characters (for example, “x”), substituting real last names with fictional last names and shuffling data within columns – to name a few. Other terms for data masking include data obfuscation, sanitization, scrambling, de-identification, and anonymization. Call it what you like, but without it, organizations may continue to expose sensitive data to those with malicious intent.
To learn more, download the Gartner Magic Quadrant Data Masking Report now, and visit the Informatica website for data masking product information.
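Three of the techniques named above (masking characters, substituting fictional last names, and shuffling within a column) can be sketched as follows. This is a minimal illustration only; the function names, the list of fictional names and the sample records are hypothetical and do not reflect any particular product.

```python
import random

def mask_characters(value: str, visible: int = 4, mask_char: str = "x") -> str:
    """Replace all but the last `visible` characters with a masking character."""
    if visible <= 0:
        return mask_char * len(value)
    return mask_char * max(len(value) - visible, 0) + value[-visible:]

# Hypothetical pool of fictional last names.
FICTIONAL_LAST_NAMES = ["Rivers", "Stone", "Hale", "Marsh"]

def substitute_last_name(real_name: str, rng: random.Random) -> str:
    """Swap a real last name for a fictional one; the real name is discarded."""
    return rng.choice(FICTIONAL_LAST_NAMES)

def shuffle_column(rows, column, rng: random.Random):
    """Shuffle one column across rows: values stay real but detach from their owners."""
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]

rng = random.Random(2014)  # fixed seed so the example is repeatable
people = [
    {"last_name": "Smith", "card": "4111111111111111", "salary": 52000},
    {"last_name": "Jones", "card": "5500005555555559", "salary": 61000},
    {"last_name": "Patel", "card": "340000000000009", "salary": 47000},
]

masked = [
    {**p,
     "last_name": substitute_last_name(p["last_name"], rng),
     "card": mask_characters(p["card"])}
    for p in shuffle_column(people, "salary", rng)
]
```

Note the trade-offs: character masking keeps the field length and trailing digits, substitution keeps the data looking realistic, and shuffling preserves the column's overall distribution while breaking the link to individuals.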
About the Magic Quadrant
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
This is the first in a series of articles where I will take an in-depth look at how state and local governments are affected by data breaches and what they should be considering as part of their compliance, risk-avoidance and remediation plans.
Each state has one or more agencies that are focused on the lives, physical and mental health and overall welfare of their citizens. The mission statement of the Department of Public Welfare of Pennsylvania, my home state, is typical. It reads: “Our vision is to see Pennsylvanians living safe, healthy and independent lives. Our mission is to improve the quality of life for Pennsylvania’s individuals and families. We promote opportunities for independence through services and supports while demonstrating accountability for taxpayer resources.”
Just as in the enterprise, over the last couple of decades the way an agency deals with citizens has changed dramatically. No longer is everything paper-based and manually intensive – each state has made enormous efforts not just to automate more and more of their processes but more lately to put everything online. The combination of these two factors has led to the situation where just about everything a state knows about each citizen is stored in numerous databases, data warehouses and of course accessed through the Web.
It’s interesting that in the PA mission statement two of the three focus areas are safety and health – I am sure when written these were meant in the physical sense. We now have to consider what each state is doing to safeguard and promote the digital safety and health of its citizens. You might ask what digital safety and health means. At the highest level this is quite straightforward: it means that each state must ensure the data it holds about its citizens is safe from inadvertent or deliberate exposure or disclosure. It seems that each week we read about another data breach (see this high-profile data breach infographic) – either accidental (a stolen laptop for instance) or deliberate (hacking as an example) losses of data about people – the citizens. Often that includes data contents that can be used to identify the individuals, and once an individual citizen is identified they are at risk of identity theft, credit card fraud or worse.
Of the 50 states, 46 now have a series of laws and regulations in place about when and how they need to report on data breaches or losses – this is all well and good, but is a bit like shutting the stable door after the horse has bolted – but with higher stakes as there are potentially dire consequences to the digital safety and health of their citizens.
In the next article I will look at the numerous areas that are often overlooked when states establish and execute their data protection and data privacy plans.
Certainly, it is easy to see how it would be preferable to manage a database that is 5 TB rather than 40 TB in size, particularly when it comes to critical tasks like backup and recovery, disaster recovery, off-site backups and historical analytics. Today, however, I want to focus on another benefit of Informatica Data Vault that is less obvious but still very important: data modeling flexibility for data warehouses and data marts. Informatica Data Vault permits organizations to keep a much greater amount of useful data accessible, without requiring compromises on SLAs, TCO and reporting performance. This in turn makes a variety of flexible data modeling options available.
The Physical Table Partitioning Model
The first of these new data modeling options is based on physical table partitioning. The largest tables in a data warehouse or data mart can be physically divided between an online component and the archive counterpart. This allows the existing data model to be maintained, while introducing a “right-sizing” concept where only the regularly accessed data is kept online, and all data that doesn’t require such an expensive and/or hard to manage environment is put into the Informatica Data Vault solution. A typical rule of thumb for defining partition boundaries for data warehouses is based on the 90-day aging principle, so that any static data older than 90 days is migrated from the online warehouse to the Informatica Data Vault repository.
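The 90-day aging rule can be sketched as a simple partition boundary. This is a minimal illustration, assuming every row is static and carries a date; the `recorded_on` field name and the `split_by_age` helper are hypothetical, and the real migration would be done by the archiving tool rather than by application code.

```python
from datetime import date, timedelta

def split_by_age(rows, today, max_age_days=90, date_field="recorded_on"):
    """Split rows into (online, archive) using an age cutoff.

    Rows dated on or after the cutoff stay online; rows older than
    `max_age_days` are candidates for the archive store.
    """
    cutoff = today - timedelta(days=max_age_days)
    online = [r for r in rows if r[date_field] >= cutoff]
    archive = [r for r in rows if r[date_field] < cutoff]
    return online, archive

# Hypothetical call-detail records, all assumed to be static once written.
records = [
    {"id": 1, "recorded_on": date(2014, 5, 20)},
    {"id": 2, "recorded_on": date(2014, 3, 3)},   # exactly 90 days old
    {"id": 3, "recorded_on": date(2014, 1, 15)},
]

online, archive = split_by_age(records, today=date(2014, 6, 1))
```

With `today` set to 1 June 2014 the cutoff is 3 March 2014, so records 1 and 2 remain online and record 3 moves to the archive. The threshold is a parameter because 90 days is a rule of thumb, not a fixed requirement.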
Now, many forms of enterprise data, such as CDR, POS, Web, Proxy or Log data, are static by definition, and are furthermore usually the main sources of data warehouse growth. This is very good news, because it means that as soon as the data is captured, it can be moved to the Informatica Data Vault store (in fact, it is conceivable that this kind of data could be fed directly to Informatica Data Vault from the source system – but that is a topic for another post). Because of the large volumes involved, this kind of detail data has usually been aggregated at one or more levels in the enterprise data warehouse. Users generally query the summary table in order to identify trends, only drilling down into the details for a specific range of records when specific needs or opportunities are identified. This data access technique is well known, and has been in use for quite some time.
The Online Summary Table Model
This leads me to the second novel design option offered by Informatica Data Vault: the ability to store all static detail data in the archive store, and then use this as the basis for building online summary tables, with the ability to quickly drill to detail in the Informatica Data Vault when required. More specifically, the Informatica Data Vault can be used to feed the online system’s summary tables directly because the data structures and SQL access remain intact. The advantage of this implementation is that it substantially reduces the size of the online database, optimizes its performance, and permits trend analysis on even very long periods. This is particularly useful when looking for emerging trends (positive or negative) related to specific products or offerings, because it gives managers the chance to analyze and respond to issues and opportunities within a realistic time frame.
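The summary-plus-drill-down pattern can be illustrated with plain SQL. The sketch below uses Python's built-in `sqlite3` module, with a single in-memory database standing in for both the archive store and the online warehouse. That is an assumption made purely to keep the example self-contained; the table and column names are hypothetical, and nothing here reflects how Informatica Data Vault is actually accessed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for static detail data held in the archive store.
conn.execute(
    "CREATE TABLE archive_detail (product TEXT, sale_month TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO archive_detail VALUES (?, ?, ?)",
    [
        ("widget", "2014-01", 10.0),
        ("widget", "2014-01", 5.0),
        ("widget", "2014-02", 20.0),
        ("gadget", "2014-01", 7.5),
    ],
)

# Stand-in for the online summary table, built directly from the detail rows.
conn.execute(
    """
    CREATE TABLE online_summary AS
    SELECT product, sale_month, SUM(amount) AS total, COUNT(*) AS row_count
    FROM archive_detail
    GROUP BY product, sale_month
    """
)

# Trend analysis reads only the small summary table.
summary = conn.execute(
    "SELECT product, sale_month, total FROM online_summary "
    "ORDER BY product, sale_month"
).fetchall()

# Drill down to detail only for the one slice that looks interesting.
detail = conn.execute(
    "SELECT amount FROM archive_detail "
    "WHERE product = ? AND sale_month = ? ORDER BY amount",
    ("widget", "2014-01"),
).fetchall()
```

The online side holds one row per product and month, while the full detail stays in the archive table and is touched only when a specific range needs investigating.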
Some organizations are already building this type of data hierarchy, using Data Marts or analytic cubes fed by the main Data Warehouse. I call this kind of architecture “data pipelining”. Informatica Data Vault can play an important role in such an implementation, since its repository can be shared between all the analytic platforms. This not only reduces data duplication, management/operational overhead, and requirements for additional hardware and software, it also relieves pressure on batch windows and lowers the risk of data being out of synch. Furthermore, this implementation can assist organizations with data governance and Master Data Management while improving overall data quality.
The Just-In-Case Data Model
Another important data modeling option offered by Informatica Data Vault relates to what we can call “just-in-case” data. In many cases, certain kinds of data will also be maintained outside the warehouse just in case an analyst requires ad hoc access to it for a specific study. Sometimes, for convenience, this “exceptional” data is stored in the data warehouse. However, keeping this data in an expensive storage and software environment, or even storing it on tape or inexpensive disks as independent files, can create a data management nightmare. At the same time, studies demonstrate that a very large portion of the costs associated with ad hoc analysis are concentrated in the data preparation phase. As part of this phase, the analyst needs to “shop” for the just-in-case data to be analyzed, meaning that he or she needs to find, “slice”, clean, transform and use it to build a temporary analytic platform, sometimes known as an “Exploration Warehouse” or “Exploration Mart”.
Informatica Data Vault can play a very important role in such a scenario. Just-in-case data can be stored in the archive store, and analysts can then query it directly using standard SQL-based front-end tools to extract, slice and prepare the data for analytic use. Since much less time is spent on data preparation, far more time is available for data analysis — and there is no impact on the performance of the main reporting system. This acceleration of the data preparation phase results from the availability of a central catalog describing all the available data. The archive repository can be used to directly feed the expert’s preferred analytic platform, generally resulting in a substantial improvement in analyst productivity. Analysts can focus on executing their analyses, and on bringing more value to the enterprise, rather than on struggling to get access to clean and reliable data.