Category Archives: Database Archiving
In a previous life, I was a pastry chef in a now-defunct restaurant. One of the things I noticed while working there (and frankly while cooking at home) is that the better the ingredients, the better the final result. If we used poor quality apples in the apple tart, we ended up with a soupy, flavorless mess with a chewy crust.
The same analogy applies to data analytics: with poor quality data, you get poor results from your analytics projects. The companies that implement analytic solutions providing near real-time access to consumer trends are the same companies that run successful, up-to-the-minute targeted marketing campaigns. The Data Warehousing Institute estimates that data quality problems cost U.S. businesses more than $600 billion a year.
The business impact of poor data quality cannot be overstated. If not identified and corrected early on, defective data can contaminate all downstream systems and information assets, driving up costs, jeopardizing customer relationships, and causing imprecise forecasts and poor decisions.
- To help you quantify: Let’s say your company receives 2 million claims per month with 377 data elements per claim. Even at an error rate of 0.001, the claims data contains more than 754,000 errors per month and more than 9.04 million errors per year! If you determine that 10 percent of the data elements are critical to your business decisions and processes, you still must fix almost 1 million errors each year!
- What is your exposure to these errors? Let’s estimate the risk at $10 per error (including staff time required to fix the error downstream after a customer discovers it, the loss of customer trust and loyalty, and erroneous payouts). Your company’s risk exposure to poor quality claims data is $10 million a year.
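The arithmetic behind those two bullets can be sketched in a few lines; all inputs are the assumed figures from the example above, and the final number rounds up to the roughly $10 million of exposure cited.

```python
# Back-of-the-envelope error-cost model using the assumed figures above:
# 2M claims/month, 377 elements/claim, a 0.001 error rate, 10% of
# elements business-critical, and an estimated $10 cost per error.
claims_per_month = 2_000_000
elements_per_claim = 377
error_rate = 0.001
critical_fraction = 0.10
cost_per_error = 10  # dollars: staff time, lost loyalty, erroneous payouts

errors_per_month = claims_per_month * elements_per_claim * error_rate
errors_per_year = errors_per_month * 12
critical_errors_per_year = errors_per_year * critical_fraction
annual_exposure = critical_errors_per_year * cost_per_error

print(f"{errors_per_month:,.0f} errors per month")       # 754,000
print(f"{errors_per_year:,.0f} errors per year")         # 9,048,000
print(f"{critical_errors_per_year:,.0f} critical/year")  # 904,800
print(f"${annual_exposure:,.0f} annual exposure")        # $9,048,000
```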
Once your company values quality data as a critical resource – it is much easier to perform high-value analytics that have an impact on your bottom line. Start with creation of a Data Quality program. Data is a critical asset in the information economy, and the quality of a company’s data is a good predictor of its future success.
Every year, I get a replacement desk calendar to help keep all of our activities straight – and for a family of four, that is no easy task. I start by taking all of the little appointment cards the dentist, orthodontist, pediatrician and GP give us for appointments that occur beyond the current calendar’s dates, and I transcribe them all. Then I go through last year’s calendar to transfer any information that is relevant to this year’s calendar. And finally, I put the old calendar down in the basement next to the previous years’ calendars so I can refer back to them if I need to. Last year’s calendar contains a lot of useful information, but it can no longer solve my need to organize schedules for this year.
In a very loose way – this is very similar to application retirement. Many larger health plans have existing systems that were created several years (sometimes even several decades) ago. These legacy systems have been customized to reflect the health plan’s very specific business processes. They may be hosted on costly hardware, developed in antiquated software languages, and reliant on a few developers who are very close to retirement. The cost of supporting these (most likely) antiquated systems can divert valuable dollars away from innovation.
The process that I use to move appointment and contact data from one calendar to the next works for me – but it is relatively small in scale. Imagine if I were trying to do this for an entire organization without losing context, detail or accuracy!
There are several methodologies for determining the best strategy for your organization to approach software modernization, including:
- Architecture Driven Modernization (ADM) is the initiative to standardize views of the existing systems in order to enable common modernization activities like code analysis and comprehension, and software transformation.
- SABA (Bennett et al., 1999) is a high-level framework for planning the evolution and migration of legacy systems, taking into account both organizational and technical issues.
- SRRT (Economic Model for Software Rewriting and Replacement Times), proposed by Chan et al. (1996), is a formal model for determining optimal software rewrite and replacement timings based on versatile metrics data.
- And if all else fails: Model Driven Engineering (MDE) is being investigated as an approach for reverse engineering and then forward engineering software code.
My calendar migration process evolved over time; your method for software modernization, however, should be well planned prior to the go-live date for the new software system.
Ah yes, the Old Mainframe. It just won’t go away. Which means there is still valuable data sitting in it. And that leads to a question I have been asked repeatedly in the past few weeks: why should an organization use a tool like Informatica PowerExchange to extract data from a mainframe when you can also do it with a script that extracts the data as a flat file?
So below, thanks to Phil Line, Informatica’s Product Manager for Mainframe connectivity, are the top ten reasons to use PowerExchange over hand coding a flat file extraction.
1) Data will be “fresh” as of the time the data is needed – not already old based on when the extraction was run.
2) Data extracted directly from files will be exactly as the file held it; any additional processes needed to extract or transfer the data to LUW could potentially alter the original formats.
3) The consuming application can get the data when it needs it; there wouldn’t be any scheduling issues between creating the extract file and then being able to use it.
4) There is less work to do if PowerExchange reads the data directly from the mainframe; data type processing as well as potential code page issues are all handled by PowerExchange.
5) Unlike files created with FTP-type processes, where problems could cut short the expected data transfer, PowerExchange and PowerCenter provide log messages to ensure that all data has been processed.
6) The consumer can select only the data needed by the consuming application; filtering reduces the amount of data being transferred and addresses potential security concerns.
7) Any access to mainframe-based data can be secured according to the security tools in place on the mainframe; PowerExchange is fully compliant with the RACF, ACF2 and Top Secret security products.
8) Using Informatica’s PowerExchange along with Informatica consuming tools (PowerCenter, Mercury, etc.) provides a much simpler and cleaner architecture. The simpler the architecture, the easier it is to find problems and to audit the processes that are touching the data.
9) PowerExchange generally helps avoid the bottlenecks normally associated with getting data off the mainframe: programmers are not needed to create the extract processes, new schedules don’t need to be created to ensure that the extracts run, and any necessary changes can be controlled by the business group consuming the data.
10) It helps control mainframe data extraction processes that are still being run even though no one uses the generated data, because the original system that requested the data has since become obsolete.
This Magic Quadrant focuses on what Gartner calls Structured Data Archiving. Data archiving is used to index, migrate, preserve and protect application data in secondary databases or flat files, typically located on lower-cost storage, for policy-based retention. It makes data available in the context of the originating business process or application, which is especially useful in the event of litigation or an audit.
The Magic Quadrant calls out two use cases. These use cases are “live archiving of production applications” and “application retirement of legacy systems.” Informatica refers to both use cases, together, as “Enterprise Data Archiving.” We consider this to be a foundational component of a comprehensive Information Lifecycle Management strategy.
The application landscape is constantly evolving. For this reason, data archiving is a strategic component of a data growth management strategy. Application owners need a plan to manage data as applications are upgraded, replaced, consolidated, moved to the cloud and/or retired.
When you don’t have a plan in production, data accumulates in the business application: degraded performance frustrates the business, and data bloat burdens IT operations. When you don’t have a plan for legacy systems, applications accumulate in the data center, and the rising budgets become the CFO’s problem.
A data growth management plan must include the following:
- How to cycle through applications and retire them
- How to smartly store the application data
- How to ultimately dispose of data while staying compliant
Structured data archiving and application retirement technologies help automate and streamline these tasks.
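The "dispose of data while staying compliant" step in the plan above boils down to a retention-policy check. Here is a minimal, hypothetical sketch; the record classes and retention periods are invented for illustration, not taken from any actual regulation or product.

```python
import datetime

# Hypothetical retention policy: each record class carries a retention
# period in years, and a record is eligible for disposal only once its
# retention window has elapsed. Class names and periods are illustrative.
RETENTION_YEARS = {"claims": 7, "general_ledger": 10, "marketing": 2}

def eligible_for_disposal(record_class: str,
                          created: datetime.date,
                          today: datetime.date) -> bool:
    """Return True only if the record's retention period has expired."""
    years = RETENTION_YEARS[record_class]
    return created.replace(year=created.year + years) <= today

today = datetime.date(2014, 1, 1)
# Marketing data from 2010 (2-year retention) may be disposed of...
print(eligible_for_disposal("marketing", datetime.date(2010, 5, 1), today))  # True
# ...but claims data from 2010 (7-year retention) must be kept.
print(eligible_for_disposal("claims", datetime.date(2010, 5, 1), today))     # False
```

A real disposal job would, of course, drive deletes from such a check rather than just reporting eligibility.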
Informatica Data Archive delivers unparalleled connectivity, scalability, a broad range of innovative options (e.g. Smart Partitioning, Live Archiving, and retiring aging and legacy data to the Informatica Data Vault), and comprehensive retention management, data reporting and visualization. We believe these strengths are the key ingredients for deploying a successful enterprise data archive.
For more information, read the Gartner Magic Quadrant for Structured Data Archiving and Application Retirement.
Oracle DBAs are challenged with keeping mission-critical databases up and running with predictable performance as data volumes grow. Our customers are changing their approach, proactively managing Oracle performance while simplifying IT with our innovative Data Archive Smart Partitioning features. Smart Partitioning builds on Oracle Database Partitioning, simplifying the deployment and management of partitioning strategies. DBAs have been able to respond to requests to improve business process performance without having to write any custom code or SQL scripts.
With Smart Partitioning, DBAs have a new dialogue with business analysts: rather than wading into the technology weeds, they ask how many months, quarters or years of data are required to get the job done. Then they show, within a few clicks, how users can self-select how much data gets processed when they run queries, reports or programs. In effect, users control their own performance by controlling the volume of data they pull from the database.
Smart Partitioning is configured using easily understood business dimensions such as time, company, or business unit. These dimensions make it easy to ‘slice’ data to fit the job at hand, so performance becomes manageable and under business control. Another benefit shows up in your non-production environments: creating smaller, fully functional subset databases now fits easily into your cloning operations.
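The idea of slicing by a business dimension can be illustrated with a toy partition-pruning sketch. This is purely conceptual, not how Informatica or Oracle implement partitioning: rows are grouped by year, and a query scans only the partitions it actually needs.

```python
from collections import defaultdict

# Conceptual sketch of partition pruning by a time dimension: rows live
# in per-year partitions, and queries scan only the years they request.
partitions = defaultdict(list)  # year -> list of rows

def insert(row):
    partitions[row["year"]].append(row)

def query(years, predicate):
    """Scan only the partitions for the requested years."""
    return [row
            for year in years if year in partitions
            for row in partitions[year]
            if predicate(row)]

# Load a few sample rows spread across three years (9 rows total).
for year in (2011, 2012, 2013):
    for amount in (100, 250, 975):
        insert({"year": year, "amount": amount})

# A report that only needs the current year scans 3 rows, not 9,
# and returns the two 2013 rows with amount > 200.
recent = query([2013], lambda r: r["amount"] > 200)
print(len(recent))  # 2
```

The business conversation ("how many years of data do you really need?") maps directly onto the `years` argument: fewer partitions requested means less data scanned.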
Finally, Informatica has been working closely with the Oracle Enterprise Solutions Group to align Informatica Data Archive Smart Partitioning with the Oracle ZS3 Appliance to maximize performance and savings while minimizing the complexity of implementing an Information Lifecycle Management strategy.
When the average person hears of cloning, my bet is that they think of the controversy and ethical issues surrounding cloning, such as the cloning of Dolly the sheep, or the possible cloning of humans by a mad geneticist in a rogue nation state. I would also put money down that when an Informatica blog reader thinks of cloning they think of “The Matrix” or “Star Wars” (that dreadful episode II Attack of the Clones). I did. Unfortunately.
But my pragmatic expectation is that when Informatica customers think of cloning, they also think of Data Cloning software. Data Cloning software clones terabytes of database data into a host of other databases, data warehouses, analytical appliances, and Big Data stores such as Hadoop. And just for hoots and hollers, you should know that almost half of all Data Integration efforts involve replication, be it snapshot or real-time, according to TDWI survey data. Survey also says… replication is the second most popular — or second most used — data integration tool, behind ETL.
Do your company’s cloning tools work with non-standard types? Know that Informatica cloning tools can reproduce Oracle data to just about anything on 2 tuples (or more). We do non-discriminatory duplication, so it’s no wonder we especially fancy cloning the Oracle! (a thousand apologies for the bad “Matrix” pun)
Just remember that data clones are an important and natural component of business continuity, and the use cases span both operational and analytic applications. So if you’re not cloning your Oracle data safely and securely with the quality results that you need and deserve, it’s high time that you get some better tools.
Send in the Clones
With that in mind, if you haven’t tried to clone before, for a limited time Informatica is making the Fast Clone database cloning trial software available as a free download. Click here to get it now.
This is the first in a series of articles where I will take an in-depth look at how state and local governments are affected by data breaches and what they should be considering as part of their compliance, risk-avoidance and remediation plans.
Each state has one or more agencies that are focused on the lives, physical and mental health, and overall welfare of their citizens. The mission statement of the Department of Public Welfare of Pennsylvania, my home state, is typical. It reads: “Our vision is to see Pennsylvanians living safe, healthy and independent lives. Our mission is to improve the quality of life for Pennsylvania’s individuals and families. We promote opportunities for independence through services and supports while demonstrating accountability for taxpayer resources.”
Just as in the enterprise, over the last couple of decades the way an agency deals with citizens has changed dramatically. No longer is everything paper-based and manually intensive – each state has made enormous efforts not just to automate more and more of their processes but more lately to put everything online. The combination of these two factors has led to the situation where just about everything a state knows about each citizen is stored in numerous databases, data warehouses and of course accessed through the Web.
It’s interesting that in the PA mission statement two of the three focus areas are safety and health. I am sure that when it was written these were meant in the physical sense; we now have to consider what each state is doing to safeguard and promote the digital safety and health of its citizens. You might ask what digital safety and health means. At the highest level it is quite straightforward: each state must ensure the data it holds about its citizens is safe from inadvertent or deliberate exposure or disclosure. It seems that each week we read about another data breach, either accidental (a stolen laptop, for instance) or deliberate (hacking, for example), involving data about people – the citizens. Often that data can be used to identify individuals, and once an individual citizen is identified, they are at risk of identity theft, credit card fraud or worse.
Of the 50 states, 46 now have laws and regulations in place governing when and how they need to report on data breaches or losses. This is all well and good, but it is a bit like shutting the stable door after the horse has bolted, and the stakes are higher: there are potentially dire consequences for the digital safety and health of their citizens.
In the next article I will look at the numerous areas that are often overlooked when states establish and execute their data protection and data privacy plans.
Informatica announced yesterday that the Informatica ILM Nearline product is SAP-certified. ILM Nearline helps IT organizations reduce the costs of managing data growth in existing implementations of the SAP NetWeaver Business Warehouse (SAP NetWeaver BW) and SAP HANA. By doing so, customers can redirect freed budgets and resources toward their application landscape and data center modernization initiatives. Informatica ILM Nearline v6.1A for use with SAP NetWeaver BW and SAP HANA, available today, is purpose-built for SAP environments, leveraging native SAP interfaces.
Data volumes are growing fastest in data warehouse and reporting applications, yet a significant amount of that data is rarely used or infrequently accessed. In deployments of SAP NetWeaver BW, standard SAP archiving can reduce the size of a production data warehouse database to help preserve its performance, but if users ever want to query or manipulate the archived data, it must be loaded back into the production system, disrupting data analytics processes and extending time to insight. The same holds true for SAP HANA.
To address this, ILM Nearline enables IT to migrate large volumes of largely inactive SAP NetWeaver BW or SAP HANA data from the production database or in-memory store to online, secure, highly compressed, immutable files in a near-line system while maintaining end-user access. The result is a controlled environment running SAP NetWeaver BW or SAP HANA with predictable, ongoing hardware, software and maintenance costs. This helps ensure service-level agreements (SLAs) can be met while freeing up ongoing budget and resources so IT can focus on innovation.
Informatica ILM Nearline for use with SAP NetWeaver BW and SAP HANA has been certified with the following interfaces:
- NW-BW-NLS Nearline Storage SAP NetWeaver BW 7.30 on SAP HANA for Informatica Data Archive 6.1A
- NW-BW-NLS 7.30 – Nearline Storage – SAP NetWeaver BW 7.30 for Informatica Data Archive 6.1A
- BC-HCS 6.20 – HTTP Content Server 6.20 interface for Informatica Data Archive 6.1
“Informatica ILM Nearline for use with SAP NetWeaver BW and SAP HANA is all about reducing the costs of data while keeping the data easily accessible and thus valuable,” said Adam Wilson, general manager, ILM, Informatica. “As data volumes continue to soar, the solution is especially game-changing for organizations implementing SAP HANA as they can use the Informatica-enabled savings to help offset and control the costs of their SAP HANA licenses without disrupting the current SAP NetWeaver BW users’ access to the data.”
Specific advantages of Informatica ILM Nearline include:
- Industry-leading compression rates – Informatica ILM Nearline’s compression rates exceed standard database compression rates by a sizable margin. Customers typically achieve rates in excess of 90 percent, and some have reported rates as high as 98 percent.
- Easy administration and data access – No database administration is required for data archived by Informatica ILM Nearline. Data is accessible from the user’s standard SAP application screen without any IT interventions and is efficiently stored to simplify backup, restore and data replication processes.
- Limitless capacity – Highly scalable, the solution is designed to store limitless amounts of data without affecting data access performance.
- Easy storage tiering – As data is stored in a highly compressed format, the nearline archive can be easily migrated from one storage location to another in support of a tiered storage strategy.
Available now, Informatica ILM Nearline for use with SAP NetWeaver BW and SAP HANA is based on intellectual property acquired from Sand Technology in Q4 2011 and enhanced by Informatica.
Informatica Survey Results, January 23, 2013 (citation from Enterprise Data Archive for Hybrid IT Webinar)
The Oracle Application User Group (OAUG) Archive and Purge Special Interest Group (SIG) held its semi-annual session first thing in the morning on Sunday, September 22, 2013 – 8:00am. The chairman of the SIG, Brian Bent, must have drawn the short straw when session times were assigned. Regardless, attendance was incredibly strong and the topic, ‘Cleaning up your Oracle E-Business Suite Mess’, was well received.
From the initial audience survey, most attendees have made the jump to OEBS R12, but very few have implemented an Information Lifecycle Management (ILM) strategy. As organizations migrate to the latest version, the rate of data growth increases significantly: performance takes a plunge, costs for infrastructure and storage spike, and DBAs are squeezed trying to make do.
The bulk of the discussion was on what Oracle offers for purging via Concurrent Programs. The focus was on system tables – not functional archive and purge routines, like General Ledger or Accounts Receivable; that will be a topic for another SIG day.
For starters, Oracle provides Concurrent Programs to purge administrative data. Look for ‘Big Tables’ owned by APPLSYS for more candidates and search for the biggest tables / indexes. Search for ‘PURGE’ on MyOracleSupport (MOS) – do your homework to decide if the Purge programs apply to you. If you are concerned about deleting data, you can create an archive table, add an ‘on delete’ trigger to the original table, run the purge and automatically save the data in the archive table (Guess what? This is a CUSTOMIZATION).
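The archive-table-plus-trigger pattern described above can be sketched in miniature. The example below uses SQLite purely so it is self-contained and runnable; Oracle's trigger syntax differs, the table names are invented, and (as noted) doing this in E-Business Suite counts as a customization.

```python
import sqlite3

# Sketch of "archive table + on-delete trigger + purge": rows deleted by
# the purge are copied into an archive table first. SQLite stands in for
# Oracle here; table/column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE requests (id INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE requests_archive (id INTEGER, payload TEXT);

    -- On delete, preserve the row in the archive table.
    CREATE TRIGGER archive_on_delete BEFORE DELETE ON requests
    BEGIN
        INSERT INTO requests_archive VALUES (OLD.id, OLD.payload);
    END;
""")

conn.executemany("INSERT INTO requests VALUES (?, ?)",
                 [(1, "old run"), (2, "recent run")])

# The "purge" deletes the old row; the trigger archives it automatically.
conn.execute("DELETE FROM requests WHERE id = 1")

live = conn.execute("SELECT COUNT(*) FROM requests").fetchone()[0]
archived = conn.execute("SELECT COUNT(*) FROM requests_archive").fetchone()[0]
print(live, archived)  # 1 1
```

The same idea, translated to Oracle DDL, is what lets you run a standard purge program without permanently losing the purged rows.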
Some areas to look at include FND_CONCURRENT_REQUESTS and FND_LOBS.
For FND_CONCURRENT_REQUESTS:
– Most customers purge data older than 7-30 days
– Oracle recommends keeping this table under 25,000 rows
– Consider additional purges that delete data about concurrent requests that run frequently
For FND_LOBS:
– DBAs should not delete from FND_LOBS directly; the only way to get rid of these LOBs is for Oracle to provide a Concurrent Program for the module that users used to load them
– FND_LOBS can take an enormous amount of space and can make exporting and importing your database take a long time
– You can also look at storing FND_LOBS as SecureFiles, but this requires an Advanced Compression license
– Log enhancement requests for more Concurrent Programs to clean up FND_LOBS
– Look to third-party solutions, such as Informatica
Other suggested areas include WORKFLOW tables, but these require more research.
For more information, join the Oracle Application User Group and sign up for the Archive and Purge Special Interest Group.
In my last post, I discussed how our Informatica ILM Nearline allows vast amounts of detail data to be accessed at speeds that rival the performance of online systems, which in turn gives business analysts and application managers the power to assess and fine-tune important business initiatives on the basis of actual historical facts. We saw that the promise of Informatica ILM Nearline is basically to give you all the data you want, when and how you want it — without compromising the performance of existing data warehouse and business reporting systems.
Today, I want to consider what this capability means specifically for a business. What are the concrete benefits of implementing Informatica ILM Nearline? Here are a few of the most important ones.
Informatica ILM Nearline enables you to keep all your valuable data available for analysis.
Having more data accessible – more details, covering longer periods – enables a number of improvements in Business Intelligence processes:
- A clearer understanding of emerging trends in the business – what will go well in the future as well as what is now “going south”
- Better support for iterative analyses, enabling more intensive Business Performance Management (BPM)
- Better insight into customer behavior over the long term
- More precise target marketing, bringing a three- to five-fold improvement in campaign yield
Informatica ILM Nearline enables you to dramatically increase information storage and maintain service levels without increasing costs or administration requirements.
- Extremely high compression rates give the ability to store considerably more information in a given hardware configuration
- A substantially reduced data footprint means much faster data processing, enabling effective satisfaction of Service Level Agreements without extensive investments in processing power
- Minimal administration requirements bring reductions in resource costs, and ensure that valuable IT and business resources will not be diverted from important tasks just to manage and maintain the Informatica ILM Nearline implementation
- High data compression also substantially reduces the cost of maintaining a data center by reducing requirements for floor space, air conditioning and so on.
Informatica ILM Nearline simplifies and accelerates Disaster Recovery scenarios.
A reduced data footprint means more data can be moved across existing networks, making Informatica ILM Nearline an ideal infrastructure for implementing and securing an offsite backup process for massive amounts of data.
Informatica ILM Nearline keeps all detail data in an immutable form, available for delivery on request.
Having read-only detail data available on-demand enables quick response to audit requests, avoiding the possibility of costly penalties for non-compliance. Optional security packages can be used to control user access and data privacy.
Informatica ILM Nearline makes it easy to offload data from the online database before making final decisions about what is to be moved to an archiving solution.
The traditional archiving process typically involves extensive analysis of data usage patterns in order to determine what should be moved to relatively inaccessible archival storage. With an Informatica ILM Nearline solution, it’s a simple matter to move large amounts of data out of the online database – thereby improving performance and guaranteeing satisfaction of SLAs – while still keeping the data available for access when required. Data that is determined to be no longer used, but which still needs to be kept around to comply with data retention policies or regulations, can then be easily moved into an archiving solution.
Taken together, these benefits make a strong case for implementing an Informatica ILM Nearline solution when the data tsunami threatens to overwhelm the enterprise data warehouse. In future posts, I will be investigating each of these in more detail.