Data warehouses tend to grow very quickly because they integrate data from multiple sources and maintain years of historical data for analytics. A number of our customers have data warehouses in the hundreds-of-terabytes to petabyte range. Managing that much data becomes a challenge: how do you curb runaway costs in such an environment? Completing maintenance tasks within the prescribed window and ensuring acceptable performance are also big challenges.
We have provided best practices for archiving aged data from data warehouses. Archiving keeps the production data size at a nearly constant level, reducing infrastructure and maintenance costs while keeping performance up. At the same time, you can still access the archived data directly from any reporting tool if you really need to. Yet many are loath to move data out of their production system. This year at Informatica World, we’re going to discuss another method of managing data growth without moving data out of the production data warehouse. I’m not going to tell you what this new method is, yet. You’ll have to come and learn more about it at my breakout session at Informatica World: What’s New from Informatica to Improve Data Warehouse Performance and Lower Costs.
I look forward to seeing all of you at Aria, Las Vegas next month. Also, I am especially excited to see our ILM customers at our second Product Advisory Council again this year.
I’ve been approached by a number of customers who are looking to archive data from their Salesforce application. There are two primary drivers I have heard cited:
- The need to manage the retention of Salesforce data and easily find and access it for legal eDiscovery
- Storage cost reduction for data that’s no longer active
Just like your on-premises database applications such as E-Business Suite, PeopleSoft, Siebel, and custom applications, SaaS applications such as Salesforce, Oracle CRM On Demand, Microsoft Dynamics, NetSuite, Eloqua, and others will experience large data growth, causing performance issues and increasing costs.
As data grows in your SaaS applications, the performance of accessing transactions and reporting will degrade. Your SaaS vendors will also require more time, effort, and cost to maintain and manage this data. Backups, upgrades and replication of these environments will take longer and application availability will be impacted due to longer maintenance windows. Your SaaS application vendors will require more storage to house the additional data and this cost will be passed on to you.
SAP’s data warehouse solution (SAP BW) provides enterprises the ability to easily build a warehouse over their existing operational systems with pre-defined extraction and reporting objects and methods. Data that is loaded into SAP BW is stored in a layered architecture which encourages reusability of data throughout the system in a standardized way. SAP’s implementation also enables easy audits of data delivery mechanisms that are used to produce various reports within the system.
To allow enterprises to achieve this level of standardization and auditability, SAP BW must persistently store large amounts of data within different layers of its architecture. Managing the size of the objects within these layers becomes increasingly important as the system grows, to ensure high levels of performance for end-user queries and data delivery.
Partitioning and archiving are both methods of improving database and application performance. Depending on a database administrator’s comfort level with one technology or method over another, either partitioning or archiving could be implemented to address performance issues caused by data growth in production applications. But what are the best practices for using one or the other, and how can the two be used better together?
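As a rough sketch of one way the two techniques complement each other, date-range partitioning lets whole partitions of aged rows be moved to an archive in a single operation, rather than deleting row by row from the production table. The table, field names, and cutoff below are purely hypothetical:

```python
from datetime import date

def partition_by_year(rows):
    """Group rows into partitions keyed by the year of their order date
    (a stand-in for a database's date-range partitioning)."""
    parts = {}
    for row in rows:
        parts.setdefault(row["order_date"].year, []).append(row)
    return parts

def archive_old_partitions(parts, cutoff_year):
    """Move entire partitions older than the cutoff into an archive store.
    Operating on whole partitions avoids row-by-row deletes in production."""
    archive = {y: rows for y, rows in parts.items() if y < cutoff_year}
    production = {y: rows for y, rows in parts.items() if y >= cutoff_year}
    return production, archive

# Hypothetical order records.
rows = [
    {"id": 1, "order_date": date(2009, 5, 1)},
    {"id": 2, "order_date": date(2011, 7, 9)},
    {"id": 3, "order_date": date(2012, 1, 15)},
]

production, archive = archive_old_partitions(partition_by_year(rows), 2011)
```

In a real database the same idea shows up as dropping, exchanging, or moving aged partitions to cheaper storage, which is far less disruptive than deleting individual rows from an unpartitioned table.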
Just like your house needs yearly spring cleaning and you need to regularly throw out old junk, your application portfolio needs periodic review and rationalization to identify legacy, redundant applications that can be decommissioned to reduce bloat and save costs. If you have a hard time letting go of old stuff, it’s probably even harder for your application users to let go of access to their data. However, retiring applications doesn’t have to mean that you also lose the data within them. If the data within those applications is still needed for periodic reporting or for regulatory compliance, then there are still ways to retain the data without maintaining the application.
Data warehouses are applications, so why not manage them like one? In fact, data grows at a much faster rate in data warehouses, since they integrate data from multiple applications and cater to many different groups of users who need different types of analysis. Data warehouses also keep historical data for a long time, so data grows exponentially in these systems. The infrastructure costs of data warehouses also escalate quickly, since analytical processing on large amounts of data requires big, beefy boxes. Not to mention the software license and maintenance costs for such a large amount of data. Imagine how much backup media is required to back up tens to hundreds of terabytes of data warehouse on a regular basis. But do you really need to keep all that historical data in production?
One of the challenges of managing data growth in data warehouses is that it’s hard to determine which data is actually used, which data is no longer being used, or even if the data was ever used at all. Unlike transactional systems, where the application logic determines when records are no longer being transacted upon, the usage of analytical data in data warehouses has no definite business rules. Age or seasonality may determine data usage in data warehouses, but business users are usually loath to let go of the availability of all that data at their fingertips. The only clear-cut way to prove that some data is no longer being used in data warehouses is to monitor its usage.
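A minimal sketch of what usage monitoring can look like, assuming you can obtain a query log from the warehouse. The log format, table names, and monitoring window here are all hypothetical; real warehouses expose this through their own audit or query-history facilities:

```python
import re
from collections import Counter
from datetime import date

# Hypothetical query-log entries: (date the query ran, SQL text).
QUERY_LOG = [
    (date(2012, 3, 1), "SELECT * FROM sales_2011 WHERE region = 'WEST'"),
    (date(2012, 3, 5), "SELECT SUM(amount) FROM sales_2012"),
    (date(2012, 4, 2), "SELECT s.id FROM sales_2012 s JOIN customers c ON s.cid = c.id"),
]

MONITORED_TABLES = ["sales_2010", "sales_2011", "sales_2012", "customers"]

def usage_counts(log, tables):
    """Count how often each monitored table is referenced in logged queries."""
    counts = Counter({t: 0 for t in tables})
    for _day, sql in log:
        for t in tables:
            if re.search(r"\b" + re.escape(t) + r"\b", sql, re.IGNORECASE):
                counts[t] += 1
    return counts

def archive_candidates(log, tables):
    """Tables never referenced during the monitoring window are candidates
    for archiving -- hard evidence, rather than guesswork, about usage."""
    counts = usage_counts(log, tables)
    return sorted(t for t, n in counts.items() if n == 0)

print(archive_candidates(QUERY_LOG, MONITORED_TABLES))
```

The point of the sketch is the principle: measured access counts over a defined window give you evidence to bring to business users, instead of asking them to guess which history they can live without.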
A recent IDC report says that we should expect enterprise spending on private cloud storage to grow by 28.9% between now and 2015. At the same time, the report also indicates that big data and archiving are among the key drivers of this growth. It is not surprising that organizations are seeking new cost-saving alternatives. Private and public cloud deployments are just the next extension of outsourcing. By definition, archived data is inactive data that is infrequently or rarely accessed, and perhaps needed only for compliance audits or eDiscovery. It is therefore ideal for storage in the cloud. Cloud storage also accommodates the on-demand, elastic growth of aged, archived, and retired data.
In a recent InformationWeek blog, “Big Data A Big Backup Challenge”, George Crump aptly pointed out the problems of backing up big data and outlined some best practices that should be applied to address them, including:
- Identifying which data can be re-derived and therefore doesn’t need to be backed up
- Eliminating redundancy, file de-duplication, and applying data compression
- Using storage tiering and the combination of online disk and tapes to reduce storage cost and optimize performance
As part of their cost-cutting programs, organizations are consolidating data centers and the applications within them. Federal and state agencies in the public sector are among those where IT consolidation and moving applications to the cloud are top priorities, as part of an overall goal to increase efficiency and eliminate costs. In other industries, many consolidations are also under way due to mergers and acquisitions and other cost-cutting initiatives. As you plan or undergo a consolidation project, you also need to plan for the retirement of the legacy, redundant applications that are left behind.