Tag Archives: manage data growth
Data warehouses are applications– so why not manage them like one? In fact, data grows at a much faster rate in data warehouses, since they integrate date from multiple applications and cater to many different groups of users who need different types of analysis. Data warehouses also keep historical data for a long time, so data grows exponentially in these systems. The infrastructure costs in data warehouses also escalate quickly since analytical processing on large amounts of data requires big beefy boxes. Not to mention the software license and maintenance costs of such a large amount of data. Imagine how many backup media is required to backup tens to hundreds of terabytes of data warehouses on a regular basis. But do you really need to keep all that historical data in production?
One of the challenges of managing data growth in data warehouses is that it’s hard to determine which data is actually used, which data is no longer being used, or even if the data was ever used at all. Unlike transactional systems where the application logic determines when records are no longer being transacted upon, the usage of analytical data in data warehouses has no definite business rules. Age or seasonality may determine data usage in data warehouses, but business users are usually loath to let go of the availability of all that data at their fingertips. The only clear cut way to prove that some data is no longer being used in data warehouses is to monitor its usage.
Database partitioning and database archiving are both methods for improving application performance. Many IT organizations use one or the other, but using them together can provide additional incremental value to an organization.
Database partitioning is a well-known method to DBAs and is supported by most of the commercially available databases. The benefits of partitioning include: (more…)
Many people ask me what additional benefits, can archiving provide when you already partition your database. The answer is two-fold:
- Archiving allows you to eliminate more data volumes from being processed to further improve response time.
- Partitioning doesn’t necessarily reduce your backup window. The only way to shorten your backup window is to remove data from your production database by archiving (or purging) it to another location.
Let me expand on these points. (more…)