Data warehouses are applications, so why not manage them like one? In fact, data grows at a much faster rate in data warehouses than in transactional applications, since warehouses integrate data from multiple applications and cater to many different groups of users who need different types of analysis. Data warehouses also keep historical data for a long time, so data can grow exponentially in these systems. Infrastructure costs in data warehouses also escalate quickly, since analytical processing on large amounts of data requires big beefy boxes. Not to mention the software license and maintenance costs that come with such a large amount of data. Imagine how much backup media is required to regularly back up data warehouses holding tens to hundreds of terabytes. But do you really need to keep all that historical data in production?
One of the challenges of managing data growth in data warehouses is that it's hard to determine which data is actually used, which data is no longer being used, or whether some data was ever used at all. Unlike transactional systems, where application logic determines when records are no longer being transacted upon, the usage of analytical data in data warehouses follows no definite business rules. Age or seasonality may determine data usage in data warehouses, but business users are usually loath to give up having all that data at their fingertips. The only clear-cut way to prove that some data is no longer being used in a data warehouse is to monitor its usage.
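To make "monitor its usage" concrete, here is a minimal sketch, assuming you can export an access log of `(table_name, access_time)` pairs from your warehouse's query history or audit facility. The function names, the log format, and the 365-day idle threshold are all hypothetical choices for illustration, not features of any particular product. The sketch finds tables that were never queried and tables whose most recent access is older than the threshold, which are the candidates for archiving out of production.

```python
from datetime import datetime, timedelta


def last_access_by_table(access_log):
    """Return the most recent access time seen for each table.

    access_log: iterable of (table_name, datetime) pairs (hypothetical format).
    """
    last = {}
    for table, ts in access_log:
        if table not in last or ts > last[table]:
            last[table] = ts
    return last


def find_cold_tables(all_tables, access_log, as_of, max_idle_days=365):
    """Split tables into 'never used' and 'idle longer than max_idle_days'.

    The 365-day default is an arbitrary, assumed threshold; seasonality
    may call for a longer window before declaring data unused.
    """
    last = last_access_by_table(access_log)
    cutoff = as_of - timedelta(days=max_idle_days)
    never_used = sorted(t for t in all_tables if t not in last)
    idle = sorted(t for t in all_tables if t in last and last[t] < cutoff)
    return never_used, idle


# Example with made-up table names and access times.
tables = ["sales_2015", "sales_2023", "promo_archive"]
log = [
    ("sales_2023", datetime(2024, 5, 1)),
    ("sales_2015", datetime(2021, 3, 10)),
    ("sales_2023", datetime(2024, 6, 1)),
]
never, idle = find_cold_tables(tables, log, as_of=datetime(2024, 6, 30))
print(never)  # ['promo_archive']
print(idle)   # ['sales_2015']
```

Separating "never used" from "idle" is deliberate: data that was never queried is the easiest case to make to business users, while idle data may still need a seasonal review before it moves out of production.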
I recently had the opportunity to meet with the board of directors of a large distribution company here in the U.S. On the table for discussion were data quality and data governance, and how a focus on both could help the organization gain a competitive advantage in its market. While I was happy to see that this company had tied data quality and data governance to its corporate objectives, that's not what caught my attention. What impressed me most was how the data quality and data governance champion had helped the rest of the board see that there WAS a direct link between data and business results, and that with careful focus they could drive better business outcomes than they could without any focus on data at all. As it turns out, the champion's path to success was to articulate the link between trusted, effectively governed data and the company's ability to excel financially, manage costs, limit its risk exposure, and maintain trust with its customers.