What is Data Temperature and How Does it Relate to Storage Tiers?

In my previous blog I briefly mentioned the term “data temperature.” But what exactly does this term mean? Picture yourself logging in to your bank’s website to look for a transaction in your checking account. Very frequently, you look for pending transactions and for debits and credits from the last 10 days. Frequently, you need to look further back, maybe at a one-month statement, to find a check whose purpose you no longer remember. Maybe once a quarter, you need information about a debit from three months ago, for a new magazine subscription that never arrives in your mailbox. And of course, once a year you check your yearly statements for your tax return. Give or take a few other scenarios, I am pretty sure I covered most of your use cases, right?

From that small story we can define what data temperature is: “very frequently” accessed data is very hot data, and “frequently” accessed data is hot data. Data accessed “once a quarter” is warm data, and data accessed once a year or once in five years is cold data.
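To make the definition concrete, here is a minimal sketch in Python that maps an access frequency to a temperature label. The function name `classify_temperature` and the numeric cut-offs (weekly, monthly, quarterly) are my own illustrative assumptions, loosely based on the banking story above; they are not an industry standard, and your own thresholds will likely differ.

```python
def classify_temperature(accesses_per_year: float) -> str:
    """Map how often data is accessed to a temperature label.

    The cut-offs below are hypothetical, chosen only to mirror the
    banking story: roughly weekly, monthly, and quarterly access.
    """
    if accesses_per_year < 0:
        raise ValueError("accesses_per_year must not be negative")
    if accesses_per_year >= 52:   # about weekly or more: "very frequently"
        return "very hot"
    if accesses_per_year >= 12:   # about monthly: "frequently"
        return "hot"
    if accesses_per_year >= 4:    # about once a quarter
        return "warm"
    return "cold"                 # once a year, once in five years, or never


if __name__ == "__main__":
    # Sample frequencies: daily, monthly, quarterly, yearly, every five years.
    for frequency in (365, 12, 4, 1, 0.2):
        print(frequency, "->", classify_temperature(frequency))
```

In practice the access frequency would come from usage statistics in your own system; the point here is only that temperature is a simple function of how often the data is touched.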

And this story also applies to companies and their data. The most recent data is usually the very hot data, and as data ages it is used less and less, which lowers its temperature.

Now, let’s apply this concept to storage tiers.

We have seen many customers with a monolithic database in their OLAP and OLTP systems. All available data resides in a single database, stored in the highest-performing and most expensive storage tier. It is fair to say that these customers are treating cold data the same way as very hot data. And what is the problem with that?

Simple answer: performance.

You can keep adding high-end storage to your landscape, but performance will still degrade as the data grows. There comes a time when performance will only improve if you spend your “storage dollars” on decreasing the size of the main tier. Adding more storage to the latest, fastest (and most expensive) tier forever will not keep your end users happy with response times.

The solution: tiering the data based on data temperature.

To make this blog more tangible, I will use SAP BW as an example. If the RDBMS of a given SAP BW system holds 80 GB of data in a monolithic database, you are probably having performance issues, as well as difficulty performing backups and copying the environment to non-production systems. The best approach here is to define storage tiers: the best technologies (like BWA and HANA) for very hot data, nearline storage for warm to cold data, and archive for cold and historic data.
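As a rough illustration of how such a tiering rule might be planned, the sketch below assigns data to a tier by its age and totals the size that would land in each tier. Everything in it is hypothetical: the `Partition` type, the tier names, the age cut-offs of 24 and 84 months, and the sample figures are invented for the example and are not SAP BW settings or numbers from a real system. A real strategy would set different rules per business area.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """A slice of data, for example one fiscal year of one InfoProvider."""
    name: str
    age_months: int
    size_gb: float


# Hypothetical age cut-offs; a real policy would vary by business area.
NEARLINE_AFTER_MONTHS = 24
ARCHIVE_AFTER_MONTHS = 84


def assign_tier(age_months: int) -> str:
    """Return the storage tier for data of the given age."""
    if age_months >= ARCHIVE_AFTER_MONTHS:
        return "archive"    # cold and historic data
    if age_months >= NEARLINE_AFTER_MONTHS:
        return "nearline"   # warm to cold data
    return "online"         # very hot and hot data stay in the main tier


def size_per_tier(partitions: list[Partition]) -> dict[str, float]:
    """Sum the size in GB that would land in each tier."""
    totals: dict[str, float] = defaultdict(float)
    for partition in partitions:
        totals[assign_tier(partition.age_months)] += partition.size_gb
    return dict(totals)


if __name__ == "__main__":
    # Invented sample data, only to show the shape of the result.
    sample = [
        Partition("sales_current_year", 6, 10.0),
        Partition("sales_three_years_ago", 36, 20.0),
        Partition("sales_ten_years_ago", 120, 5.0),
    ]
    print(size_per_tier(sample))
```

Running a plan like this before moving any data gives a quick estimate of how much the main tier could shrink under a given age rule.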

If you have an ASUG subscription or attended TechEd 2012, you will have access to a presentation by one of the world’s largest consumer goods companies, which was able to decrease the size of its SAP BW system from 80 GB to 30 GB by applying a data governance strategy based on data temperature. Most of the data past a certain age, for certain business areas, now sits in nearline storage. In fact, some of their InfoProviders are shrinking over time, with no impact on the business. That is a great example of sustainable data growth using data temperature to define a storage tier strategy.
