
State Of The Data Warehouse

Hello and welcome to my first blog post on Perspectives. I’m Krish Krishnan, and you may have seen me before: I have a channel on BeyeNETWORK on Data Warehouse Architectures and Appliances.

My Perspectives blogs will focus on data warehousing as a practice. In the coming months, I will be publishing posts on architecture and integration challenges, and sharing some implementation tips on data warehousing. I welcome your feedback and hope to make this one of your go-to websites for exchanging information and sharing insights.

Now I want to cover the state of the data warehouse.

Business needs today mandate that data be available to users at the right time so that they can make effective decisions. This is the promise that the data warehouse was built on. But in the real world, the data warehouse has morphed into a “big” truth repository, and the business value derived from it depends on the perspective of whoever is using it. What the current data warehouse lacks is a flexible architecture from a data management and integration perspective.

The processes to extract, transform and load data are rigid and purpose-built to handle specific workloads. As data volumes grew, we kept tuning the architecture. While this helped us in the beginning, the performance of the data integration layer is now a bottleneck. This inflexibility is complicated further by the lack of a clear information lifecycle management policy. So we are left with huge data volumes and inflexible design constraints.

Consider the challenges that are coming to the data warehouse now and in the future:

  • Unstructured data – Text, Voice, Web (Forums), Social Networking Sites
  • Taxonomies / Ontology Integration – Multiple hierarchies
  • Enterprise Search – DW integration
  • Operational BI

What can we architects, designers, modelers and developers do? Well, to answer that question, we need to start adopting new techniques of data integration.

  • Design a reference architecture that is both operational and analytical. If you are driven more by applications on the DW platform, web services and SOA are excellent ways to expose data as a service (DaaS).
  • Satisfy the operational BI need through “Change Data Capture”. By using a platform approach, you can capture the changes and apply them either change by change or in micro-batches throughout the day.
  • Improve the storage architecture. 80% of the data in your data warehouse is not used 100% of the time. Use your architecting skills to deploy disk tiering in the SAN. This way you can move all the rarely used but essential data to a lower disk tier, which will save you time and money. Even better, you can integrate a data warehouse appliance and deploy the unused data to the appliance.
  • As we move along this thought process, we can see the use of MapReduce in the data warehouse of tomorrow. This will enable Java developers to get access to data more effectively. MapReduce is not supported by all platforms today, but I expect that in the next two years we will see it become a common interface.
  • Cloud computing will be a major growth area in the next decade. For architects and developers, the paradigm shift will be to extract and deploy your data on a platform that you will never see physically. There are several issues to be solved with the cloud, such as privacy, security and encryption. These apart, the real use of the cloud will be to let you try and apply new techniques and methods without investing in hardware.
  • Architect the data warehouse to address performance and service level requirements from day one. This means that you need to choose a data model and an indexing technique that are extensible and flexible.
  • Data quality is essential, and it has been addressed by my fellow authors on the Informatica Perspectives site. Read the post from Dave Reed and Chris Boorman on this subject.
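
To make the Change Data Capture point above a little more concrete, here is a minimal sketch in Python of applying captured changes in micro-batches. It is only an illustration under stated assumptions: the change record layout (`op`, `key`, `row`), the function names `micro_batches` and `apply_batch`, and the in-memory dictionary standing in for a warehouse table are all hypothetical, and a real CDC platform will have its own formats and interfaces.

```python
from itertools import islice


def micro_batches(changes, batch_size):
    """Yield successive lists of at most batch_size change records."""
    it = iter(changes)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def apply_batch(target, batch):
    """Apply one micro-batch to the target table (a dict keyed by primary key)."""
    for change in batch:
        op, key = change["op"], change["key"]
        if op in ("insert", "update"):
            # Inserts and updates both leave the latest row image in place.
            target[key] = change["row"]
        elif op == "delete":
            target.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")


# Hypothetical change log, in the order the source system produced it.
changes = [
    {"op": "insert", "key": 1, "row": {"name": "Acme", "balance": 100}},
    {"op": "insert", "key": 2, "row": {"name": "Globex", "balance": 250}},
    {"op": "update", "key": 1, "row": {"name": "Acme", "balance": 175}},
    {"op": "delete", "key": 2},
]

warehouse = {}  # assumed stand-in for the target warehouse table
for batch in micro_batches(changes, batch_size=2):
    apply_batch(warehouse, batch)

print(warehouse)
```

The batch size is the knob here: a size of one approximates applying each change as it arrives, while larger sizes trade latency for fewer, bigger loads during the day. The sketch assumes changes are applied in the order they were captured.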

As you can see from this brief discussion, the future of the data warehouse is going to be both a challenge and an opportunity for us practitioners. I look forward to discussing each of these topics in more detail in upcoming blogs.


One Response to State Of The Data Warehouse

  1. Badri Narayanan C B says:

    Hi Krish, very useful subject. From my past experience, I would like to add one more challenge for the future. I have worked on two types of applications: one is a normal data warehouse (decision making) and the other is a business-critical application where each row is being sold.
    In the first one, even when there were errors and rejections in the system, approximating was enough to meet our objective, i.e. decisioning.
    In the latter, every row counts and the volume universe is around 100 million. In this case a very sophisticated interface to re-feed error records, correct them or override them at an ODS level may be required. This could pose a challenge when implementing Informatica. What is your view?
    thanks, Badri. http://blogs.hexaware.com/informatica_way
