When It’s Time to Give Data Warehouse a Digital Makeover

Data WarehouseHow can data warehouses be reconstituted for today’s cloud-based, data-driven enterprises? The data warehouse has a place in this new world, but it takes on a new role and mission.

This unfolding scenario was recently explored by analysts with the Eckerson Group, who say data warehouses remain essential in many larger, complex enterprises. Data warehouses address challenges of enterprise reporting and analytics, and can be modernized to augment architectures supporting data lakes and mater data management. The advantage data warehouses provide come from ”pulling the data into a separate environment and organizing the data in a way that facilitates fast summarization, aggregation, and other types of queries allowed the data analysts to do their thing without impacting on-line application performance,” say David Loshin and Abie Reifer.

Other models of data aggregation for analysis may be compelling, but data managers need to think things through before moving away from an existing data warehouse environment, they state. “Are MapReduce queries really faster than SQL? Maybe yes and maybe no, depending on the configuration of the data and the types of requests,” they observe.

Ultimately, the emerging platforms that promise greater flexibility end up putting “transaction data into a separate environment and transform that data so that it is organized in a way to increase accessibility and speed performance. Isn’t that the same idea as the original approach to data warehousing?” say Loshin and Abie Reifer. “Yes, the performance is better, we can handle greater data volumes and a wider variety of sources. But we still have to think carefully about how we want to manage those data assets to support the downstream users’ needs.”

In a separate post, Dave Wells makes the case for moving data warehouses to the cloud. “Yes the data warehouse is alive, but it is not necessarily alive and well,” he points out. “With rapid growth of use cases, data sources, data types, and data volumes legacy data warehouses face many challenges: The data warehouse is alive and continues to be needed, but it also needs to be modernized.”

Moving to the cloud will provide much of the modernization needed for the new data warehouse. Cloud provides high levels of scalability, while “cloud elasticity eases the pain of workload management,” says Wells. In addition, “managed infrastructure shifts the burden of data center management and reduces data center costs.” In addition, “RDBMS in the cloud reduces database management complexity without the need to rebuild using NoSQL”.

However, migrating data warehouse to the cloud is no walk in the park, Wells cautions. He makes the following recommendations:

Provide clarity. “Start the planning process with a clear picture of the reasons for migrating your data warehouse to the cloud, Well states. “Begin with the business case. Then know your starting position, know your destination, map the path from beginning to end, and then navigate the course.”

Assess the current data warehouse architecture. “If the current architecture is deficient and struggles to meet current BI requirements, plan to redesign as you migrate to the cloud.”

Define the migration strategy. Avoid a “lift-and-shift” approach, Wells advises. “Changes are typically needed to adapt data structures, improve processing, and ensure compatibility with the chosen cloud platform. Incremental migration is more common and usually more successful.”

Select the technology. “Determine the cloud platform to which you will migrate. Then determine which migration tools you’ll need.”

Migrate and operationalize. “Plan the testing, then execute the migration process to move schema, data, and processing. Execute the test plan, and upon successful testing operationalize the cloud data warehouse and migrate users and applications.”

Comments

  • Aman Mahendra
  • David Loshin

    Joe, good summary of the discussion. I would add that moving to the cloud opens up a wide range of questions to be addressed regarding enterprise data utilization and exploitation. A modern “information intelligence” capability is going to span multiple system platforms, both on-premises and in the cloud, and will incorporate functionality from a growing palette of tools and services that the cloud providers make available. For example, one can maintain an on-premises RDBMS data warehouse for production reporting and dashboards, but the same data can be migrated to a cloud system that might also house open data sets accumulated from different sources that enables rapid querying and analysis using Spark for both SQL-style queries and more complex machine learning algorithms. Yet there are needs for solidifying the methods by which data are extracted from the on-premises system and synchronized with the cloud environment to ensure consistency in results and interpretation. It just goes to show you that even as the technology advances, there is still a need for fundamental data management and integration capabilities.