Extend, Migrate, and Modernize Your Data Warehouse with Google BigQuery

Google BigQuery is increasingly the platform enterprises select to drive their data warehouse modernization initiatives. It’s no secret that BigQuery delivers massive scale and performance for the enterprise, but modernizing your data warehouse requires more than compute horsepower and virtually unlimited storage; analytics modernization is a journey fueled by data.

I’ve worked with many customers over the years on their journey towards the modern data warehouse and observed three distinct phases that lead to successful outcomes: Extend, Migrate, and Modernize.

So let’s consider a baseline scenario: an enterprise customer has been a longtime user of Netezza and now, faced with its upcoming end-of-life, has chosen Google BigQuery as the foundation of their future data warehouse. Let’s start their journey.

Extend: Deploy New Analytics Use Cases to Google BigQuery

When an IT organization is presented with a new use case to support, a great way to begin the journey is to deploy that use case in the cloud. Continue to source data into the on-premises Netezza data warehouse, but replicate the data pipeline to feed Google BigQuery and build the new use case against BigQuery, so that any new development targets the future state rather than the legacy environment.

Extension has the added benefit of allowing the IT organization to stay responsive to the business while investing in the future state rather than in the legacy infrastructure. It also begins to give the IT organization practical, operational experience with Google BigQuery, which will make the next two phases more efficient and successful.

While the actual process is of course more involved, conceptually customers start by seeding their Google BigQuery environment with an initial data load from the on-premises Netezza environment, and then implement a replication service to keep Google BigQuery up to date in support of the new use case.
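
As a simplified sketch of those two steps (project, dataset, bucket, and table names below are illustrative placeholders, not part of any specific customer environment), the initial seed might be a bulk Netezza export staged in Cloud Storage and loaded with the google-cloud-bigquery client, with ongoing replication appending only changed rows. In practice the replication step is usually handled by a dedicated integration tool such as Informatica rather than hand-written code.

from google.cloud import bigquery

# All names below (project, dataset, bucket, table) are illustrative placeholders.
client = bigquery.Client(project="my-project")

# 1. Initial seed: load a bulk Netezza export (CSV files staged in Cloud Storage).
load_job = client.load_table_from_uri(
    "gs://my-staging-bucket/netezza_export/orders_*.csv",
    "my-project.sales.orders",
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,                    # infer the schema from the export
        write_disposition="WRITE_TRUNCATE",  # replace any previous seed
    ),
)
load_job.result()  # wait for the load to finish

# 2. Ongoing replication: append rows changed since the last sync
#    (in practice this is handled by a replication/integration tool).
rows_changed_since_last_sync = [
    {"order_id": 1001, "status": "SHIPPED", "updated_at": "2018-09-30T12:00:00"},
]
errors = client.insert_rows_json("my-project.sales.orders", rows_changed_since_last_sync)
assert not errors, errors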

Once these two high-level steps have been completed, the new use case is ready to be developed against Google BigQuery. The first step of the journey has been taken!

Migrate: Move Strategic Workloads to Google BigQuery

Once a foundation has been established with Google BigQuery during the Extend phase, the IT organization should take an inventory of the workloads that depend on the Netezza data warehouse. This means more than simply cataloging the use cases that depend on Netezza: it means building a detailed data flow map that identifies where all source data comes from (lineage) and how the data flows to downstream applications (impact analysis). Understanding how data flows for each individual use case lets the team performing the migration do so with full visibility into the data each use case requires and with all dependencies mapped out.
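
Conceptually, that data flow map is a graph: lineage is an upstream traversal and impact analysis is a downstream traversal. The minimal sketch below illustrates the idea; every table, feed, and application name in it is hypothetical, and a real migration would rely on metadata and catalog tooling rather than a hand-built map.

# A minimal, illustrative sketch: the data flow map as a directed graph, with
# lineage (upstream) and impact analysis (downstream) derived by traversal.
# All table, feed, and application names are hypothetical.
from collections import defaultdict

# edges: source -> targets ("data flows from X to Y")
flows = {
    "crm_extract":             ["netezza.stg_customers"],
    "pos_extract":             ["netezza.stg_sales"],
    "netezza.stg_customers":   ["netezza.dw_customer_dim"],
    "netezza.stg_sales":       ["netezza.dw_sales_fact"],
    "netezza.dw_sales_fact":   ["exec_dashboard", "churn_model"],
    "netezza.dw_customer_dim": ["exec_dashboard"],
}

def downstream(node, graph=flows):
    """Impact analysis: everything affected if `node` moves or changes."""
    impacted, stack = set(), [node]
    while stack:
        for nxt in graph.get(stack.pop(), []):
            if nxt not in impacted:
                impacted.add(nxt)
                stack.append(nxt)
    return impacted

def upstream(node, graph=flows):
    """Lineage: every source that feeds `node`."""
    reverse = defaultdict(list)
    for src, targets in graph.items():
        for tgt in targets:
            reverse[tgt].append(src)
    return downstream(node, reverse)

print(downstream("netezza.dw_sales_fact"))  # {'exec_dashboard', 'churn_model'}
print(upstream("exec_dashboard"))           # every feed behind the dashboard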

With this level of visibility, the IT organization can move workloads to Google BigQuery with confidence, knowing the move has been planned so that all upstream and downstream dependencies will function properly once they connect to Google BigQuery. The result is a smooth transition with minimal downtime as the analytics and applications that depended on Netezza are redirected to Google BigQuery.

Modernize: Leverage Benefits of Google BigQuery

Once the center of an enterprise’s data gravity has shifted to Google BigQuery, and strategic workloads, analytics, and applications have been connected to it, the IT organization has the opportunity to evaluate BigQuery’s unique strengths and develop a plan to modernize its applications to take full advantage of them.

For example, Google BigQuery’s RECORD data type colocates master and detail information in the same table. Customers can load nested data structures (for example, a JSON document from a REST web service) into a single BigQuery table while continuing to use SQL, which preserves compatibility with countless data tools and technologies, all while benefiting from the performance of Google BigQuery’s columnar storage.
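
A brief sketch of what that looks like in practice is shown below (table, dataset, and field names are illustrative): an orders table whose line items are nested, repeated RECORDs, populated from newline-delimited JSON and queried with standard SQL using UNNEST.

from google.cloud import bigquery

client = bigquery.Client()

# Illustrative schema: one "orders" table where line items are nested RECORDs,
# so master (order) and detail (line item) data live in the same row.
schema = [
    bigquery.SchemaField("order_id", "STRING"),
    bigquery.SchemaField("customer", "STRING"),
    bigquery.SchemaField(
        "line_items", "RECORD", mode="REPEATED",
        fields=[
            bigquery.SchemaField("sku", "STRING"),
            bigquery.SchemaField("quantity", "INTEGER"),
            bigquery.SchemaField("price", "FLOAT"),
        ],
    ),
]
table = client.create_table(bigquery.Table("my-project.sales.orders_nested", schema=schema))

# Load newline-delimited JSON (e.g. captured from a REST service) into the nested table.
client.load_table_from_uri(
    "gs://my-staging-bucket/orders/*.json",
    table,
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
    ),
).result()

# Query the nested detail rows with standard SQL.
query = """
    SELECT order_id, item.sku, item.quantity * item.price AS line_total
    FROM `my-project.sales.orders_nested`, UNNEST(line_items) AS item
"""
for row in client.query(query):
    print(row.order_id, row.sku, row.line_total)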

Further, for new workloads with extreme volume, such as IoT telemetry, customers may choose to capture the raw data in Google Cloud Storage and curate it with Google Dataproc before loading it into Google BigQuery, optimizing the overall data flow with Google technologies suited to each stage of the workload.
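
One way that pattern can look is a PySpark job on Dataproc that reads raw telemetry from Cloud Storage, filters and aggregates it, and writes the curated result to BigQuery via the spark-bigquery connector. The bucket, table, and field names below are illustrative, and the job assumes the connector is available on the cluster.

# PySpark job submitted to a Dataproc cluster (names and paths are illustrative);
# assumes the spark-bigquery connector is available on the cluster.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("curate-iot-telemetry").getOrCreate()

# Read raw device telemetry landed in Cloud Storage.
raw = spark.read.json("gs://my-iot-landing-bucket/telemetry/2018/10/*/*.json")

# Curate: drop malformed readings and aggregate to one row per device per hour.
curated = (
    raw.withColumn("event_time", F.to_timestamp("event_time"))
       .filter(F.col("temperature").between(-60, 150))
       .withColumn("hour", F.date_trunc("hour", F.col("event_time")))
       .groupBy("device_id", "hour")
       .agg(F.avg("temperature").alias("avg_temperature"),
            F.count("*").alias("reading_count"))
)

# Write the curated result to BigQuery via the spark-bigquery connector.
(curated.write.format("bigquery")
    .option("table", "my-project.iot.telemetry_hourly")
    .option("temporaryGcsBucket", "my-staging-bucket")
    .mode("append")
    .save())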

Every enterprise’s journey will be different, but the Extend, Migrate, and Modernize approach ensures responsiveness to the business from the very beginning while focusing on maintaining business continuity and taking advantage of the Google BigQuery platform. With Informatica, you can seamlessly automate integration and management of your data across Google Cloud, SaaS and on-premises systems to unleash the power of data. Accelerate your Google Cloud analytics modernization with Informatica. Get more value from your data and boost your data-driven digital transformation.

Visit us at Google Cloud Next ’18 in London on October 10th and 11th to learn more about how to speed your journey to the cloud.

Learn more about Informatica for Google Cloud
