How to Build Composite Solutions on Microsoft Azure Cosmos DB With Informatica Intelligent Cloud Services

Microsoft’s Azure Cosmos DB provides the enterprise tremendous capability when building globally available applications, especially when low latency, consistency, and redundancy are critical success factors. But as with nearly any globally available enterprise solution, it usually requires a composite set of technologies to work together in harmony. Solutions built with Azure Cosmos DB will most likely have many different sources of data, and downstream analytics applications will have yet another set of requirements.

In this blog, we’ll explore using Informatica Intelligent Cloud Services (IICS) to deliver this composite solution centered on Azure Cosmos DB. Let’s consider building a globally available customer portal with a complementary mobile experience. This portal needs to provide visibility into a customer’s order and support history, which is fed by a company’s ERP systems with an underlying Oracle database. It also needs to provide embedded analytics, which will be driven by Power BI and Azure SQL Data Warehouse.

Fig 1: High-level architecture

There are many areas in the above architecture where Informatica can add value, but for now let’s focus on the data movement from the Oracle database into Azure Cosmos DB as well as keeping the Azure SQL Data Warehouse driving the embedded analytics up-to-date.

Oracle to Azure Cosmos DB

Moving data from Oracle to Azure Cosmos DB involves far more than copying data from one database to another. Yes, one could do that and keep the normalized structure of the Oracle database intact, but this may put the desired success factors at risk. While relational database data structures are heavily normalized to avoid redundant storage at the cost of compute, this approach isn’t ideal for storing data in Azure Cosmos DB. In the case of Azure Cosmos DB, it is advised that the data be denormalized so that it can be optimized for compute by modeling the data to fit a workload’s data access patterns and reducing the need for redundant storage. By having the data denormalized, the database is able to exploit spatial locality and guarantees that the data is “nearby,” such that related data doesn’t need to be fetched in a follow-up database request, ensuring the greatest levels of performance.

This pattern can easily be handled with IICS using Cloud Data Integration and the Oracle Database and Azure Cosmos DB connectors.

For this scenario, let’s assume that the customer portal developers have defined the target Azure Cosmos DB data structure with an optimized hierarchy. Using the Hierarchy Builder, IICS Cloud Data Integration makes it easy to map from one or more relational sources and load the target hierarchy in Azure Cosmos DB, all without writing a single line of code.

Fig 2: Drag and drop from relational to hierarchy targets

In the above example, we can see that IICS Cloud Data Integration is being used to first join two relational tables and then load into the optimized hierarchical structure in Azure Cosmos DB.

Azure Cosmos DB to Azure SQL Data Warehouse

In the previous section, we looked at how to use IICS to optimally load data into Azure Cosmos DB in order to take the most advantage of what Azure Cosmos DB has to offer. Similarly, if we consider the original premise that most globally available enterprise applications require a set of composite technologies, we’re now going to look at how to ensure an optimal analytics experience built on Microsoft’s Power BI and Azure SQL Data Warehouse with Azure Cosmos DB as the primary data source.

Analytics can work against operational data stores, like the Oracle database underlying the ERP system or directly against Azure Cosmos DB supporting the customer portal application, but this approach may deliver limited insights, have slow performance, and impact the performance of the applications using those databases.

In order to deliver an optimal analytics experience in this scenario, the hierarchical data from Azure Cosmos DB can be extracted and loaded into Azure SQL Data Warehouse, which is designed to support very large-scale analytics use cases.

Similar to loading Azure Cosmos DB, customers can again leverage IICS Cloud Data Integration, Intelligent Structure Discovery, and the Azure Cosmos DB and Azure SQL Data Warehouse connectors, but in reverse; extract hierarchical data from Azure Cosmos DB and populate the fact and dimension tables in Azure SQL Data Warehouse to drive the Power BI analytics. Again, no coding required.

More than data integration

The reality is that this blog can be broken up into a series of blogs to do each topic justice. Each project typically begins with data, so it only makes sense to talk about how to optimally deliver the data between a composite set of technologies to deliver an outstanding user experience. But the IICS platform does so much more than deliver data, it can deliver high quality data that is “scrubbed” by Cloud Data Quality or protects data via obfuscation with Data Masking while preserving referential integrity (which then of course ensures accurate analytics that are compliant with privacy regulations).

Also, while the architectural diagram in Figure 1 looks clean and simple, we all know that in reality it’s messy; architectures are distributed across cloud and on-premises boundaries, running on different platforms, distributions, and releases, behind firewalls and proxies, etc. Data integration in development is vastly different than in production, and IICS delivers a first-class hybrid experience designed to cope with these real-world deployment challenges ensuring that, in alignment with our original goal, a composite set of technologies work in harmony to deliver for the customer.

Come learn more about how Informatica Intelligent Cloud Services can help drive your Azure Cosmos DB and other Microsoft Azure initiatives at Microsoft Ignite in Orlando. Please visit booth #1406 and watch Andrew Liu from Microsoft and Bill Creekbaum from Informatica present a joint solution during the “What’s the latest with Azure Cosmos DB” session (BRK2063) on Tuesday, November 5 at 10:30 am EST.

For more information about using Informatica Intelligent Cloud Services with Cosmos DB, check out this great resource from the Informatica How-To Library: “Converting Relational Input into Hierarchical Output using Microsoft Azure Cosmos DB SQL API Connector.”