Timing Is Everything (for Data Integration)

One of my favorite quotes (often attributed to Albert Einstein) says, “The only reason for time is so that everything doesn’t happen at once.” I have come to think about this saying many times in my personal life, but recently I found myself thinking about it while helping one of our customers define their roadmap toward digital transformation. This customer, like many other enterprises, is struggling to integrate data that has different timing patterns – some things that happen now need to be handled later, while others require immediate action.

Today’s work environment requires communication in different types of “time domains”. For example, my typical morning starts with going through emails and replying to them – often after conducting some internal discussions. Later I often find myself attending conference calls with multiple participants, where I have to provide feedback in real time. Over lunch I might reply to an instant message from a colleague who needs a quick answer to handle another issue.

All of these happen at multiple latencies – a challenge that is extremely relevant in modern data environments:

  • Handling emails is, in a way, a type of “batch” processing. Aggregated items from multiple time domains are handled in an asynchronous manner and at high latency (which for me typically means in the mornings and then a few times later during the day)
  • Conference calls are real-time data communications between parties, all of which are actively engaged during the call. If I watch a recording of someone else’s call at a later time, this reverts to “batch” mode.
  • Messaging (SMS or other) is event-driven communication – short and specifically triggered by a relevant query. It is commonly used for quick transfers of small amounts of information (though I know some people who can share extremely long stories via SMS ;-))
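The three time domains above map directly onto three consumption patterns in code. A minimal sketch, with invented names (`Inbox`, `handle_call`, `on_message`) used purely for illustration:

```python
from collections import deque

class Inbox:
    """'Batch' mode: items accumulate and are drained a few times a day."""
    def __init__(self):
        self._pending = deque()

    def receive(self, email):
        self._pending.append(email)  # nothing is handled yet

    def process_all(self):
        """A scheduled drain of everything that piled up."""
        handled = []
        while self._pending:
            handled.append(f"replied to {self._pending.popleft()}")
        return handled

def handle_call(question):
    """'Real-time' mode: the answer is produced while the caller waits."""
    return f"answered '{question}' live"

def on_message(text, reply):
    """'Event-driven' mode: a short exchange triggered by a specific query."""
    reply(f"quick answer to '{text}'")

# batch: two emails wait until the next processing window
inbox = Inbox()
inbox.receive("status report")
inbox.receive("meeting notes")
print(inbox.process_all())

# real time: answered immediately
print(handle_call("roadmap question"))

# event driven: a callback fires only when triggered
on_message("need the figures now", reply=print)
```

The key distinction is who controls the timing: the consumer (batch), both parties simultaneously (real time), or the triggering event itself (event driven).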

An integration hub is a modern integration solution that’s all about synchronizing data between applications and systems at the right time:

  1. The integration hub’s transient persistence layer makes it extremely useful for batch processing. Data can be published to the hub’s topics and consumed by other applications at a variety of latencies, with elaborate scheduling mechanisms.
  2. The integration hub also handles real-time data use cases:
    • Connectivity to streaming engines such as Kafka allows real-time data to be published into the persistence layer and read at low latency at any time
    • An effective integration hub should also support Kafka integration, with the ability to create Kafka topics and monitor Kafka producer and consumer flows
  3. Data-driven integration should be performed using new types of publications and subscriptions, which allow applications to write to or read from the hub through a direct REST API
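The core idea in the list above – one persistence layer serving both low-latency subscribers and scheduled batch consumers – can be sketched in a few lines. This is a toy, in-memory model; the class and method names (`IntegrationHub`, `publish`, `subscribe`, `poll`) are invented for illustration and are not the API of the Informatica Integration Hub or Kafka:

```python
from collections import defaultdict

class IntegrationHub:
    def __init__(self):
        self._topics = defaultdict(list)       # transient persistence per topic
        self._subscribers = defaultdict(list)  # event-driven callbacks
        self._offsets = {}                     # per-consumer read positions

    def publish(self, topic, record):
        self._topics[topic].append(record)           # persist first...
        for callback in self._subscribers[topic]:    # ...then push in real time
            callback(record)

    def subscribe(self, topic, callback):
        """Low-latency path: the callback fires as each record arrives."""
        self._subscribers[topic].append(callback)

    def poll(self, topic, consumer):
        """Batch path: a scheduled consumer drains everything since its last poll."""
        offset = self._offsets.get((topic, consumer), 0)
        batch = self._topics[topic][offset:]
        self._offsets[(topic, consumer)] = len(self._topics[topic])
        return batch

hub = IntegrationHub()
hub.subscribe("orders", lambda r: print("real-time subscriber got:", r))
hub.publish("orders", {"id": 1})
hub.publish("orders", {"id": 2})
# a nightly batch job reads both records in one pass
print("batch consumer got:", hub.poll("orders", consumer="nightly-job"))
```

Because every record lands in the topic before being pushed out, the same published data serves both delivery modes, and each batch consumer tracks its own offset – the same design choice that lets a real hub mix latencies per subscriber.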

In a rapidly changing world, successful data-driven digital transformation requires a modern approach to integrating complex data ecosystems. Learn more about the Informatica Integration Hub and how it enables you to orchestrate, unify, govern, and share your data, seamlessly managing mixed-latency data delivery – data-driven, real-time, or batch – to all systems.