It’s Spring Time. Are You Harvesting Your Fast and Fresh Streaming Data?

In some parts of the world, like California, it’s officially Spring time. And if you’re like me, you’re planning your trips out to the nearest national parks, where you’re likely to see snow melting into fresh streams of water. Ahh, the sight, the sound, and the motion of streaming water: it feeds your soul for sure. Such is the power of streams.

So is the power of streaming data, which feeds real-time insights into your business. More specifically, with streaming data your organization can:

  • Fight fraud in real-time
  • Roll out an offer to a customer about to churn
  • Provide dynamic price reduction
  • Draw potential customers into a nearby store
  • Monitor patient data in real-time

There are many more uses of real-time streaming data across industry verticals such as Financial Services, Insurance, Telco, Retail, Healthcare, Energy, and Public Sector.

Then there’s Batch Data

In the digital world, you’re increasingly relying on data-driven decisions to get ahead of your competition. Your knowledge workers, such as data analysts and data scientists, need data that is fast and fresh for their analysis so that it can drive real-time decisions. Organizations across industries have a variety of data coming in at various speeds, both batch and streaming, and at high volume.

For example, when a customer who has opted in to marketing communications enters a store, you can identify her using cameras and beacons, then trigger a real-time offer based on her web search and/or purchase history. Loyal customers are part of a batch data source (like customer 360), and beacons are a streaming data source. To make a real-time offer to loyal customers, you must combine data arriving at different speeds or latencies: in this example, batch and streaming.
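
To make the store example a little more concrete, here is a minimal sketch in plain Python of joining a stream with a batch table. It is not Spark or Informatica code; the `Customer` and `BeaconEvent` schemas, the `offers` function, and the sample data are all hypothetical and exist only to illustrate the idea of enriching each fast-moving event with slower-moving batch data.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass(frozen=True)
class Customer:
    """One row of the batch 'customer 360' table (hypothetical schema)."""
    customer_id: str
    opted_in: bool
    last_search: str


@dataclass(frozen=True)
class BeaconEvent:
    """One streaming event from an in-store beacon (hypothetical schema)."""
    customer_id: str
    store_id: str


def offers(events: Iterable[BeaconEvent],
           customers: Dict[str, Customer]) -> Iterator[str]:
    """Enrich each streaming event with batch data and emit an offer."""
    for event in events:
        customer = customers.get(event.customer_id)
        # Skip unknown visitors and customers who did not opt in.
        if customer is None or not customer.opted_in:
            continue
        yield (f"{customer.customer_id}@{event.store_id}: "
               f"offer on {customer.last_search}")


# Batch side: loaded periodically. Streaming side: arrives continuously.
CUSTOMER_360 = {
    "c1": Customer("c1", True, "running shoes"),
    "c2": Customer("c2", False, "headphones"),
}
STREAM = [BeaconEvent("c1", "s9"), BeaconEvent("c2", "s9"), BeaconEvent("c3", "s9")]

RESULT = list(offers(STREAM, CUSTOMER_360))
```

Only the opted-in, known customer gets an offer. In a real deployment the batch table would be refreshed on its own schedule while the beacon events keep flowing, which is exactly where the two latencies meet.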

Multi-latency data management is a critical capability to harvest fast and fresh streaming data and gain a competitive business advantage.

First, a bit of a history lesson.

Streaming data processing used to be a specialized solution, which led to the idea that streaming solutions need to be separate from batch data processing solutions. This also implied that traditional ETL/ELT solutions couldn’t address streaming data management (ingestion and processing). Certain streaming-only solutions in the market argued that batch processing is not required, and went as far as saying batch is a special case of streaming.

Open source Apache Spark provided DStream (Discretized Stream) to help process streaming data in micro-batches. DStream required a separate, streaming-specific implementation. That meant maintaining separate implementations for batch and streaming, which made it expensive and reinforced the idea that stream data processing is different.

Let’s fast forward to today.

Apache Spark introduced Structured Streaming, as an experimental (alpha) release in Spark 2.0 and generally available in Spark 2.2, to overcome the challenges of DStream. Structured Streaming, which treats a stream as an unbounded table that keeps growing, brings support for event-time processing and out-of-order (delayed) streaming data, as well as tight integration with batch data.
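
The "stream as a table" idea can be illustrated without Spark at all. The sketch below, in plain Python, runs the same aggregation once as a batch query over a finite table and once incrementally as rows are appended. The `Row` schema, `batch_totals`, and `StreamingTotals` are hypothetical names made up for this illustration, not part of any Spark API.

```python
from collections import Counter
from typing import Iterable, Tuple

Row = Tuple[str, int]  # (store_id, amount): hypothetical schema


def batch_totals(table: Iterable[Row]) -> Counter:
    """Batch query: total amount per store over a finite table."""
    totals: Counter = Counter()
    for store, amount in table:
        totals[store] += amount
    return totals


class StreamingTotals:
    """The same query, kept up to date as rows are appended to the table."""

    def __init__(self) -> None:
        self.totals: Counter = Counter()

    def append(self, new_rows: Iterable[Row]) -> Counter:
        # Reuse the batch logic on just the newly arrived rows,
        # then fold the partial result into the running totals.
        self.totals.update(batch_totals(new_rows))
        return self.totals


ROWS = [("s1", 10), ("s2", 5), ("s1", 7), ("s2", 1)]

stream = StreamingTotals()
stream.append(ROWS[:2])   # first micro-batch
stream.append(ROWS[2:])   # second micro-batch
```

After both appends, the incremental totals match what the batch query returns over all four rows. That equivalence is the appeal of the model: one piece of query logic can serve both latencies.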

When we speak with our customers, we hear that the majority of enterprise data is batch. So streaming data management alone won’t help you solve for real-time use cases such as the ones we discussed earlier.

How does Spark Structured Streaming help with Multi-latency Data Management?

Structured Streaming attempts to unify streaming, interactive, and batch queries over a structured abstraction, which is why it is called “Structured Streaming”. It also provides capabilities specific to streaming data processing, such as event-time-based processing of the data, support for processing late-arriving events in event-time order, and de-duplication of the streaming data for “exactly-once” semantics.
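
For readers who want a feel for what event time, late arrivals, and de-duplication mean in practice, here is a simplified sketch in plain Python. It is an illustration under stated assumptions, not how Spark implements these features: the `Event` schema, the `windowed_counts` function, and the `allowed_lateness` parameter are hypothetical, the watermark is advanced per event rather than per micro-batch, and the set of seen IDs is never trimmed.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Event = Tuple[str, int]  # (event_id, event_time_in_seconds): hypothetical schema


def windowed_counts(events: Iterable[Event],
                    window: int = 60,
                    allowed_lateness: int = 30) -> Dict[int, int]:
    """Count unique events per event-time window, tolerating some lateness."""
    counts: Dict[int, int] = defaultdict(int)
    seen: Set[str] = set()
    max_event_time = 0
    for event_id, event_time in events:
        # The watermark trails the latest event time seen so far.
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - allowed_lateness
        # Assign the event to a window by when it happened, not when it arrived.
        window_start = (event_time // window) * window
        if window_start + window <= watermark:
            continue  # too late: this window is already considered final
        if event_id in seen:
            continue  # duplicate delivery: count each event only once
        seen.add(event_id)
        counts[window_start] += 1
    return dict(counts)


# Events listed in arrival order; event times are out of order on purpose.
ARRIVALS = [
    ("a", 10),
    ("b", 70),
    ("a", 10),   # duplicate of "a"
    ("c", 50),   # late, but still within the allowed lateness
    ("d", 130),
    ("e", 20),   # too late: its window closed once the watermark passed 60
]

RESULT = windowed_counts(ARRIVALS)
```

Here the duplicate "a" and the very late "e" are left out, while the mildly late "c" is still counted in the window where it actually occurred. A production engine would also bound the de-duplication state, typically by tying it to the watermark.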

Come to our Spring Launch to Learn More

To learn more about how Informatica’s AI-driven Intelligent Big Data solutions help you harvest your fast and fresh streaming data, and to hear from the Chief Data Officer of OVO, register for one of our virtual launch events.

I look forward to seeing you at the Spring launch event!