Fast and Furious: Designing for the Fast Data Lane - Collecting Streams
Switching to the carpool lane can be intimidating, especially when you don’t have all the information you need to make the lane change safely. For example, is the carpool lane open? Is my blind spot clear? How do you efficiently collect bits and pieces of data at the right time to make the best-informed decision when switching data lanes?
Messages and events from cloud, IoT, and mobile devices are generated in real time and at tremendous rates. Each of these event sources generates small pieces of data that accumulate into large volumes.
As events are increasingly produced and consumed outside the corporate data center and collected by IoT gateways, it makes sense to process events efficiently at the edge of the network. This concept, known as “edge computing,” enables connectivity directly to devices or an IoT gateway via various protocols and allows computations such as parsing, filtering, and aggregation to be performed on streaming data, delivering the data upstream to the corporate data center or sending control signals back downstream to the device by way of alerts, rules, or triggers.
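To make the parse-filter-aggregate idea concrete, here is a minimal sketch of what a gateway-side pipeline might do. The device names, JSON shape, and temperature threshold are all hypothetical, and a real gateway would work on a continuous stream rather than a fixed list:

```python
import json
from collections import defaultdict

# Hypothetical raw readings as a gateway might receive them (JSON strings).
raw_events = [
    '{"device": "sensor-1", "temp": 21.5}',
    '{"device": "sensor-2", "temp": 98.7}',
    '{"device": "sensor-1", "temp": 22.1}',
    '{"device": "sensor-2", "temp": 99.2}',
]

def parse(events):
    for raw in events:
        yield json.loads(raw)          # parsing: raw text -> structured record

def hot_only(events, threshold=90.0):
    for event in events:
        if event["temp"] > threshold:  # filtering: drop uninteresting readings
            yield event

def aggregate(events):
    totals, counts = defaultdict(float), defaultdict(int)
    for event in events:
        totals[event["device"]] += event["temp"]
        counts[event["device"]] += 1
    # aggregation: one average per device, far smaller than the raw stream
    return {d: round(totals[d] / counts[d], 2) for d in totals}

upstream_payload = aggregate(hot_only(parse(raw_events)))
print(upstream_payload)  # {'sensor-2': 98.95}
```

The payload sent upstream is a fraction of the raw event volume, which is exactly the point of computing at the edge.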
In any real-time analytics journey, stream processors play an important role: they facilitate two-way streams between the device (or the IoT gateway) and the data center and have the ability to perform edge analytics. Collecting event sources is a key feature of stream processors, as it establishes data points about the source system and can describe its behavior, but the job doesn’t end with stream data collection. Stream processors must apply basic transformations such as parsing or aggregation to data in flight and guarantee delivery, not only back to the device but also to targets such as Kafka, Hadoop, or complex event processing engines.
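Guaranteed delivery usually means at-least-once semantics: keep resending a record until the target acknowledges it. The sketch below illustrates that retry loop with a made-up flaky sink standing in for a downstream target such as a Kafka topic; a real stream processor would persist undelivered records rather than hold them in memory:

```python
class FlakySink:
    """Hypothetical downstream target that fails transiently before accepting."""
    def __init__(self, failures_before_ack=2):
        self.failures_left = failures_before_ack
        self.received = []

    def send(self, record):
        if self.failures_left > 0:
            self.failures_left -= 1
            raise ConnectionError("transient sink failure")
        self.received.append(record)
        return "ack"

def deliver(sink, record, max_retries=5):
    # At-least-once delivery: retry until the sink acknowledges the record.
    for _ in range(max_retries):
        try:
            return sink.send(record)
        except ConnectionError:
            continue
    raise RuntimeError("delivery failed after retries")

sink = FlakySink()
print(deliver(sink, {"device": "sensor-1", "avg_temp": 21.8}))  # ack
```

Note the trade-off: retrying until acknowledgment can deliver a record more than once, so downstream consumers need to tolerate duplicates.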
Going back to the carpool lane example: if the vehicle is equipped with a collision detection system, an alert is triggered when a driver is about to change lanes and the vehicle is too close to another vehicle or object. This is a great example of edge computing, as event data is processed immediately and sent back to the driver as an alert.
As you switch to the carpool lane, consider Informatica’s VDS stream processor, a distributed, scalable system that collects all forms of streaming data at high rates; the resulting large data volumes can be analyzed and acted on while they are still fresh and relevant. The key to a successful real-time analytics solution is the ability to derive value from events as they happen: quicker reaction times make it possible to affect an event’s outcome before it completes.
When assessing a stream processor, look for a brokerless pub-sub messaging system that sources data from a variety of systems, scales to process millions of records per second, eliminates single points of failure, and guarantees record delivery without intermediate storage or multiple hops.
As your journey continues in the fast data lane, the next blog in this series will look at how a stream transport system such as Apache Kafka integrates into real-time data pipelines and provides a mechanism for moving data in transit.
This blog is part of the series on stream processing and analytics. Catch the series: