Sometimes when I drive past an electronic tollway collection sensor, I wonder about the amount of data it must generate. I’m no expert on such technology, but at a minimum, the RFID sensor has to read the chip in your car and log the date and time plus your RFID info, and then a camera takes a picture to catch any potential violators. Now multiply that by the hundreds of thousands of cars that drive such roads every day, and by the number of sensors they pass, and I’m quite sure the total exceeds several million messages per day.
Then, of course, these messages must be sent (via TCP, say) to some geographically distant server, where they can be organized and written to disk for further processing. Later, a series of large applications will access these records to send out requests for payment and other transactions, or to transform the data in some way, perhaps to cross-check these tollway collection data points against vehicle registration records.
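To make the "sent via TCP" step concrete, here is a minimal Python sketch of how one toll reading might be framed and sent over a socket. Everything in it is an assumption for illustration: the field names (`tag_id`, `sensor_id`, `timestamp`), the JSON encoding, and the 4-byte length prefix are hypothetical, not how any real tollway system or Informatica product formats its messages.

```python
import json
import socket
import struct


def encode_record(record: dict) -> bytes:
    """Serialize one reading as JSON, prefixed with its length (4 bytes, big-endian)."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return struct.pack("!I", len(payload)) + payload


def send_record(sock: socket.socket, record: dict) -> None:
    """Send one framed record; sendall() retries until every byte is written."""
    sock.sendall(encode_record(record))


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, because TCP may deliver a message in pieces."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf.extend(chunk)
    return bytes(buf)


def recv_record(sock: socket.socket) -> dict:
    """Receive one framed record: read the length prefix, then the payload."""
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))


if __name__ == "__main__":
    # A connected socket pair stands in for the sensor and the distant server.
    sensor, server = socket.socketpair()
    send_record(sensor, {"tag_id": "A1B2", "sensor_id": 7,
                         "timestamp": "2012-05-01T08:15:00Z"})
    print(recv_record(server))
    sensor.close()
    server.close()
```

The length prefix matters because TCP is a byte stream with no message boundaries of its own; without some framing, the receiver cannot tell where one reading ends and the next begins.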
This is but one real-world example of “Big Data in Motion”: lots and lots of detail records, streaming in real time, that must be efficiently collected and stored for viewing and analysis, possibly even in near-real-time. It is a highly visible example, but there are thousands more like it in today’s information-saturated world. Many of these systems are now hitting, or will soon hit, their built-in limitations for scale, performance, and flexibility.
This approach works, yes. But it could also have worked this way back in, say, 1992. Way back, before the explosion of the Internet and divergent data types and platforms, before JMS and web applications, before the need for intra-day information to make faster decisions and reconcile errors sooner.
We are using very old technology to tackle a very new problem. There must be a better way.
Now, with Informatica® Ultra Messaging® and B2B Data Transformation, you can streamline and simplify Big Data in Motion processes like this in a variety of ways.
For instance, you can capture and archive raw sockets data, such as TCP or UDP data from many remote or mobile devices, using the Raw Reception Modules from the Informatica Marketplace (see below). Subsequently, in near-real-time, one or more “subscribing” applications, such as a JMS application, can retrieve or replay this raw data using a topic name, just like any other Ultra Messaging (UM) topic-based message (the topic name is unique per raw TCP or UDP sender). Or you can leverage PowerCenter by connecting it through the JMS API in Ultra Messaging.
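The capture-then-replay idea can be sketched in a few lines of Python. This is a toy in-memory model, not the Ultra Messaging API: the `TopicArchive` class, its method names, and the `raw/<protocol>/<host>:<port>` topic naming scheme are all hypothetical. The only detail taken from the description above is that each raw TCP or UDP sender maps to its own unique topic name.

```python
from collections import defaultdict
from typing import Dict, List


class TopicArchive:
    """Toy archive: one append-only message log per sender-derived topic."""

    def __init__(self) -> None:
        self._log: Dict[str, List[bytes]] = defaultdict(list)

    @staticmethod
    def topic_for(protocol: str, host: str, port: int) -> str:
        # Hypothetical naming scheme; it only has to be unique per sender.
        return f"raw/{protocol}/{host}:{port}"

    def capture(self, protocol: str, host: str, port: int, payload: bytes) -> str:
        """Archive one raw payload under its sender's topic and return the topic."""
        topic = self.topic_for(protocol, host, port)
        self._log[topic].append(payload)
        return topic

    def replay(self, topic: str, from_seq: int = 0) -> List[bytes]:
        """Return archived payloads for a topic, in arrival order, from from_seq on."""
        # .get() avoids creating an empty log for a topic nobody has sent on.
        return list(self._log.get(topic, [])[from_seq:])


if __name__ == "__main__":
    archive = TopicArchive()
    topic = archive.capture("tcp", "10.0.0.5", 9000, b"reading-1")
    archive.capture("tcp", "10.0.0.5", 9000, b"reading-2")
    # A late-joining subscriber replays everything sent on that topic so far.
    for message in archive.replay(topic):
        print(topic, message)
```

A subscriber only needs the topic name, not the sender's address or wire protocol, which is what lets a JMS-style application consume data that originally arrived as raw socket traffic.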
There’s no need for a big, slow, expensive database to house saved data, because the archive function of Ultra Messaging Cache Option uses a lightweight internal database that is tuned for high performance, even with the very high volumes often found with streaming data.
And because of the shared API (JMS, Java, .NET, C/C++) of Ultra Messaging, the data is more easily accessible, directly from a wide variety of applications, and across all messaging use cases, from streaming (reliable) to persistence (guaranteed) and queuing (once-and-only-once).
If you also need to transform this data, you might add the Data Transformation Processing Module (see below). The data is transformed just before the archiving step; thus, the transformed data is available for immediate consumption by any number of applications on the Ultra Messaging bus. Use the B2B user interface to define the transformation, or use one of the many predefined standards and protocols across many industries.
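The ordering described here (transform first, then archive, then deliver) can be illustrated with a small Python sketch. It is not the Data Transformation Processing Module or the B2B Data Transformation engine: the `TransformingArchive` class, the pipe-delimited input format, and the field names are hypothetical stand-ins chosen to show why subscribers see already-transformed data.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


def parse_pipe_delimited(raw: bytes) -> Record:
    """Hypothetical transformation: 'tag|sensor|timestamp' bytes to a dict."""
    tag_id, sensor_id, timestamp = raw.decode("ascii").strip().split("|")
    return {"tag_id": tag_id, "sensor_id": sensor_id, "timestamp": timestamp}


class TransformingArchive:
    """Applies a transformation just before archiving, then notifies subscribers."""

    def __init__(self, transform: Callable[[bytes], Record]) -> None:
        self._transform = transform
        self._records: List[Record] = []
        self._subscribers: List[Callable[[Record], None]] = []

    def subscribe(self, callback: Callable[[Record], None]) -> None:
        self._subscribers.append(callback)

    def ingest(self, raw: bytes) -> Record:
        record = self._transform(raw)      # 1. transform the raw message
        self._records.append(record)       # 2. archive the transformed form
        for callback in self._subscribers:
            callback(record)               # 3. deliver to every subscriber
        return record

    def archived(self) -> List[Record]:
        """Return a copy of everything archived so far."""
        return list(self._records)


if __name__ == "__main__":
    bus = TransformingArchive(parse_pipe_delimited)
    bus.subscribe(print)
    bus.ingest(b"A1B2|7|2012-05-01T08:15:00Z\n")
```

Because the transformation runs once, before the archive write, every consumer (live or replaying later) reads the same transformed record and none of them has to understand the raw format.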
Still another application of these technologies is High Performance Data Analytics (HPDA), which adds an MPP database to the mix to provide intra-day access to key business analytics. Instead of waiting overnight, or even a few days, you can now gain access to vital information much sooner, reducing your risk and increasing the business agility you need to make better and faster decisions.
The big picture is that the JMS API at the back end, and the ability to ingest and archive raw sockets data at the front end, make for a powerful combination.