Predicting the Future Based on Big Data – Science Fiction or Fact?

Over 65 years ago, noted futurist and sci-fi writer Isaac Asimov wrote the Foundation series – a trilogy of books whose premise centers on a mathematician who develops a new branch of math, called psychohistory, that can predict the future, but only on a large scale. The cornerstone of the math was that the bigger the data set, the more predictable the future. And it makes sense: if your ability to capture and access all relevant data becomes complete, and the data themselves represent complete reality, there is no longer any need to model your business. In other words, the data speak volumes—when they’re in sufficient volumes to matter.

In many ways, companies are on the road to doing this, but the hitch has been speed and the general ability to see correlations, or “the bigger picture” (and then being agile enough to respond to the analytics and recommendations with quick decisions).

The interesting thing is that now, 65 years after Asimov, Informatica and other technology companies have made collecting and integrating big data much more effective, with new tools that store, integrate, govern, and secure large data sets in real time, to the point that companies can capture all of the transactions, activities, interactions, and events that comprise their business. Simply put, if you are a bank and 3 million “things” happen during the business day, you can capture and use all 3 million of them with today’s technology.

Now, with Informatica Big Data Streaming Analytics solutions, you can take advantage of important business situations as you discover them, and then use that awareness to make better and faster decisions. Informatica has been developing software products and capabilities in this field, which is very clearly in the sweet spot for the future.

How does it work?

A Big Data Streaming Analytics solution must accommodate multiple latencies that scale for big data and deliver information as needed per SLAs. Not every data pipeline can cost-effectively scale to sub-second latencies. Therefore, a Big Data Streaming Analytics platform must support multi-latency processing engines that can meet a variety of requirements related to complexity, speed, accuracy, throughput, reliability, and cost.
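To make the multi-latency idea concrete, here is a minimal sketch (not Informatica code; the engine names and thresholds are illustrative assumptions) of routing each pipeline to a processing engine based on its latency SLA:

```python
from dataclasses import dataclass

@dataclass
class Pipeline:
    name: str
    max_latency_ms: int  # the SLA: worst acceptable delivery delay

def choose_engine(p: Pipeline) -> str:
    # Sub-second work needs a true streaming engine; looser SLAs can
    # run on cheaper micro-batch or batch processing.
    if p.max_latency_ms < 1_000:
        return "streaming"
    elif p.max_latency_ms < 60_000:
        return "micro-batch"
    return "batch"

print(choose_engine(Pipeline("fraud-alerts", 200)))          # streaming
print(choose_engine(Pipeline("nightly-report", 3_600_000)))  # batch
```

The point of the sketch is that one platform can hold pipelines with very different cost/speed trade-offs, picking the cheapest engine that still meets each SLA.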

Informatica provides the first and most comprehensive Big Data Management platform, so all types of data of any size can be processed at any latency. This is a critical requirement for Big Data Streaming Analytics, where information needs to be delivered efficiently at the right time. In today’s world of Big Data Streaming Analytics, the platform must be able to access and ingest all types of data, not just event streams; otherwise the analytics are incomplete and less powerful.

Informatica receives data streams through built-in stream sources, which may be deployed as separate engines co-located with the streamed source. Vibe Data Stream (a high-speed big-data ingestion product) streams tens of thousands of records per second, in real time, into Big Data platforms like Hadoop. Rules and streaming operations are executed in real time. Developers may create source/target definitions, analytics definitions, data definitions, and stream processors using the native streaming language or business rules.
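The general pattern of applying rules to records as they arrive can be sketched as follows. This is an illustrative toy, not the Vibe Data Stream API: the record shape and the two example rules are assumptions for the demo.

```python
from typing import Callable, Iterable, Iterator, Optional

Record = dict
# A rule transforms a record, or returns None to drop it from the stream.
Rule = Callable[[Record], Optional[Record]]

def process_stream(records: Iterable[Record], rules: list[Rule]) -> Iterator[Record]:
    """Apply each rule in order to every incoming record."""
    for rec in records:
        for rule in rules:
            rec = rule(rec)
            if rec is None:
                break  # record was filtered out; skip remaining rules
        if rec is not None:
            yield rec

# Example rules: drop test transactions, flag large amounts.
def drop_tests(r: Record) -> Optional[Record]:
    return None if r.get("test") else r

def tag_large(r: Record) -> Record:
    r["large"] = r["amount"] > 10_000
    return r

events = [{"amount": 50}, {"amount": 20_000}, {"amount": 5, "test": True}]
out = list(process_stream(events, [drop_tests, tag_large]))
# out keeps two records; the 20,000 transaction is tagged large
```

Because the processor is a generator, records flow through one at a time rather than being collected into a batch, which is the essence of stream processing.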

An important point of differentiation is that Informatica allows business users to create business rules directly (depending on the user’s level of technical knowledge), but more often, the business user will use Informatica wizards and templates to define processing logic. This eliminates the need for a business user to understand Informatica’s technical details, and lets them focus on the business logic.
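A template-driven rule can be pictured as a small piece of data the user fills in, which the platform compiles into an executable check. The sketch below is a hypothetical illustration of that idea, not Informatica’s actual template format:

```python
import operator

# The "template" vocabulary a wizard might expose to a business user.
OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}

def compile_rule(template: dict):
    """Turn a filled-in template into a callable check on a record."""
    op = OPS[template["op"]]
    field, value = template["field"], template["value"]
    return lambda record: op(record[field], value)

# A business user fills in three blanks: flag withdrawals over 5,000.
rule = compile_rule({"field": "amount", "op": ">", "value": 5000})
print(rule({"amount": 7500}))  # True
print(rule({"amount": 100}))   # False
```

The user never sees the execution machinery; they only choose a field, an operator, and a threshold, which is exactly the separation of business logic from technical detail described above.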

Informatica provides hundreds of pre-built data integration transformations that execute at scale and at multiple latencies, increasing productivity and ease of maintenance. Streaming analytics has been criticized for producing too many false positives, so Informatica delivers data quality to ensure the operationalized insight provides accurate information. With Big Data Streaming Analytics there are many multi-latency data sources from disparate systems. To do analytics on combined data sets, Informatica provides sophisticated matching algorithms to link entities (e.g., customers, products, components) for analysis at scale and at multiple latencies.
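To illustrate what entity matching means in practice, here is a minimal sketch (not Informatica’s matching engine; the normalization steps and the 0.85 threshold are assumptions) that links customer names from disparate systems by a simple similarity score:

```python
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    # Lowercase, strip periods, and collapse whitespace before comparing.
    return " ".join(name.lower().replace(".", "").split())

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def link(a: str, b: str, threshold: float = 0.85) -> bool:
    """Decide whether two records refer to the same entity."""
    return similarity(a, b) >= threshold

print(link("J.P. Morgan  Chase", "jp morgan chase"))  # True
print(link("Acme Corp", "Zenith Ltd"))                # False
```

Production matching engines use far richer techniques (phonetic keys, field weighting, survivorship rules), but the core task is the same: decide, at scale, which records across systems describe the same customer, product, or component.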

If this is of interest for your organization, “get back to the future” and check out our real-time integration products on our web site, and keep an eye out for a new set of webinars and announcements coming in 2016.


  • Jim Judge

    Very smart for adding templates and wizards to assist with processing the customer’s data! The easiest to use tool to accomplish complex tasks is usually the most adopted.