Tag Archives: Big Data Streaming Analytics Forrester Wave

Data Streams, Data Lakes, Data Reservoirs, and Other Large Data Bodies

Data Lake is a catchment area for data entering the organization

A Data Lake is a simple concept. It is a catchment area for data entering the organization. In the past, most businesses didn’t need to organize such a data store because almost all data was internal. It traveled via traditional ETL mechanisms from transactional systems to a data warehouse and then was sprayed around the business, as required.

When a good deal of data comes from external sources, or even from internal sources like log files, which never previously made it into the data warehouse, there is a need for an “operational data store.” This has definitely become the premier application for Hadoop and it makes perfect sense to me that such technology be used for a data catchment area. The neat thing about Hadoop for this application is that:

  1. It scales out “as far as the eye can see,” so there’s little likelihood of it being unable to manage the data volumes even when they grow beyond the petabyte level.
  2. It does not require a schema up front. Data lands as raw files, or as key-value records in a component such as HBase, which means that you don’t need to expend much effort in modeling data when you decide to accommodate a new data source. You just define a key and define the metadata at leisure.
  3. The cost of the software and the storage is very low.
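To make the second point a little more concrete, here is a minimal sketch of the "define a key now, define the metadata at leisure" idea. It uses a plain Python dictionary as a stand-in for the catchment area; the `CatchmentStore` class, its method names, and the `source:record_id` key layout are all hypothetical and are not part of Hadoop or any of its components.

```python
import json


class CatchmentStore:
    """Hypothetical in-memory stand-in for a key-value catchment area."""

    def __init__(self):
        self._raw = {}       # key -> raw payload, stored exactly as received
        self._metadata = {}  # source name -> field names, added later

    def put(self, source, record_id, payload):
        # Only a key is needed at ingest time; no data model is required.
        # Assumes the source name itself contains no ":" character.
        key = f"{source}:{record_id}"
        self._raw[key] = payload
        return key

    def describe(self, source, fields):
        # Metadata is attached "at leisure", after data has already landed.
        self._metadata[source] = list(fields)

    def read(self, key):
        source = key.split(":", 1)[0]
        raw = self._raw[key]
        fields = self._metadata.get(source)
        if fields is None:
            return raw  # nothing is known about this source yet
        parsed = json.loads(raw)
        return {name: parsed.get(name) for name in fields}


store = CatchmentStore()
key = store.put("weblog", "0001", '{"ip": "10.0.0.1", "path": "/home", "ms": 12}')
before = store.read(key)               # still the raw string
store.describe("weblog", ["ip", "path"])
after = store.read(key)                # now a structured view of the same record
```

The same stored payload is read twice: once before anything is known about the source, and once after a description has been added. Nothing about the ingest step had to change in between.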

So let’s imagine that we have a need for a data catchment area, because we have decided to collect data from log files, mobile devices, social networks, public data sources, or whatever. Let us also imagine that we have implemented Hadoop and some of its useful components and have begun to collect data.

Is it reasonable to describe this as a data lake?

A Hadoop implementation should not be a set of servers randomly placed at the confluence of various data flows. The placement needs to be carefully considered and if the implementation is to resemble a “data lake” in any way, then it must be a well-engineered man-made lake. Since the data doesn’t just sit there until it evaporates but eventually flows to various applications, we should think of this as a “data reservoir” rather than a “data lake.”

There is no point in arranging all that data neatly along the aisles, because at the time we get it we may not know what we want to do with it. We should organize the data once we know that.

Another reason we should think of this as more like a reservoir than a lake is that we might like to purify the data a little before sending it down the pipes to applications or users that want to use it.
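As a rough illustration of what "purifying the data a little" on the way out of the reservoir might look like, here is a small Python sketch. The record layout (`user`, `event`, `ts`) and the cleaning rules are assumptions chosen for the example, not anything prescribed by Hadoop.

```python
def purify(records):
    """Yield cleaned copies of raw records, skipping ones that cannot be used."""
    seen = set()
    for record in records:
        # Normalize the identifier; treat missing or blank values as unusable.
        user = (record.get("user") or "").strip().lower()
        if not user:
            continue
        event = (record.get("event") or "").strip()
        ts = record.get("ts")
        fingerprint = (user, event, ts)
        if fingerprint in seen:
            continue  # drop repeats of a record already sent downstream
        seen.add(fingerprint)
        yield {"user": user, "event": event, "ts": ts}


raw = [
    {"user": " Alice ", "event": "login", "ts": 1},
    {"user": "alice", "event": "login", "ts": 1},   # duplicate once normalized
    {"user": "", "event": "click", "ts": 2},        # no usable identifier
    {"user": "Bob", "event": "click", "ts": 3},
]
clean = list(purify(raw))
```

The raw records stay untouched in the reservoir; the cleaning happens only on the flow toward a consumer, so different applications could apply different rules to the same data.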

Twitter @bigdatabeat

Posted in Architects, Big Data, CIO, Cloud Data Integration, Cloud Data Management, DaaS, Hadoop, IaaS

Riding The Wave – Forrester Style

Forrester Research, a leading independent analyst firm, just released a new Wave™ report about Big Data Streaming Analytics Platforms, and in it Informatica was designated a Leader.

This is exciting news for a number of reasons.

– Personally, as a product leader focused on some of our newer technologies at Informatica, this is a positive sign of wider acceptance in the market.  One might argue it’s the start of a mainstreaming process.  Analyst firms don’t usually release these reports unless there are clear signs of a critical mass of customer interest (among other criteria).  And indeed, in their report, the authors cited Forrester survey data that revealed firms’ use of Streaming Analytics increasing 66% in the past two years.

– To validate product and vendor qualifications, Forrester conducted reference calls with current customers, so thank you to our customers for feedback you’ve provided to the analysts about our Big Data Streaming Analytics platform. It means a lot.

– The authors make an important point in the report when they write “Streaming Analytics is anything but a sleepy, rear-view-mirror analysis of data.  No, it is about knowing and acting on what’s happening in your business – now…The high velocity, white water flow of data from innumerable real-time data sources such as market data, Internet of Things, mobile, sensors, click stream and even transactions remain largely unnavigated by most firms.  The opportunity to leverage streaming analytics has never been greater.” We would agree.

– Finally, it’s been an area of importance, investment, and diligent work (aka Blood, Sweat and Tears) for Informatica for a while now. This really validates for us that we’ve been carving our surfboard in the right direction and now we are totally stoked that we’ve caught a righteous gnarly wave.

So while we’ll celebrate this accomplishment for today, the work really begins now…

To read the full report, The Forrester Wave™: Big Data Streaming Analytics Platforms, Q3 2014,  hang loose and surf on over here

Posted in Big Data, Complex Event Processing, Real-Time, Ultra Messaging