How to Get Big Data Integration Right the First Time

Good guidance on big data integration is surprisingly hard to find. Everyone thinks they have the answer, but the experience just isn’t there yet, even as we watch big data integration technology begin to appear.

One article I recommend is Richard Daley’s “How to Be Successful with Big Data Integration.” In it, Daley offers some practical guidance, including:

“Overall, a big data integration solution will be most successful if it

  1. Lifts the major constraints around big data storage and data processing platforms so that, for Hadoop, there are no inherent delays in accessing data across large clusters of computers, and, for NoSQL databases, restrictions to querying data—such as the ability to sort and group data or perform joins—are removed.
  2. Eliminates the technical barriers—users need simple, easy-to-use, and high-productivity visual development interfaces for high-performance data input, output and manipulation regardless of which big data platform (from Hadoop to the NoSQL databases) they deploy.
  3. Facilitates integration with enterprise data—even big data cannot thrive on its own.
  4. Also delivers a complete business analytics solution that includes everything from reports, dashboards, interactive visualization and exploration, and predictive analytics.”

Expanding on this a bit, I would say that the approaches and technology that worked for traditional data integration are just as important when the data is, well, big data. Don’t throw the baby out with the bath water. Make sure you still solve the traditional data integration problems, such as data semantic mediation, data model mediation, data quality, and data validation.
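To make those traditional problems concrete, here is a minimal sketch, assuming two hypothetical sources ("crm" and "billing") whose field names, country vocabularies, and canonical `Customer` model are invented for illustration. Model mediation renames source fields into one target model, semantic mediation maps different codes for the same concept onto one value, and validation rejects rows that don’t fit rather than loading them silently.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical canonical (target) model that every source is mediated into.
@dataclass(frozen=True)
class Customer:
    customer_id: int
    name: str
    country: str  # two-letter country code, e.g. "US"

# Semantic mediation: sources use different vocabularies for the same concept.
# This lookup table is an illustrative assumption, not a complete mapping.
COUNTRY_SYNONYMS = {"US": "US", "USA": "US", "UNITED STATES": "US",
                    "DE": "DE", "DEU": "DE", "GERMANY": "DE"}

# Model mediation: per-source maps from source field names to canonical names.
# Both source layouts are hypothetical.
FIELD_MAPS = {
    "crm": {"cust_id": "customer_id", "full_name": "name", "country": "country"},
    "billing": {"customerNumber": "customer_id", "name": "name", "countryCode": "country"},
}

def mediate(source: str, record: dict) -> dict:
    """Rename source fields to canonical names and normalize values."""
    mapping = FIELD_MAPS[source]
    out = {canonical: record.get(src) for src, canonical in mapping.items()}
    if isinstance(out["name"], str):
        out["name"] = out["name"].strip()  # basic data-quality cleanup
    if isinstance(out["country"], str):
        # Unknown country spellings become None and fail validation below.
        out["country"] = COUNTRY_SYNONYMS.get(out["country"].strip().upper())
    return out

def validate(row: dict) -> Optional[Customer]:
    """Return a Customer if the mediated row is valid, otherwise None."""
    try:
        customer_id = int(row["customer_id"])
    except (TypeError, ValueError):
        return None
    if not row["name"] or row["country"] is None:
        return None
    return Customer(customer_id, row["name"], row["country"])

def integrate(batches: dict) -> tuple[list, list]:
    """Mediate and validate records from every source; keep rejects for review."""
    accepted, rejected = [], []
    for source, records in batches.items():
        for record in records:
            customer = validate(mediate(source, record))
            if customer is None:
                rejected.append((source, record))
            else:
                accepted.append(customer)
    return accepted, rejected

# Usage with made-up sample records.
accepted, rejected = integrate({
    "crm": [{"cust_id": "42", "full_name": " Ada Lovelace ", "country": "usa"}],
    "billing": [{"customerNumber": 7, "name": "Alan Turing", "countryCode": "Germany"},
                {"customerNumber": "n/a", "name": "Bad Row", "countryCode": "US"}],
})
```

Keeping rejected rows alongside their source, rather than dropping them, is one reasonable design choice: it leaves a trail for the data-quality work that big data volumes tend to make harder, not easier.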

However, there is a case to be made for technology that is specifically designed for big data systems: technology with built-in features and functions that handle massive volumes of data, manage the movement of that data, and, finally, deal with information that may not be bound to a structure.
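As a rough illustration of the last two capabilities, the sketch below reads newline-delimited JSON as a stream, applies structure only when a record is read (schema-on-read), and groups records into fixed-size batches for movement. The field names `id` and `event` and the batch size are assumptions made up for the example; real big data integration tools do this at far larger scale and with much more machinery.

```python
import io
import json
from itertools import islice
from typing import Iterable, Iterator

def read_records(lines: Iterable[str]) -> Iterator[dict]:
    """Schema-on-read: parse each JSON line as-is; no fixed schema is imposed up front."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            value = json.loads(line)
        except json.JSONDecodeError:
            continue  # a real pipeline would route this to an error store
        if isinstance(value, dict):
            yield value

def in_batches(records: Iterable[dict], size: int) -> Iterator[list]:
    """Group a (possibly unbounded) stream into fixed-size batches for movement."""
    it = iter(records)
    while batch := list(islice(it, size)):
        yield batch

def project(record: dict) -> dict:
    """Pull out only the fields downstream needs; missing fields get defaults."""
    return {"id": record.get("id"), "event": record.get("event", "unknown")}

# Usage with a small in-memory stream standing in for a large file or feed.
stream = io.StringIO(
    '{"id": 1, "event": "click"}\n'
    '{"id": 2}\n'
    'not json\n'
    '{"id": 3, "event": "view", "extra": [1, 2]}\n'
)
batches = [[project(r) for r in batch] for batch in in_batches(read_records(stream), size=2)]
```

Because every step is a generator, only one batch needs to be in memory at a time, and records with extra or missing fields flow through instead of breaking a rigid load.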

What this all means to me is that:

  • Big data integration means leveraging approaches and technology that are much more flexible and scalable. It also means changing the methodologies we use to approach integration. These changes build upon traditional approaches and methods; they do not replace them.
  • We need to consider upgrades to our data integration technology, including solutions specifically designed to deal with the scale and lack of structure of big data. These solutions are emerging now, and they are worth a look.
  • The value of this technology far exceeds the value of waiting for the hype to settle down. Those who do not leverage big data technology and big data integration today will find that they have a lot of catching up to do later. Remember, retooling in a hurry typically raises the risk of failure.
  • This is strategic technology.  We should be treating it as such.

Some will understand how to drive forward with big data integration, but most will need some education. Perhaps use this blog as a jumping-off point to understand what’s changed and, perhaps more importantly, what hasn’t.

