Data Quality and Big Data

In the world of big data, getting access to data and making sense of it is often a more important consideration than managing sheer volume.  Companies that succeed in unlocking value from big data gain insight into things like customer preferences, satisfaction and regional purchasing differences. Doing this is often harder than it seems because of the variety of the information itself, which leads to standardization and duplication issues.  Ownership is often an issue as well, with departmental lines being the most common constraint on sharing important data across the enterprise.

A recent eMarketer report showed that, for U.S.-based marketers at least, these are indeed common big data challenges. Among other things, marketers collect a wide range of data types (demographic data, 74%; transaction data, 64%; and usage data, 60%), and sharing of that information across departmental silos is frequently lacking (51% state that lack of sharing hinders their ability to determine the ROI of marketing). Not surprisingly, the report notes that these same marketers are largely not leveraging big data, so it could be argued that the variety and inaccessibility of information are direct causes of that.

Clearly, without first getting access to key data, the value to be gained from big data will be very low. As the eMarketer report calls out, one cause is often the lack of information sharing across departmental lines. To a large degree, this problem is not unique to big data; it applies to traditional sources of information as well. To overcome these hurdles, many organizations are turning to data governance to help align key business stakeholders across functions with the underlying technologies they need to get their jobs done.  This doesn’t change with big data, and organizations need to consider ways to feed big data considerations into their governance practices. From a technology point of view, they will need capabilities that support business/IT alignment, data stewardship and the definition of enterprise data.

Directly related to this is the ability to make better sense of the wide variety inherent in big data.  Departments regularly operate with varying definitions of key business entities, and get locked into using them in a context that’s only useful to their specific business needs. As organizations look to leverage information across departmental lines, these inconsistencies perpetuate the problem and can result in no sharing at all. As mentioned above, governance helps align the needs of multiple stakeholders across departmental lines. The complexity of Facebook and LinkedIn posts or Twitter feeds only complicates things further.
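To make the point about varying definitions concrete, here is a minimal sketch of what standardizing and de-duplicating records from two departments might look like. It is an illustration only and not how Informatica Data Quality works; the field names (customer_name, email_address, name, email), the sample records and the rule of matching on email are all assumptions made for this example.

```python
import re
from collections import defaultdict


def normalize(record):
    """Map a department-specific record onto one shared 'customer' definition.

    The field names handled here are hypothetical examples of how two
    departments might label the same attributes differently.
    """
    name = record.get("name") or record.get("customer_name") or ""
    email = record.get("email") or record.get("email_address") or ""
    return {
        # Collapse repeated whitespace and use consistent capitalization.
        "name": re.sub(r"\s+", " ", name).strip().title(),
        # Compare emails case-insensitively, ignoring stray spaces.
        "email": email.strip().lower(),
    }


def find_duplicates(records):
    """Group normalized records by email and keep groups with more than one entry."""
    groups = defaultdict(list)
    for record in records:
        normalized = normalize(record)
        if normalized["email"]:
            groups[normalized["email"]].append(normalized)
    return {email: recs for email, recs in groups.items() if len(recs) > 1}


# Hypothetical records: the same customer as seen by two departments.
sales = [{"customer_name": "  jane   DOE ", "email_address": "Jane.Doe@Example.com"}]
support = [
    {"name": "Jane Doe", "email": "jane.doe@example.com "},
    {"name": "John Smith", "email": "john@example.com"},
]

duplicates = find_duplicates(sales + support)
for email, recs in duplicates.items():
    print(email, len(recs))
```

Matching on a single normalized field is the simplest possible rule. Real customer data usually calls for fuzzier matching and for a steward to review borderline cases, which is where the governance practices described above come in.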

Fortunately, there are solutions to help address all of this. Informatica Data Quality is one, and it is designed to address any enterprise’s data challenges, including big data. With capabilities for discovering data domains, natural language processing to aid in parsing unstructured text, and stewardship capabilities to support data governance, Informatica Data Quality is ready to address the needs of your big data solution.

These topics and more will be discussed in detail at Informatica World 2012.  If you haven’t already done so, take a look at the current agenda and consider joining us for what will prove to be an informative event.  See you in Las Vegas!

This entry was posted in Big Data, Data Governance, Data Quality.
