How to Get the Biggest Returns from Your Hadoop and Big Data Investments in 2015
2014 was the year that Big Data went mainstream, as conversations shifted from “What is Big Data?” to “How do we harness the power of Big Data to solve real business problems?” It seemed like everyone jumped on the Big Data bandwagon, from new software start-ups offering “next generation” predictive analytic applications to traditional database, data quality, business intelligence, and data integration vendors, all calling themselves Big Data providers. The truth is, they all play a role in this Big Data movement.
Earlier in 2014, Wikibon estimated that the Big Data market was on pace to top $50 billion in 2017, which translates to a 38% compound annual growth rate over the six-year period from 2011 (the first year Wikibon sized the Big Data market) to 2017. Most of the excitement around Big Data has centered on Hadoop, as early adopters who experimented with open source versions quickly moved to enterprise-class solutions from companies like Cloudera™, Hortonworks™, MapR™, and Amazon’s Redshift™ to address real-world business problems including:
- Fraud detection in financial services and eCommerce by analyzing the entire population rather than a sample, combining non-conventional data such as call logs, web logs, and social data with payment and transaction data across all time
- Customer sentiment analysis in retail, telecommunications, and healthcare to identify customers who are likely to leave and shop with a competitor, by integrating transaction data with social interactions in real time
- Improved risk management across the enterprise by consolidating and analyzing credit, market, and operational risk data from all areas of the business vs. segmented views limited by traditional database technologies
Architecturally, most Hadoop implementations use clusters of commodity servers, each running a Hadoop instance, to process large sets of data in front of a traditional data warehouse. Hadoop is known for its ability to process massive amounts of structured and unstructured data via a distributed architecture on commodity hardware. It provides a faster and cheaper platform for analyzing vast amounts of data than traditional relational database technologies.
Hadoop itself, however, does not have native analytic or intelligence capabilities. The “Analytics” or “Intelligence” is typically a set of statistical or data mining models that data scientists build with applications like SAS™ or “R” and execute in Hadoop against all that data. Only the results are then extracted into a downstream data warehouse for business intelligence, campaign management, or reporting solutions to use. The end result is faster information at a fraction of the cost of running these processes or models inside a traditional data warehouse architecture. Though Hadoop provides great opportunities for businesses in any industry, harnessing its power requires a capable and scalable data integration and data quality foundation to deal with the Volume, Variety, and Velocity that have become the household definition of Big Data.
Let’s start with getting all of that data into Hadoop. Not only is data growing in volume, so is the number of source systems that create it. That data arrives in various formats, structures, and types, and it has to be transformed, formatted, and validated before it can be used inside Hadoop. Integrating data with native Hadoop tools and languages such as Pig and MapReduce requires specialized developers who may be difficult to find, resulting in higher development costs and longer project cycles.
Equally important is ensuring your data is correct before you leverage those wonderful models developed by your data scientists. As the old saying goes, “Rubbish in, rubbish out.” Ask any data scientist or analyst where they spend most of their time and they will tell you it’s making sure the data is right before they feed their data models. According to an Elder Research study, data modelers and scientists admitted they spend between 60% and 80% of their time manually preparing and cleansing data made available by IT. Despite years of focus on improving data quality in the enterprise, many firms large and small lack the data quality processes and technology needed to automate identifying, fixing, and monitoring data quality issues, from the source systems that create the data to the downstream analytic applications that require good data for business use.
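To make the idea of automated data quality checks a little more concrete, here is a minimal sketch in plain Python. It is only an illustration, not how any particular product works: the field names (`customer_id`, `txn_id`, `amount`) and the three rules are hypothetical, and a real pipeline would apply far richer rules at much larger scale.

```python
from dataclasses import dataclass, field


@dataclass
class QualityReport:
    """Records that passed every rule, and those that failed with reasons."""
    valid: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (record, [reasons]) pairs


def check_record(record):
    """Return the rule violations for one record (empty list if clean).

    The rules and field names here are assumptions made for illustration.
    """
    problems = []
    if not str(record.get("customer_id") or "").strip():
        problems.append("missing customer_id")
    try:
        if float(record.get("amount", "")) < 0:
            problems.append("negative amount")
    except (TypeError, ValueError):
        problems.append("amount is not numeric")
    return problems


def profile(records):
    """Split records into valid and rejected before they reach a model."""
    report = QualityReport()
    seen = set()
    for record in records:
        problems = check_record(record)
        # Treat a repeated (customer, transaction) pair as a duplicate.
        key = (record.get("customer_id"), record.get("txn_id"))
        if key in seen:
            problems.append("duplicate transaction")
        seen.add(key)
        if problems:
            report.rejected.append((record, problems))
        else:
            report.valid.append(record)
    return report
```

Calling `profile(records)` on a batch of dictionaries returns the clean records alongside the rejected ones and the reasons they failed, which is the kind of repetitive checking that analysts otherwise end up doing by hand.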
In summary, 2014 was the year when Big Data became mainstream. 2015, however, is the year when Big Data and technologies like Hadoop will require a capable and scalable data integration and data quality management foundation to perform.
Are you ready and prepared to realize those big returns from your Hadoop and other Big Data investments?