Tag Archives: plan
I had the opportunity to review and comment on the draft of a new Hadoop technical guide. It’s great to see the published paper: Technical Guide: Unleashing the Power of Hadoop with Informatica. This guide outlines the following five steps to get started with Hadoop from a data integration perspective.
(1) Select the Right Projects for Hadoop Implementation
Choose projects that fit Hadoop’s strengths and minimize its disadvantages. Enterprises use Hadoop in data-science applications for log analysis, data mining, machine learning and image processing involving unstructured or raw data. Hadoop’s lack of fixed-schema works particularly well for answering ad-hoc queries and exploratory “what if” scenarios. Hadoop Distributed File System (HDFS) and MapReduce address growth in enterprise data volumes from terabytes to petabytes and more; and the increasing variety of complex multi-dimensional data from disparate sources. (more…)