Enterprises use Hadoop in data-science applications that improve operational efficiency, grow revenues or reduce risk. Many of these data-intensive applications use Hadoop for log analysis, data mining, machine learning or image processing.
Whether commercial, open source or internally developed, data-science applications have to handle large volumes of semi-structured, unstructured or raw data. They benefit from Hadoop’s combination of storage and processing on each data node, spread across a cluster of cost-effective commodity hardware. Because Hadoop does not impose a fixed schema, it works particularly well for answering ad-hoc queries and exploring “what if” scenarios.
Doug Cutting may have the world’s most famous toy: a yellow stuffed elephant. Although it is named after his son’s plaything, Hadoop is maturing and entering the business world, creating significant opportunities as well as challenges. “Hadoop is on a fast track to becoming the world’s pre-eminent scientific analytic platform,” notes Forrester Research Senior Analyst James G. Kobielus (Forrester Blogs, June 7, 2011).
This is the first in a series of Hadoop articles I’ll write for Informatica Perspectives. My focus for this series is to guide existing and prospective users of Apache Hadoop on best practices and tips so that their organizations can become more data centric. I have participated in Hadoop user communities, both local and virtual, for the last several years. Drawing on my work with Hadoop pioneers and practitioners, I’m happy to share both innovative use cases and “areas to watch out for” when deploying and integrating Hadoop as part of a broader enterprise data architecture. I also bring a user perspective as a certified Hadoop system administrator.