Tag "Apache Spark"
“In pioneer days they used oxen for heavy pulling, and when one ox couldn’t budge a log, they didn’t try to grow a larger ox. We shouldn’t be trying for bigger computers, but for more systems of computers.” (Grace Hopper) Data storage and processing technologies have gone through a dramatic transformation from pre-stage flat-file system...
In this third and final part of this blog series, I provide more technical details on the changes we made to Parquet and Spark. This post will be of interest to software developers. Readers who already know Parquet and Spark in some technical depth may appreciate this content more, though...
Part 2: Add Spark to a Big Data Application with Text Search Capability. In this second part of the blog series, I propose a solution that addresses the problems highlighted in the first part. Please read that post to understand the motivations behind the proposed solution. Parquet file format: For open data storage structure on HDFS,...
Text search is an essential operation in many applications dealing with semi-structured big data. One such application, familiar to many of us, deals with program logs, which contain not only data for troubleshooting but often also other information helpful for understanding how the application operates under normal conditions, for diagnosing performance characteristics, snapshot of internal...
July 2nd 2015 was the 30th anniversary of one of my all-time favorite movies, “Back to the Future.” To see the greatest possibilities offered by Big Data, data scientists will need to go back to the future, using machine learning to drive data insights. In the future, what is new is the...