Tag Archives: Oozie
Security is a work-in-progress for the Apache Hadoop project and sub-projects, as I discuss as part of an O’Reilly Hadoop tutorial, “Get started with Hadoop: from evaluation to your first production cluster”. Below are several of the security tips and best practices that I discuss in that article. (more…)
Many organizations will mix and match individual Apache projects and sub-projects using Apache Hadoop’s loosely coupled architecture. This Hadoop toolbox provides a powerful set of tools and capabilities, but it does have some important limitations that can require a platform approach to address.
The Hadoop Distributed File System (HDFS) combines storage and processing in each data node. With the HDFS file system, you can add new files or append to existing files, but not replace files without use of a new filename. The append capability works well for adding new time-stamped logs as they come in, but can complicate storage of structured files. (more…)