Mastering Hadoop 3
上QQ阅读APP看书,第一时间看更新

On-premise distribution

There are many distributions available in the market; we will look at the most widely used distributions:

  • Cloudera: Cloudera is an open source Hadoop distribution that was founded in 2008, just when Hadoop started gaining popularity. Cloudera is the oldest distribution available. People at Cloudera are committed to contributing to the open source community and they have contributed to the building of Hive, Impala, Hadoop, Pig, and other popular open-source projects. Cloudera comes with good tools packaged together to provide a good Hadoop experience. They also provide a nice GUI interface to manage and monitor clusters, known as Cloudera manager.
  • Hortonworks: Hortonworks was founded in 2011 and it comes with the Hortonworks Data Platform (HDP), which is an open-source Hadoop distribution. Hortonworks Distribution is widely used in organizations and it provides an Apache Ambari GUI-based interface to manage and monitor clusters. Hortonworks contributes to many open-source projects such as Apache tez, Hadoop, YARN, and Hive. Hortonworks has recently launched a Hortonworks Data Flow (HDF) platform for the purpose of data ingestion and storage. Hortonworks distribution also focuses on the security aspect of Hadoop and has integrated Ranger, Kerberos, and SSL-like security with the HDP and HDF platforms.
  • MapR: MapR was founded in 2009 and it has its own filesystem called MapR-FS, which is quite similar to HDFS but with some new features built by MapR. It boasts higher performance; it also consists of a few nice sets of tools to manage and administer a cluster, and it does not suffer from a single point of failure. It offers some useful features, such as mirroring and snapshots.