Mastering Hadoop 3

Erasure encoding in Hadoop 3.x

HDFS achieves fault tolerance by replicating each block three times by default, and in large clusters the replication factor may be set even higher. Replication protects against data loss caused by machine failure and also provides data locality for MapReduce jobs. However, it is costly in terms of storage: with a replication factor of three, HDFS needs 200% extra space for every file, so storing 1 GB of data consumes 3 GB of disk space. The extra replicas also increase the amount of block metadata the NameNode has to keep in memory.
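To make the storage arithmetic concrete, the following is a minimal sketch that compares the raw disk footprint of 1 GB of data under 3x replication with the footprint under a Reed-Solomon (6 data, 3 parity) layout, the scheme used by Hadoop 3's built-in RS-6-3-1024k policy; the figures are purely illustrative.

```java
/**
 * Illustrative comparison of raw storage needed for 1 GB of data under
 * 3x replication versus Reed-Solomon (6 data + 3 parity) erasure coding.
 */
public class StorageOverhead {

    public static void main(String[] args) {
        double logicalGb = 1.0;            // user data to store

        // Replication: every block is stored replicationFactor times.
        int replicationFactor = 3;
        double replicatedGb = logicalGb * replicationFactor;                  // 3.0 GB
        double replicationOverhead = (replicatedGb - logicalGb) / logicalGb;  // 2.0 => 200%

        // Erasure coding RS(6, 3): 6 data cells + 3 parity cells per stripe.
        int dataUnits = 6;
        int parityUnits = 3;
        double ecGb = logicalGb * (dataUnits + parityUnits) / dataUnits;      // 1.5 GB
        double ecOverhead = (ecGb - logicalGb) / logicalGb;                   // 0.5 => 50%

        System.out.printf("Replication x%d : %.1f GB on disk (%.0f%% overhead)%n",
                replicationFactor, replicatedGb, replicationOverhead * 100);
        System.out.printf("RS(%d,%d) EC    : %.1f GB on disk (%.0f%% overhead)%n",
                dataUnits, parityUnits, ecGb, ecOverhead * 100);
    }
}
```

In other words, RS(6, 3) tolerates the loss of any three units in a stripe while adding only 50% storage overhead, compared with 200% for 3x replication.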

HDFS introduced erasure coding (EC) to store data using less space. Data is now classified according to its access pattern, and once the erasure coding conditions are satisfied, the data becomes eligible for erasure coding. The term Data Temperature is used to describe this usage pattern. The different types of data are as follows:

  • Hot data: By default, all data is considered HOT. Data that is accessed more than 20 times a day and whose age is less than seven days is considered HOT data. Data in this storage layer will have all of its replicas in the disk tier and will still utilize 200% extra storage if the replication factor is three.
  • Warm data: Data that is accessed only a few times over the course of a week falls into the warm data layer. Warm data keeps one replica in the disk tier, while the remaining replicas go into the archive tier.
  • Cold data: Data that is accessed only a few times a month and is more than a month old goes into the COLD layer. This data is a candidate for erasure coding (see the sketch following this list).
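The temperature tiers described above correspond to HDFS storage policies such as HOT, WARM, and COLD. The following is a minimal sketch of assigning the COLD policy to a directory through the FileSystem API; the cluster URI and the /data/archive path are illustrative assumptions, and the cluster must expose an ARCHIVE storage tier for the policy to take effect.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Minimal sketch: mark a directory as COLD so its replicas move to the
 * archive tier. The cluster URI and directory are illustrative placeholders.
 */
public class ColdDataPolicy {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        Path coldDir = new Path("/data/archive");

        // Built-in storage policies include HOT, WARM, and COLD.
        fs.setStoragePolicy(coldDir, "COLD");

        System.out.println("Storage policy on " + coldDir + ": "
                + fs.getStoragePolicy(coldDir).getName());

        fs.close();
    }
}
```

The same assignment can be made from the command line with hdfs storagepolicies -setStoragePolicy -path /data/archive -policy COLD.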

As we have already discussed, all blocks are initially replicated according to the configured replication factor. When the erasure coding condition is met, the blocks are converted into erasure-coded form. The following flowchart shows erasure encoding in Hadoop 3.x:
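In current Hadoop 3 releases, an erasure coding policy can also be assigned to a directory explicitly rather than waiting for a temperature-based conversion. The following is a minimal sketch using the DistributedFileSystem API; the cluster URI, the /data/cold directory, and the choice of the built-in RS-6-3-1024k policy are illustrative assumptions, and the policy must already be enabled on the cluster (for example, with hdfs ec -enablePolicy).

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

/**
 * Minimal sketch: enable the RS-6-3-1024k erasure coding policy on a
 * directory. The cluster URI and directory are illustrative placeholders.
 */
public class EnableErasureCoding {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        DistributedFileSystem dfs = (DistributedFileSystem) FileSystem
                .get(URI.create("hdfs://namenode:8020"), conf);

        Path coldDir = new Path("/data/cold");
        dfs.mkdirs(coldDir);

        // Files written under this directory from now on are stored in
        // erasure-coded form instead of being replicated three times.
        dfs.setErasureCodingPolicy(coldDir, "RS-6-3-1024k");

        System.out.println("EC policy on " + coldDir + ": "
                + dfs.getErasureCodingPolicy(coldDir).getName());

        dfs.close();
    }
}
```

The equivalent command-line form is hdfs ec -setPolicy -path /data/cold -policy RS-6-3-1024k.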

Erasure encoding is the process of encoding a message together with additional parity data so that, even if a portion of the data is lost, it can be recovered from the encoded values. Architectural improvements were made to HDFS so that it can support erasure coding. The following components were added as extensions:

  • ECManager: The ECManager is added as a NameNode extension and resides on the NameNode. It manages EC block groups, handling group allocation, block placement, and health monitoring, and it coordinates block recovery. Erasure coding stripes HDFS file data, and these stripes may contain a large number of internal blocks, for which the NameNode would otherwise need extra space to store metadata. The ECManager reduces this space consumption on the NameNode by managing the internal blocks efficiently.
  • ECClient: The ECClient is an extension to the HDFS client that stripes data to and from multiple DataNodes in parallel. A block group consists of multiple internal blocks, and the ECClient lets the client perform read and write operations across the internal blocks of a block group.

  • ECWorker: ECWorkers run on DataNodes and are used to recover failed erasure-coded blocks. The ECManager tracks failed erasure-coded blocks and instructs ECWorkers to recover them. DataNodes themselves are unaware of EC or striping during normal I/O; an ECWorker simply listens for instructions from the ECManager, pulls data from peer DataNodes, performs the codec calculation (illustrated in the sketch after this list), builds the converted blocks, and then pushes them to additional ECWorkers.
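To illustrate what the codec calculation involves, the following is a minimal sketch of single-parity (XOR) encoding and recovery, the simplest erasure code (Hadoop 3 ships an XOR-2-1-1024k policy alongside the Reed-Solomon ones). Any single lost unit of a stripe can be rebuilt from the surviving unit and the parity; the byte arrays used here are made up purely for illustration.

```java
import java.util.Arrays;

/**
 * Minimal sketch of the idea behind erasure coding: a single XOR parity
 * unit lets us rebuild any one lost data unit of a stripe. Real HDFS
 * policies such as RS-6-3-1024k use Reed-Solomon codes, which tolerate
 * multiple failures, but the recovery principle is the same.
 */
public class XorParityDemo {

    /** Encode: parity = d1 XOR d2, byte by byte (inputs of equal length). */
    static byte[] parity(byte[] d1, byte[] d2) {
        byte[] p = new byte[d1.length];
        for (int i = 0; i < p.length; i++) {
            p[i] = (byte) (d1[i] ^ d2[i]);
        }
        return p;
    }

    public static void main(String[] args) {
        // Two data units of one stripe (contents are made up).
        byte[] d1 = "block-one".getBytes();
        byte[] d2 = "block-two".getBytes();

        byte[] p = parity(d1, d2);   // parity unit, stored on a third DataNode

        // Simulate losing d1: rebuild it from d2 and the parity, much as an
        // ECWorker would after pulling the surviving units from its peers.
        byte[] rebuiltD1 = parity(d2, p);

        System.out.println("Recovered: " + new String(rebuiltD1));
        System.out.println("Matches original: " + Arrays.equals(d1, rebuiltD1));
    }
}
```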