Modern Big Data Processing with Hadoop
上QQ阅读APP看书,第一时间看更新

Sequence file

A sequence file is a flat file consisting of binary key/value pairs. They are extensively used in MapReduce (https://wiki.apache.org/hadoop/MapReduce) as input/output formats. They are mostly used for intermediate data storage within a sequence of MapReduce jobs. Sequence files work well as containers for small files. If there are too many small files in HDFS, they can be packed in a sequence file to make file processing efficient. There are three formats of sequence files: uncompressed, record compressed, and block compressed key/value records. Sequence files support block-level compression but do not support schema evolution.