A SequenceFile is a flat file consisting of binary key/value pairs. It is used extensively in MapReduce as an input and output format. It is also worth noting that, internally, the temporary outputs of map tasks are stored using SequenceFile.
SequenceFile provides Writer, Reader and Sorter classes for writing, reading and sorting respectively.
There are three SequenceFile formats, distinguished by how the records are compressed:
1) Uncompressed key/value records.
2) Record compressed key/value records - only 'values' are compressed here.
3) Block compressed key/value records - both keys and values are collected in 'blocks' separately and compressed. The size of the 'block' is configurable.
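To make the difference between the three formats more concrete, here is a rough sketch that uses only java.util.zip. It is a toy model, not the real SequenceFile on-disk layout and not the Hadoop API: the class CompressionModes, its method names and the sample data are all made up for illustration. It only compares how many payload bytes each strategy would produce for the same records.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Toy model only: mimics the three compression strategies, not Hadoop's file format.
public class CompressionModes {

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Compress a byte array with the JDK's zlib Deflater.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Hypothetical sample data: short keys, repetitive values.
    static String[] sampleKeys(int n) {
        String[] keys = new String[n];
        for (int i = 0; i < n; i++) {
            keys[i] = "user-" + i;
        }
        return keys;
    }

    static String[] sampleValues(int n) {
        String[] values = new String[n];
        for (int i = 0; i < n; i++) {
            StringBuilder sb = new StringBuilder();
            for (int j = 0; j < 10; j++) {
                sb.append("status=active;");
            }
            values[i] = sb.append("id=").append(i).toString();
        }
        return values;
    }

    // Format 1: keys and values stored as-is.
    static int uncompressedSize(String[] keys, String[] values) {
        int total = 0;
        for (int i = 0; i < keys.length; i++) {
            total += bytes(keys[i]).length + bytes(values[i]).length;
        }
        return total;
    }

    // Format 2: each value is compressed on its own; keys stay uncompressed.
    static int recordCompressedSize(String[] keys, String[] values) {
        int total = 0;
        for (int i = 0; i < keys.length; i++) {
            total += bytes(keys[i]).length + deflate(bytes(values[i])).length;
        }
        return total;
    }

    // Format 3: keys are collected into one buffer and values into another,
    // and each buffer is compressed as a whole (a single "block" here).
    static int blockCompressedSize(String[] keys, String[] values) {
        ByteArrayOutputStream keyBuf = new ByteArrayOutputStream();
        ByteArrayOutputStream valueBuf = new ByteArrayOutputStream();
        for (int i = 0; i < keys.length; i++) {
            byte[] k = bytes(keys[i]);
            byte[] v = bytes(values[i]);
            keyBuf.write(k, 0, k.length);
            valueBuf.write(v, 0, v.length);
        }
        return deflate(keyBuf.toByteArray()).length
             + deflate(valueBuf.toByteArray()).length;
    }

    public static void main(String[] args) {
        String[] keys = sampleKeys(200);
        String[] values = sampleValues(200);
        System.out.println("uncompressed:      " + uncompressedSize(keys, values));
        System.out.println("record compressed: " + recordCompressedSize(keys, values));
        System.out.println("block compressed:  " + blockCompressedSize(keys, values));
    }
}
```

With repetitive data like this, the block variant should come out smallest, because the compressor sees many similar keys and values at once instead of one small value at a time. How large the gap is on real data depends on the records and on the codec.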
The recommended way to obtain a writer is to call one of the static SequenceFile.createWriter methods, which construct the 'preferred' writer implementation for the chosen format.
The SequenceFile.Reader acts as a bridge and can read any of the above SequenceFile formats, so the reading code does not have to know which format the file was written in.