What is SequenceFile?

Postby mohit123 » Tue Sep 23, 2014 7:39 pm

What is SequenceFile?


Re: What is SequenceFile?

Postby Guest » Tue Sep 23, 2014 7:42 pm

SequenceFile is a flat file consisting of binary key/value pairs. It is used extensively in MapReduce as an input/output format. It is also worth noting that, internally, the temporary outputs of map tasks are stored using SequenceFile.
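To make "flat file of binary key/value pairs" a bit more concrete, here is a minimal sketch using only the JDK. It is a toy, not the actual SequenceFile on-disk format (no header, no sync markers, no Writable types), and the class name ToyKeyValueFile and the int key / String value layout are my own assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy illustration only: this is NOT the Hadoop SequenceFile on-disk format.
final class ToyKeyValueFile {

    // Serialize records as a record count followed by [int key][UTF value] pairs,
    // written one after another with no nesting (a "flat" binary layout).
    static byte[] write(Map<Integer, String> records) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(records.size());
            for (Map.Entry<Integer, String> entry : records.entrySet()) {
                out.writeInt(entry.getKey());
                out.writeUTF(entry.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Read the pairs back in the order they were written.
    static Map<Integer, String> read(byte[] data) {
        Map<Integer, String> records = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                int key = in.readInt();
                String value = in.readUTF();
                records.put(key, value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        Map<Integer, String> records = new LinkedHashMap<>();
        records.put(1, "alpha");
        records.put(2, "beta");
        byte[] data = write(records);
        System.out.println("bytes written: " + data.length);
        System.out.println("read back: " + read(data));
    }
}
```

The point is only that the file is a plain sequence of serialized key/value records that can be scanned from start to end. The real format adds a header, periodic sync markers and optional compression on top of this idea.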

SequenceFile provides Writer, Reader and Sorter classes for writing, reading and sorting, respectively.

There are 3 different SequenceFile formats:

1) Uncompressed key/value records.
2) Record-compressed key/value records - only the values are compressed here.
3) Block-compressed key/value records - keys and values are collected in separate 'blocks', and each block is compressed. The size of the 'block' is configurable.
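To see why the choice between the three formats matters, here is a rough size comparison written with only the JDK. It is a sketch under my own assumptions: it uses java.util.zip.Deflater rather than a Hadoop codec, treats the whole input as a single block, and does not reproduce the real file layout. The class name ToyCompressionModes and the sample data are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

// Toy size comparison only; it does not reproduce Hadoop's on-disk layout or codecs.
final class ToyCompressionModes {

    // Deflate a byte array and return the compressed bytes.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Format 1: each record is a raw 4-byte int key plus the raw value bytes.
    static int uncompressedSize(List<String> values) {
        int total = 0;
        for (String v : values) {
            total += Integer.BYTES + v.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    // Format 2: each value is compressed on its own; keys stay raw.
    static int recordCompressedSize(List<String> values) {
        int total = 0;
        for (String v : values) {
            total += Integer.BYTES + deflate(v.getBytes(StandardCharsets.UTF_8)).length;
        }
        return total;
    }

    // Format 3: keys go into one buffer and values into another,
    // and each buffer is compressed as a single block.
    static int blockCompressedSize(List<String> values) {
        ByteBuffer keys = ByteBuffer.allocate(Integer.BYTES * values.size());
        ByteArrayOutputStream vals = new ByteArrayOutputStream();
        for (int i = 0; i < values.size(); i++) {
            keys.putInt(i); // the record index serves as the key
            byte[] v = values.get(i).getBytes(StandardCharsets.UTF_8);
            vals.write(v, 0, v.length);
        }
        return deflate(keys.array()).length + deflate(vals.toByteArray()).length;
    }

    // Hypothetical sample data: many short, similar values.
    static List<String> sampleValues(int n) {
        List<String> values = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            values.add("status=OK;host=node-" + (i % 10));
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> values = sampleValues(100);
        System.out.println("uncompressed:      " + uncompressedSize(values));
        System.out.println("record-compressed: " + recordCompressedSize(values));
        System.out.println("block-compressed:  " + blockCompressedSize(values));
    }
}
```

With many short, similar values like these, compressing each value separately gives the compressor very little to work with, while compressing a whole block lets it exploit the repetition across records. That is the usual argument for block compression when records are small.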

The recommended way to create a writer is to call one of the static SequenceFile.createWriter methods, which construct the 'preferred' writer implementation for the chosen format.

The SequenceFile.Reader acts as a bridge and can read any of the above SequenceFile formats.

