HDFS reads each data block from a single drive and therefore inherits the same speed limit, whereas QFS reads every block from six drives in parallel, making its top theoretical read speed 300 MB/s. This translates into a significant speed boost for real-world jobs.
Another key innovation in QFS is Reed-Solomon (RS) error correction. Unreachable machines and dead disk drives are the rule rather than the exception on a large cluster, so tolerating missing data is critical. HDFS uses triple replication, which triples data storage requirements. QFS uses Reed-Solomon encoding, an error correction technique commonly used for CDs and DVDs, which offers superior data recovery power yet requires only a 50% data expansion (1.5x the original size). Thus, QFS requires only half the storage of HDFS for equivalent capacity.
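The read-speed and storage figures above can be checked with a little arithmetic. The sketch below is illustrative only and rests on two assumptions not stated in the text: a single drive sustains about 50 MB/s (inferred from 300 MB/s across six drives), and QFS stripes each block over six data stripes plus three parity stripes (a layout consistent with the six-drive read and the 50% expansion). The `Layout` class and its method names are hypothetical, not part of QFS or HDFS.

```python
from dataclasses import dataclass

# Assumption: per-drive throughput inferred from 300 MB/s over six drives.
DRIVE_MB_S = 50.0


@dataclass(frozen=True)
class Layout:
    """Hypothetical model of how one block is spread across drives."""
    name: str
    data_units: int        # drives that must be read to get one block
    redundancy_units: int  # extra replicas or parity stripes

    def storage_factor(self) -> float:
        """Raw bytes stored per byte of user data."""
        return (self.data_units + self.redundancy_units) / self.data_units

    def peak_read_mb_s(self, drive_mb_s: float) -> float:
        """Theoretical read speed when all data units are read in parallel."""
        return self.data_units * drive_mb_s

    def tolerated_losses(self) -> int:
        """Drives that can be lost per block without losing data."""
        return self.redundancy_units


# Triple replication: one copy is read, two more copies exist as backup.
HDFS = Layout("HDFS (3x replication)", data_units=1, redundancy_units=2)
# Assumption: Reed-Solomon with six data stripes and three parity stripes.
QFS = Layout("QFS (RS 6+3)", data_units=6, redundancy_units=3)

for layout in (HDFS, QFS):
    print(
        f"{layout.name}: "
        f"{layout.storage_factor():.1f}x storage, "
        f"{layout.peak_read_mb_s(DRIVE_MB_S):.0f} MB/s peak read, "
        f"survives {layout.tolerated_losses()} lost drives per block"
    )
```

Under these assumptions the model reproduces the article's numbers: 3x versus 1.5x storage, and a sixfold difference in theoretical read speed. It also suggests why Reed-Solomon is described as having superior recovery power, since three lost stripes can be tolerated versus two lost replicas.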
Quantcast has also conducted 20 TB benchmark tests on both Hadoop and QFS, covering a write-only job and a read job. The write job ran 75% faster using QFS because it had less data to write. The read job ran 47% faster, primarily because better parallelism shortened the delays caused by straggling workers. The company directly measures more than 100 million web destinations, collects well in excess of 500 billion new data records per month and, using QFS as its primary data store, exceeds 20 petabytes of daily processing.
The company has operated with QFS as its primary production file system for more than a year, during which time it has handled more than 4 exabytes of I/O. QFS's significant performance improvements are achieved while simultaneously reducing disk storage requirements by 50% compared with HDFS, with commensurate savings in capital and operating costs.
“In our Big Data future, file systems such as QFS will underpin cost-effective critical infrastructure for commerce and government. Just as performance and cost efficiency are key attributes of a file system, so are integrity and reliability, and we believe that the open source community is the most effective and sustainable path to dependable, enduring file system software,” said Konrad Feldman, CEO at Quantcast. “Quantcast makes use of open source software and by making our own contribution with QFS we’re hopeful that others will benefit as we have, and that community collaboration will enable QFS to meet the production demands of big data environments for years to come.”