Cloud Journal

 

 



Quantcast QFS Enables Faster, More Efficient Hadoop Processing


Written by  Sudheer Raju | 28 September 2012

Quantcast, a big data company founded in 2006 that has invested in efficiency innovations for Hadoop, has released the Quantcast File System (QFS) as open source. Evolved from the Kosmos Distributed File System (KFS, also known as CloudStore), QFS offers a higher-performance alternative to the Hadoop Distributed File System (HDFS) for batch data processing, significantly improving data I/O speeds and halving the disk space required to reliably store massive data sets. Fully compatible with Apache Hadoop, QFS has been live at Quantcast for four years, reliably handling petabyte-scale production workloads.

HDFS reads each data block from a single drive and is therefore limited to the speed of that drive. QFS reads every block from six drives in parallel, making its top theoretical read speed 300 MB/s. This translates into a significant speed boost for real-world jobs.

Another key innovation in QFS is Reed-Solomon (RS) error correction. Unreachable machines and dead disk drives are the rule rather than the exception on a large cluster, so tolerating missing data is critical. HDFS uses triple replication, which expands data storage requirements 3x. QFS uses Reed-Solomon encoding, an error correction technique commonly used for CDs and DVDs, which offers superior data recovery power yet requires only a 50% data expansion. Thus, QFS needs only half the raw storage of HDFS (1.5x versus 3x) to hold the same amount of data.
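The storage and read-speed figures above come down to simple arithmetic, sketched below in Python. This is an illustration only, not Quantcast's code. The 6 data + 3 parity stripe layout is an assumption chosen because it yields the stated 50% expansion, and the 50 MB/s per-drive speed is an assumption inferred by dividing 300 MB/s by six drives. The names `Scheme` and `parallel_read_mb_s` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    """A redundancy scheme described by how many units hold data
    and how many extra units (copies or parity) protect it."""
    name: str
    data_units: int
    redundancy_units: int

    def expansion(self) -> float:
        """Raw bytes stored per byte of user data."""
        return (self.data_units + self.redundancy_units) / self.data_units

    def raw_storage_tb(self, user_tb: float) -> float:
        """Raw disk space needed to hold user_tb of user data."""
        return user_tb * self.expansion()


def parallel_read_mb_s(per_drive_mb_s: float, drives: int) -> float:
    """Theoretical peak read speed when a block is striped across drives."""
    return per_drive_mb_s * drives


# Triple replication: one original plus two extra copies.
replication = Scheme("triple replication", data_units=1, redundancy_units=2)

# ASSUMPTION: 6 data stripes + 3 parity stripes, which gives 50% expansion.
reed_solomon = Scheme("Reed-Solomon 6+3", data_units=6, redundancy_units=3)

if __name__ == "__main__":
    for scheme in (replication, reed_solomon):
        print(f"{scheme.name}: {scheme.expansion():.1f}x expansion, "
              f"{scheme.raw_storage_tb(20):.0f} TB raw for 20 TB of data")

    # ASSUMPTION: 50 MB/s per drive (300 MB/s divided by six drives).
    print(f"single drive: {parallel_read_mb_s(50, 1):.0f} MB/s")
    print(f"six drives:   {parallel_read_mb_s(50, 6):.0f} MB/s")
```

Under these assumptions the Reed-Solomon layout stores 1.5 bytes per byte of data against 3 for replication, which is where the "half the storage" figure comes from.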

Quantcast also ran 20 TB benchmark tests comparing Hadoop's HDFS with QFS on both a write-only job and a read job. The write job ran 75% faster using QFS because there was less data to write. The read job ran 47% faster, primarily because better parallelism shortened the delays caused by straggling workers. The company directly measures more than 100 million web destinations, collects more than 500 billion new data records per month and, using QFS as its primary data store, processes more than 20 petabytes of data per day.

The company has operated with QFS as its primary production file system for over one year, during which time it has handled more than 4 exabytes of I/O. QFS achieves its significant performance improvements while reducing disk storage requirements by 50% compared with HDFS, with commensurate savings in capital and operating costs.

“In our Big Data future, file systems such as QFS will underpin cost-effective critical infrastructure for commerce and government. Just as performance and cost efficiency are key attributes of a file system, so are integrity and reliability, and we believe that the open source community is the most effective and sustainable path to dependable, enduring file system software,” said Konrad Feldman, CEO at Quantcast. “Quantcast makes use of open source software and by making our own contribution with QFS we’re hopeful that others will benefit as we have, and that community collaboration will enable QFS to meet the production demands of big data environments for years to come.”

Sudheer Raju

Founder of ToolsJournal, a technology journal on software tools and services. Sudheer has overall accountability for the website's product development and is responsible for sales and marketing. With a flair for writing, Sudheer also writes for ToolsJournal across all journal categories.

