As Google explains in the original paper, Dremel complements MapReduce-based computing rather than replacing it. It is often used in conjunction with MapReduce, for example to analyze the outputs of MapReduce pipelines or to rapidly prototype larger computations.
Drill will integrate closely with Apache Hadoop, querying data that lives in Hadoop. Specifically, Drill will support Hadoop FileSystem implementations and HBase. Hadoop-related data formats (e.g., Apache Avro, RCFile) will be supported, and MapReduce-based tools will be provided to produce column-based formats. Drill tables can be registered in HCatalog. Finally, Hive is being considered as the basis of the DrQL implementation. Check out these slides for more info.
Project Drill's architecture consists of four key components:
- Query languages: This layer is responsible for parsing the user's query and constructing an execution plan. Initial support is for the SQL-like language used by Dremel and Google BigQuery, and the layer will be extended to other languages and programming models, such as the Mongo Query Language, Cascading, or Plume.
- Execution engine: This layer is responsible for executing the physical plan. It provides the scalability and fault tolerance needed to efficiently query petabytes of data on 10,000 servers.
- Nested data formats: This layer is responsible for supporting various data formats. The initial goal is to support the column-based format used by Dremel. Drill is designed to support schema-based formats such as Protocol Buffers/Dremel, Avro/AVRO-806/Trevni and CSV, and schema-less formats such as JSON, BSON or YAML.
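To make "the column-based format used by Dremel" more concrete: in that format each leaf field of a nested record is stored as its own column, and every value carries a repetition level (at which repeated field in its path it repeated) and a definition level (how many optional or repeated fields in its path are actually present), so the nesting can be rebuilt from the columns alone. The Python sketch below shreds one record into such columns, using the Document schema and record r1 from the Dremel paper. It is a hypothetical illustration only: the schema encoding (tuples of name, label, children) and the names `shred`, `leaf_paths` and `DOCUMENT` are assumptions made for this example, not Drill's code or on-disk format.

```python
from typing import Any, Dict, Iterator, List, Optional, Tuple

# Hypothetical sketch of Dremel-style record shredding; not Drill's implementation.
# A schema field is (name, label, children): label is "required", "optional"
# or "repeated", and children is None for a leaf (primitive) field.
Field = Tuple[str, str, Optional[list]]
Column = List[Tuple[Any, int, int]]  # (value, repetition level, definition level)

# The Document schema from the Dremel paper, in this sketch's encoding.
DOCUMENT: List[Field] = [
    ("DocId", "required", None),
    ("Links", "optional", [
        ("Backward", "repeated", None),
        ("Forward", "repeated", None),
    ]),
    ("Name", "repeated", [
        ("Language", "repeated", [
            ("Code", "required", None),
            ("Country", "optional", None),
        ]),
        ("Url", "optional", None),
    ]),
]


def leaf_paths(fields: List[Field], prefix: str = "") -> Iterator[str]:
    """Yield the dotted path of every leaf column under `fields`."""
    for name, _label, children in fields:
        path = prefix + name
        if children is None:
            yield path
        else:
            yield from leaf_paths(children, path + ".")


def shred(schema: List[Field], record: Dict[str, Any]) -> Dict[str, Column]:
    """Split one nested record into per-column (value, r, d) stripes."""
    columns: Dict[str, Column] = {p: [] for p in leaf_paths(schema)}
    _shred_group(schema, record, "", 0, 0, 0, columns)
    return columns


def _shred_group(fields, value, prefix, r, d, rep_depth, columns):
    for name, label, children in fields:
        path = prefix + name
        raw = value.get(name)
        # Normalise to a list of occurrences (0..n if repeated, 0..1 otherwise).
        if label == "repeated":
            items = raw or []
        else:
            items = [] if raw is None else [raw]
        # Optional and repeated fields add one definition level;
        # repeated fields also add one possible repetition level.
        field_d = d + (0 if label == "required" else 1)
        field_rep = rep_depth + (1 if label == "repeated" else 0)
        if not items:
            # Missing field: a NULL in every leaf column below it, recorded at
            # the parent's definition level so a reader knows where the path stops.
            targets = [path] if children is None else leaf_paths(children, path + ".")
            for p in targets:
                columns[p].append((None, r, d))
            continue
        for i, item in enumerate(items):
            # The first occurrence inherits r; later ones repeat at this field.
            item_r = r if i == 0 else field_rep
            if children is None:
                columns[path].append((item, item_r, field_d))
            else:
                _shred_group(children, item, path + ".", item_r, field_d,
                             field_rep, columns)


# Record r1 from the Dremel paper; repeated fields are Python lists.
r1 = {
    "DocId": 10,
    "Links": {"Forward": [20, 40, 60]},
    "Name": [
        {"Language": [{"Code": "en-us", "Country": "us"}, {"Code": "en"}],
         "Url": "http://A"},
        {"Url": "http://B"},
        {"Language": [{"Code": "en-gb", "Country": "gb"}]},
    ],
}

cols = shred(DOCUMENT, r1)
print(cols["Name.Language.Code"])
# [('en-us', 0, 2), ('en', 2, 2), (None, 1, 1), ('en-gb', 1, 2)]
```

These stripes match the ones given for r1 in the Dremel paper. A real columnar writer would also encode the levels compactly and process many records per column chunk; the sketch only shows how the two levels are assigned.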
The project is championed by none other than Ted Dunning of Hadoop fame, and its initial committers are employees of MapR Technologies, Drawn to Scale and Concurrent Inc. MapR, a Hadoop distributor that offers its own commercial version of Hadoop, is the leading player behind Drill's inception. “We’ve spent quite a few months talking to lots of organisations and potential users of Drill and to our customer base as well,” said Shiran, who is a founding member of the Drill project. “We wanted to put this out there as an open-source project, rather than just keep it within MapR for our use alone.”