The emergence of Hadoop as the preferred solution for Big Data analytics across unstructured data has warranted significant contributions to advance Hadoop innovation more quickly. Hadoop-based batch processing of unstructured and structured data at massive scale using commodity hardware has led to a profound change in analytics. By extracting the knowledge wrapped within unstructured and machine-generated data, organizations can make better decisions that drive revenue, improve service and reduce costs.
With the availability of a large-scale test bed, developers can have their contributions validated at scale, and enterprises can confidently deploy new releases in a production environment. The test bed cluster, which consists of 1,000-plus hardware nodes or 10,000 nodes with the addition of virtual machines, features 24 petabytes of physical storage. This is the equivalent of nearly half of the entire written works of mankind, from the beginning of recorded history.
Hadoop innovation and development rely on contributions made by open source developers. However, the Apache Hadoop community has consistently faced the challenge of provisioning the resources required to validate new releases of the open source software. Without access to a large cluster for scale validation, the Apache community and enterprise users must wait for Hadoop user communities to sponsor an effort to run scale validations. Such efforts are very infrequent, and a lot of time is spent stabilizing releases for enterprise adoption.
With an aggressive plan for testing on the Apache Hadoop trunk and its continuing releases, EMC is excited to contribute to the Hadoop open source community by providing the testing resources the community lacks. These resources are intended to help quickly identify bugs, stabilize new releases and optimize hardware configurations, in an effort to speed up the innovation of Hadoop. EMC plans to provide test results to the Apache Software Foundation and the open source community, and EMC's testing will be planned in coordination with the Apache Hadoop project.
The Greenplum Analytics Workbench is the result of a collaboration among several hardware and software vendors, including EMC, Intel, Mellanox Technologies, Micron, Seagate, SuperMicro, Switch and VMware.




Since entering the Big Data market, EMC has been contributing significantly to the big data world. Recently it has