Cloud Journal



1000 Node Analytic Platform For Hadoop Testing and Development

Written by Sudheer Raju | 02 April 2012

Since entering the Big Data market, EMC has been contributing significantly to the big data world. It recently released the Greenplum Chorus source code under an open source license, and today it announced the creation of the Greenplum® Analytics Workbench, which will be used for regular integration tests on Apache Hadoop. The 1,000-plus node test bed cluster incorporates technology from the world's leading software and hardware manufacturers with the intention of providing the infrastructure needed to facilitate Apache Hadoop innovation.

The emergence of Hadoop as the preferred solution for Big Data analytics across unstructured data calls for significant contributions to move Hadoop innovation forward quickly. Hadoop-based batch processing of unstructured and structured data at massive scale using commodity hardware has led to a profound change in analytics. By extracting the knowledge wrapped within unstructured and machine-generated data, organizations can make better decisions that drive revenue, improve service and reduce costs.

With the availability of a large-scale test bed, developers can have their contributions validated at scale, and enterprises can confidently deploy new releases in a production environment. The test bed cluster, which consists of 1,000-plus hardware nodes or 10,000 nodes with the addition of virtual machines, features 24 petabytes of physical storage. This is the equivalent of nearly half of the entire written works of mankind, from the beginning of recorded history.

Hadoop innovation and development rely on contributions made by open source developers. However, the Apache Hadoop community has consistently faced the challenge of provisioning the resources required to validate new releases of the open source software. Without access to a large cluster for scale validation, the Apache community and enterprise users must wait for Hadoop user communities to sponsor an effort to run scale validations. This happens very infrequently, and a lot of time is spent stabilizing releases for enterprise adoption.

With an aggressive plan for testing on the Apache Hadoop trunk and its continuing releases, EMC is excited to contribute to the Hadoop open source community by providing the testing resources the community lacks, so that it can quickly identify bugs, stabilize new releases and optimize hardware configurations, all in an effort to speed up Hadoop innovation. EMC plans to provide test results to the Apache Software Foundation and the open source community, and EMC's testing will be planned in coordination with the Apache Hadoop project.

The Greenplum Analytics Workbench is the result of a collaboration among several hardware and software vendors, including EMC, Intel, Mellanox Technologies, Micron, Seagate, SuperMicro, Switch and VMware.

Sudheer Raju

Founder of ToolsJournal, a technology journal on software tools and services. Sudheer has overall accountability for the website's product development and is responsible for Sales and Marketing. With a flair for writing, Sudheer himself writes for ToolsJournal across all journal categories.
