Available for free download under the Apache 2.0 license, Serengeti is a “one-click” deployment toolkit that lets enterprises use the VMware vSphere® platform to deploy a highly available Apache Hadoop cluster in minutes, including common Hadoop components such as Apache Pig and Apache Hive. By using Serengeti to run Hadoop on VMware vSphere, enterprises can draw on the high availability, fault tolerance and live migration capabilities of the world’s most trusted, widely deployed virtualization platform to improve the availability and manageability of Hadoop clusters.
In addition, VMware said that it is working with the Apache Hadoop community, including Cloudera, Greenplum, Hortonworks, IBM and MapR, to contribute extensions that will make key components “virtualization-aware,” supporting elastic scaling and further improving Hadoop performance in virtual environments.
The company also said it is contributing changes to the Hadoop Distributed File System (HDFS) and Hadoop MapReduce projects to make them “virtualization-aware,” so that data and compute jobs can be optimally distributed across a virtual infrastructure. These changes are intended to give enterprises a more elastic, secure and highly available Hadoop cluster.
"Spring for Apache Hadoop" is also on the road, an open source project first launched in February of 2012 to make it easy for enterprise developers to build distributed processing solutions with Apache Hadoop. These updates allow Spring developers to easily build enterprise applications that integrate with the HBase database, the Cascading library, and Hadoop security. Spring for Apache Hadoop is free to downloadand available now under the open source Apache 2.0 license.