The first and most important parameter to consider is how a chosen big data solution stores data. Data arrives in an organization from many sources, and the rate at which it grows is accelerating, so the solution should handle those volumes and sources, allow for substantial year-on-year growth, and do so at a lower cost.
Both storage and speed of retrieval matter, which calls for capturing data according to a context or a classification. Most solutions today dump data into Hadoop-based server storage, which will no longer suffice given the size of the data. Putting a context around data storage also helps store only the necessary data, rather than ending up with 20% useful data and 80% waste.
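As a rough illustration of capturing data by context, here is a minimal sketch in Python. The rule table, the "source" field and the context names are all hypothetical, invented for this example; a real solution would define its own classification scheme.

```python
from collections import defaultdict

# Hypothetical rules mapping a record's "source" field to a storage context.
CONTEXT_RULES = {
    "pos": "transactions",
    "web": "clickstream",
    "crm": "customers",
}


def classify(record):
    """Return the storage context for a record, or None if no rule matches."""
    return CONTEXT_RULES.get(record.get("source"))


def route(records):
    """Group records by context; records without a context are counted, not stored."""
    stores = defaultdict(list)
    skipped = 0
    for record in records:
        context = classify(record)
        if context is None:
            skipped += 1
        else:
            stores[context].append(record)
    return dict(stores), skipped


sample = [
    {"source": "pos", "amount": 12.50},
    {"source": "web", "page": "/home"},
    {"source": "debug", "msg": "heartbeat"},  # no rule matches, so it is skipped
    {"source": "pos", "amount": 3.99},
]
stores, skipped = route(sample)
print({context: len(items) for context, items in stores.items()}, "skipped:", skipped)
```

The point of the sketch is only that the decision about what to keep, and where, is made at capture time rather than after everything has been dumped into one store.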
A solution may be great at storing data, but it is of little use if the system cannot retrieve and process that data at high speed. Big data analytics processing usually thrives on system performance, commodity infrastructure, and low cost. However, a solution intelligent enough to analyze its own resource constraints and dynamically reallocate resources would be key to leveraging the benefits of so much available data. A focus on data availability and in-memory processing is ideal.
Some examples of the scale that big data systems process: Facebook handles over 55 billion photos from its user base, and Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes of data, the equivalent of 167 times the information contained in all the books in the US Library of Congress.
Not all analytics will be correct; their validity depends entirely on context. For instance, 1,000 influential customers can reveal something exactly opposite to what 100,000 customers reveal. Big data analytics should surface the insights that are good for your business rather than simply letting the larger group override the 1,000 influential customers in this case. Select a solution that actually understands the data and produces analytics that help decision making, rather than one that merely shows colourful dashboards. Analytics are no good unless they help you clearly visualize the shortcomings of your business at the various entry and exit points of your workflow and enable you to take calculated decisions. Drill-down from the visual analytics into the underlying data, along with search within big data, is another key capability that should be a default feature of the selected solution.
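To see how a small influential group can be drowned out, consider a minimal sketch in Python. The segment names and satisfaction scores are made up purely for illustration; only the group sizes (1,000 and 100,000) come from the example above.

```python
def average(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def summarize_by_segment(rows):
    """Return the average score for each segment in (segment, score) rows."""
    by_segment = {}
    for segment, score in rows:
        by_segment.setdefault(segment, []).append(score)
    return {segment: average(scores) for segment, scores in by_segment.items()}


# Hypothetical satisfaction scores on a 1-5 scale.
rows = [("influential", 2.0)] * 1000 + [("general", 4.0)] * 100000

overall = average([score for _, score in rows])
per_segment = summarize_by_segment(rows)

print(f"overall: {overall:.2f}")
for segment, score in per_segment.items():
    print(f"{segment}: {score:.2f}")
```

With these invented numbers the overall average sits very close to the general group's score, so a dashboard showing only that figure would hide the unhappy influential segment. Breaking the result out by segment keeps both signals visible.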
Personally, I would love a product, if one exists, that can take my business questions and answer them in real time based on the analytics, from any device of my choice. Big data solutions are being customized for specific domains such as finance, retail, health care and more. One that knows your domain is surely a plus.
Analytics does require specialist skills such as statistical modelling, but the industry is not geared up to provide such expertise on a wide scale, nor will enterprises have the time to address such complex modelling. Some solutions encapsulate these skills and expose a GUI for users to define their dashboards. Whether such encapsulation is good or bad depends on how you see it and is a personal choice.
No big data software can deliver end-to-end data processing, storage, analytics, customer interaction, process optimization and more as a single solution. The chosen solution therefore needs a powerful and varied set of integration capabilities: support for NoSQL databases, a variety of storage options, analytics solutions, and social/customer management solutions, and, most importantly, integration with various development platforms, which makes it easier for developers to use the big data software.
It's definitely a great time to get involved and shape up your business, small or big, for this not new but emerging trend, and to shape up your solution landscape so that it can handle volumes of data and gather decision-making analytics from data you would otherwise let go to waste.