Last week, Dmitry Sotnikov, COO of Jelastic, had the opportunity to chat with Michael "Monty" Widenius, the author of the original version of the open-source MySQL database, who is currently working on MariaDB, the community-developed branch of MySQL.
As you know, Big Data refers to data sets so large that they are difficult to process using traditional databases and software techniques. The relational model and SQL dominate today's database landscape, but there are also databases built without relations and designed for higher scalability. So we asked Monty, an expert in the database field, about the current and future state of SQL, NoSQL and Big Data. His answers were somewhat surprising:
Would you please tell us a little about the history of NoSQL and Big Data? What are the main reasons that this has become such a topic of interest?
The whole thing with the "new NoSQL movement" started with a blog post from a Twitter employee that said MySQL was not good enough and they needed "something better", like Cassandra. The main reason Twitter had problems with MySQL back then was that they were using it incorrectly. The strange thing was that the solution they suggested for solving their problems could be done just as easily in MySQL as in Cassandra. I can't find the original article, but I did find a follow-up from a bit later which said MySQL would be dropped for Cassandra. The current state is that now, three years later, Twitter is still using MySQL as its main storage for tweets. Cassandra was, in the end, not able to replace MySQL.
The main reason NoSQL became popular is that, in contrast to SQL, you can start using it without having to design anything. This makes it easier to start with NoSQL, but you pay for this later when you find that you don't have control of your data (if you are not very careful).
So, the main benefits (at least before MariaDB) of most NoSQL solutions are:
• Fast access to data (as long as you can keep everything in memory)
• Fast replication / data spread over many nodes.
• Flexible schema (you can add new columns instantly).
What problems can be solved (or do people think they can solve) with the help of Big Data?
More performance and more flexible schemas are the two biggest drivers of NoSQL.
What do you personally think about the future of Big Data? Your predictions?
I think that most of the people who are looking at NoSQL are doing so because it's still 'hype'. Most companies don't have massive amounts of data, like Facebook and Google, and they will not be able to afford experts to tune and constantly develop the database. SQL is not going away. NoSQL can't replace it. Almost everyone will need relations (i.e., joins) to utilize their data. Still, there are places where NoSQL makes sense. I think, in the future, you will see more combined SQL and NoSQL usage. This is why we are extending MariaDB to be able to access NoSQL databases like Cassandra and LevelDB.
Why do people still use NoSQL? What are the main reasons?
Because it’s easier to get started with a NoSQL database. You don't have to learn SQL and define your database schema before you start using it. A few are using it because they believe it can scale better than SQL.
Can SQL outperform NoSQL? What are some unique advantages that make SQL better than NoSQL?
As soon as data can't fit into memory, SQL generally outperforms NoSQL. The same goes for things that NoSQL can't do. Most NoSQL solutions are optimized for single key access. For anything else, you have to write a program and it's very hard to beat a SQL optimizer for complex things, especially things that are automatically generated based on user requests (required for most web sites). SQL can also beat NoSQL on most single machines. In a cluster, where everything is in memory, NoSQL usually outperforms SQL for key lookups.
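To illustrate the point about single-key access, here is a minimal sketch that is not part of the interview. It assumes a made-up `orders` data set, uses a plain Python dictionary as a stand-in for a key-value store, and uses SQLite (not MariaDB) as a stand-in for a SQL engine. It makes no claim about performance; it only shows that a key lookup is trivial in a key-value model, while a grouped total has to be programmed by hand unless a SQL engine does it for you.

```python
import sqlite3

# Hypothetical sample data, keyed by order id as a key-value store would hold it.
orders = {
    1: {"customer": "alice", "amount": 30},
    2: {"customer": "bob", "amount": 20},
    3: {"customer": "alice", "amount": 50},
}


def lookup(order_id):
    """Single-key access: the case key-value stores are optimized for."""
    return orders[order_id]


def totals_by_hand():
    """Anything beyond a key lookup means writing the scan and grouping yourself."""
    totals = {}
    for order in orders.values():
        customer = order["customer"]
        totals[customer] = totals.get(customer, 0) + order["amount"]
    return totals


def totals_with_sql():
    """The same question stated declaratively; the engine chooses how to run it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(key, value["customer"], value["amount"]) for key, value in orders.items()],
    )
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    ).fetchall()
    conn.close()
    return dict(rows)
```

Both `totals_by_hand()` and `totals_with_sql()` answer the same question; the difference is who writes and maintains the logic when the questions become more complex or are generated from user requests.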
The problem with Hadoop is that there is no known business model around it that ensures that the investors will get back 10x money that they expect. Because of that, I have a hard time understanding how Cloudera can survive in the long run. It's not enough to have a good product. You also have to be able to make money with it.
Who are the primary proponents of Big Data and NoSQL?
All the NoSQL vendors of course. ;)
If this is all just hype, why are they talking about it?
It's not just hype for everyone. There are many big companies and projects that can benefit from Big Data. However, my point is that most don't need and should not use NoSQL, because it will become more expensive in the long run when you finally discover that NoSQL can't solve all your business needs.
Finally, how does MariaDB fit into all of this?
One of the goals of MariaDB is to be a bridge between NoSQL and SQL. That's why we first added support for Cassandra and are now working on adding support for LevelDB. We also recognize some of the needs that NoSQL is trying to solve, which is why we added dynamic columns (which make your SQL schemas as flexible as most NoSQL schemas) and much faster replication. In MariaDB 10.0, we are working to make replication even faster, more fault tolerant and more flexible. We are also working closely with Galera to provide a multi-master solution for MariaDB. All of this is to better adapt to a changing world and satisfy the needs people have -- or think they have. ;)
Please tell us about the new MariaDB foundation! What does this mean for developers worldwide?
The MariaDB foundation was created to ensure that it is no longer just one person or one company driving MariaDB/MySQL development. It's only by having a set of independent companies working together with the common goal of keeping MariaDB an actively developed open source project that MariaDB and the MySQL ecosystem will truly be free and future proof. In practice, the MariaDB foundation ensures that the MariaDB project is actively developed as an open source project. The foundation is hiring developers to do the builds, QA, merges, patch reviews and other work needed for a project to move forward.
Thanks so much, Monty! MariaDB continues to be a very popular choice among developers on the Jelastic platform. All the best!
Jelastic, Inc., based in Palo Alto, Calif., offers a Java and PHP cloud Platform-as-a-Service (PaaS) for developers and hosting service providers. Jelastic is the only PaaS designed specifically for hosting service providers to deploy and make available to their customers, enabling them to compete with the Amazons and Herokus of the world. Jelastic uniquely offers true automatic resource scaling for Java and PHP applications, thus delivering true next-generation Java and PHP cloud computing. You can learn more about Jelastic or sign up for the service for free at Jelastic.com.