NoSQL, NewSQL, evolution of databases
Major websites use NoSQL databases, a trend that started with Google and Facebook.
The scalability they require, together with the large volume of data and updates, makes the relational model inefficient and forced them to find a new model.
The term NoSQL came into use in 2009 to describe the growing number of database systems that do not use the classical relational model. The name is short for "Not Only SQL", expressing the wish to go beyond the traditional data access model, as explained below.
BigTable, created by Google to hold the index of its search engine, can be considered the origin of the NoSQL movement, since its model was taken up by the engineering teams of other major websites.
Why NoSQL?
The classical relational model is inefficient for certain kinds of workload:
- Indexing a large quantity of documents.
- Sites with heavy traffic.
- Records whose size varies widely from one to the next.
- Frequent rewrites (the classical model expects more reads than writes).
- A database that must be easy to extend.
- When speed is important.
- When productivity matters: the model is simpler to work with.
Some companies were not satisfied with their experience of NoSQL and returned to relational databases such as MySQL or MariaDB. This is the case for Ars Technica, and for Google with its Spanner project. The move was made possible by performance improvements in these databases. Others still prefer NoSQL.
SQL vs. NoSQL
NoSQL, in the form inspired by BigTable, is column-oriented: we can add columns to each record as easily as we add rows (INSERT/UPDATE) in the relational model.
A NoSQL database has no fixed schema, and the number of columns can change over time just like the number of rows.
But what makes NoSQL so much faster? It is mainly due to the way variable columns are handled.
Consider an example. Suppose that we manage the traffic of an airport with a list of all routes.
Flight number | Aircraft | Pilot | Route |
---|---|---|---|
001 | Airbus | Joe | NY-Delhi |
But we cannot have a column for the name of each passenger. The number of passengers varies greatly between aircraft and flights, so the table would contain many empty cells and searching the data would slow down accordingly.
Here is how the classical relational model deals with the problem. We create a table "passengers" in the following form.
Flight number | Passenger name |
---|---|
001 | x |
001 | y |
001 | z |
The same table will contain all flights and all the names of passengers. Reaching the passengers of one flight therefore means filtering them out of the rows of every other flight and joining the result with the flight table.
A row in a NoSQL table will look like this:
Flight number | Aircraft | Pilot | Route | |||
---|---|---|---|---|---|---|
001 | Airbus | Joe | NY-Delhi | x | y | z |
The number of columns for each flight depends on the number of passengers.
Finding a passenger on a flight will be much faster in such a model, since everything about the flight is stored in a single row. More importantly, modifying the data will be far easier.
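The two layouts above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the relational side uses the standard `sqlite3` module as a stand-in for any SQL database, the wide row is a plain dictionary rather than a real NoSQL store, and the `passenger:` column prefix and the function names are hypothetical choices made for the example.

```python
import sqlite3

# Relational layout: one "passengers" table shared by all flights.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flights (flight_number TEXT PRIMARY KEY, "
    "aircraft TEXT, pilot TEXT, route TEXT)"
)
conn.execute("CREATE TABLE passengers (flight_number TEXT, passenger_name TEXT)")
conn.execute("INSERT INTO flights VALUES ('001', 'Airbus', 'Joe', 'NY-Delhi')")
conn.executemany(
    "INSERT INTO passengers VALUES (?, ?)",
    [("001", "x"), ("001", "y"), ("001", "z")],
)


def passengers_relational(flight_number):
    """Filter the shared passengers table down to one flight."""
    rows = conn.execute(
        "SELECT passenger_name FROM passengers "
        "WHERE flight_number = ? ORDER BY passenger_name",
        (flight_number,),
    )
    return [name for (name,) in rows]


# Wide-row layout: one record per flight, one extra column per passenger.
# The "passenger:" prefix is an assumption made for this sketch.
wide_rows = {
    "001": {
        "aircraft": "Airbus",
        "pilot": "Joe",
        "route": "NY-Delhi",
        "passenger:x": True,
        "passenger:y": True,
        "passenger:z": True,
    }
}


def passengers_wide(flight_number):
    """Read the passenger columns of a single row."""
    prefix = "passenger:"
    row = wide_rows[flight_number]
    return sorted(col[len(prefix):] for col in row if col.startswith(prefix))


def add_passenger_wide(flight_number, name):
    """Adding a passenger is just adding a column to the flight's row."""
    wide_rows[flight_number]["passenger:" + name] = True


print(passengers_relational("001"))
print(passengers_wide("001"))
```

Both functions return the same list of names. The difference is where the work happens: the relational query selects rows from a table shared by all flights, while the wide-row lookup touches only the record of the flight concerned.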
NewSQL, another approach
NewSQL is not a format but a new approach to implementation. The original name was "ScalableSQL", and its purpose is high-performance data management.
NoSQL is criticized for sacrificing the ACID properties (Atomicity, Consistency, Isolation, Durability), and thus for providing weaker guarantees when data is accessed.
A NewSQL database keeps the classical structure of tables and columns but uses various methods to maintain speed even on large volumes.
VoltDB is a database manager based on NewSQL: it is designed to run entirely in memory, which gives it a large speed advantage and makes it a challenger to Oracle.
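To make the ACID guarantee concrete, here is a minimal sketch of atomicity. It uses Python's standard `sqlite3` module with an in-memory database purely as a stand-in; it is not VoltDB and says nothing about how VoltDB is implemented. The `seats` and `bookings` tables and the `book` function are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE seats (flight_number TEXT PRIMARY KEY, "
    "free INTEGER CHECK (free >= 0))"
)
conn.execute("CREATE TABLE bookings (flight_number TEXT, passenger_name TEXT)")
conn.execute("INSERT INTO seats VALUES ('001', 1)")  # only one seat left
conn.commit()


def book(flight_number, passenger_name):
    """Record a booking and decrement the free seats as one atomic unit."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO bookings VALUES (?, ?)",
                (flight_number, passenger_name),
            )
            conn.execute(
                "UPDATE seats SET free = free - 1 WHERE flight_number = ?",
                (flight_number,),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed: the INSERT above is rolled back too.
        return False


first = book("001", "x")   # succeeds, takes the last seat
second = book("001", "y")  # fails, and leaves no partial booking behind
booked = [name for (name,) in conn.execute("SELECT passenger_name FROM bookings")]
print(first, second, booked)
```

The second booking fails as a whole: because both statements belong to the same transaction, the row inserted for passenger "y" disappears when the seat count cannot be decremented.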
Graph database
Designed specifically to store and retrieve relationships between objects or persons, graph databases are more efficient than classical relational databases (despite the name) at answering queries about these links. They have no tables.
The structure of the database is composed of nodes with properties, which are similar to the objects of object-oriented languages, and of edges, data representing a link between two objects, with a value that represents the weight of the link. The number of links varies, and the database evolves on two levels: objects are added or removed, and so are the links between them.
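The structure just described (nodes with properties, weighted edges, and changes at both levels) can be sketched as a small in-memory class. This is a toy model for illustration, not the storage format or API of any real graph database; the `Graph` class, its method names and the sample data are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)  # node id -> properties
    edges: dict = field(default_factory=dict)  # node id -> {neighbour id: weight}

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties
        self.edges.setdefault(node_id, {})

    def remove_node(self, node_id):
        # Removing an object also removes every link that points to it.
        self.nodes.pop(node_id, None)
        self.edges.pop(node_id, None)
        for links in self.edges.values():
            links.pop(node_id, None)

    def link(self, a, b, weight=1.0):
        # Undirected edge: stored in both directions with the same weight.
        self.edges[a][b] = weight
        self.edges[b][a] = weight

    def neighbours(self, node_id, min_weight=0.0):
        """Query the links of one node directly, with no table to scan."""
        return sorted(
            n for n, w in self.edges[node_id].items() if w >= min_weight
        )


g = Graph()
g.add_node("alice", kind="person")
g.add_node("bob", kind="person")
g.add_node("carol", kind="person")
g.link("alice", "bob", weight=0.9)
g.link("alice", "carol", weight=0.2)

print(g.neighbours("alice"))                  # ['bob', 'carol']
print(g.neighbours("alice", min_weight=0.5))  # ['bob']
```

Each node keeps its own set of links, so a query about the relationships of one object reads only that object's entry.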
Neo4j is perhaps the best tool available to build such a database, even if there is other, proprietary software around, such as Google's Pregel.
NoSQL software
- ArangoDB. Supports the document, graph and key/value models.
- Percolator from Google. Built on top of BigTable, it took over the updating of the search engine's index.
- Cassandra, originally from Facebook. Its data model is derived from BigTable.
- MongoDB. Document-oriented database. It is used for example by SourceForge.
- ElasticSearch. Although it is presented as a system for storing documents and searching in them, ES is actually a DBMS and is quite similar to MongoDB. It is easy to deploy and use, and it manages to power huge sites, especially in e-commerce.
- Hadoop. A framework that implements Google's MapReduce model for distributed processing of large amounts of data. The project includes HBase, a database manager built on the framework and modeled after BigTable.
- Aerospike. Also based on the key/value model, like Redis, it claims to be one hundred times faster than a database such as MySQL. Uses the Aerospike Query Language. (Apache license).
Documents
- Oracle threatened by database minnows. Migrating from Oracle to NoSQL or NewSQL may be the best way to scale.