The two properties that make NoSQL database systems more powerful than relational databases are:
1. The key of the object (data record) is not part of the object itself. This identity key is not scoped to a table or type but is system-wide and immutable.
2. Objects are multivalued, i.e. an attribute of a type may itself be a complex type. In particular, an attribute may contain a collection (set, array, list, map) of other objects or of references to other objects (via the global key).
Simple enough, and at first sight not very exciting. If you come from an SQL background, you probably wonder what the hype is about. After all, you have working databases without these properties.
But these simple capabilities allow for a completely different design of the database system, and for a clearer model of the domain in the applications that use it.
They contrast with SQL, where
- the identity, and therefore the address, of a row is determined by its content (the primary key is made up of columns of the row),
- the scope of a primary key is only per table,
- a cell (a column in a row) must be atomic.
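The two properties can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `store` dictionary, the `put` helper and the `Supplier` and `Product` classes are hypothetical names and do not belong to any particular NoSQL product.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical in-memory store: maps a system-wide key to an object.
store = {}


def put(obj):
    """Store obj under a newly generated system-wide key and return that key."""
    key = uuid.uuid4().hex  # generated by the store, not derived from obj's content
    store[key] = obj
    return key


@dataclass
class Supplier:
    name: str  # note: no id attribute, the key lives outside the object


@dataclass
class Product:
    name: str
    # Multivalued attribute: a collection of references (global keys).
    supplier_keys: list = field(default_factory=list)


acme_key = put(Supplier("Acme"))
bolt_key = put(Product("Bolt", supplier_keys=[acme_key]))

# Following a reference is a plain key lookup, whatever the type of the target.
first_supplier = store[store[bolt_key].supplier_keys[0]]
```

Both object types share one key space, and neither carries its own key as an attribute.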
What are the consequences of these two simple properties?
Property (1) allows for distribution. The key can be managed by the database system, which can use it to identify the node where an object is stored. It also allows objects to be clustered by how closely they are related instead of by type (table). Having a system-wide key also allows for a painless representation of inheritance, because a reference can point to an object of any subtype without knowing which table it lives in.
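How a system-wide key can identify the storage node may be sketched as follows. The node names and the `node_for` function are assumptions made for illustration, and the simple hash-modulo scheme is only the most basic option; real systems often use more elaborate schemes such as consistent hashing.

```python
import hashlib

# Hypothetical cluster: three storage nodes.
NODES = ["node-a", "node-b", "node-c"]


def node_for(key: str) -> str:
    """Map a system-wide key to the node that stores the object."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer and pick a node from it.
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


# The same key always resolves to the same node; no table name is involved.
home = node_for("3f2a9c")
```

Because the key is immutable and independent of the object's content, the object never has to move just because one of its attributes changes.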
Property (2) makes it easier to store really complex objects (documents). It also does away with many of the problems associated with normalisation and denormalisation.
The combination of both makes the join obsolete and massively reduces the need for indices. A collection of references to other objects can be stored directly in the object. Using this collection, the database can find the referenced objects and retrieve them directly from the right node in the storage network, without the need for a join. Graphs can also be represented directly.
Let us look at a typical business example:
Orders have customers and contain items, each consisting of a count and a product. A product may have several suppliers, and a supplier may supply several products.
In the relational world this takes six tables: Order, Order_Item, Product, Supplier, Product_To_Supplier_Link and Customer.
The SQL query is probably longer than the actual data returned, because you need five joins to retrieve the data.
You get a stupid flat list that repeats the order, item, customer and product information for every supplier that can supply a product in the order.
Scaling this is hard, because all six tables must be joined.
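The relational variant can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the column names and the sample data are assumptions, and the Order table is called `orders` here because ORDER is a reserved word in SQL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE supplier (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE order_item (order_id INTEGER, product_id INTEGER, item_count INTEGER);
CREATE TABLE product_to_supplier_link (product_id INTEGER, supplier_id INTEGER);

INSERT INTO customer VALUES (1, 'Alice');
INSERT INTO orders VALUES (10, 1);
INSERT INTO product VALUES (100, 'Bolt');
INSERT INTO supplier VALUES (1000, 'Acme');
INSERT INTO supplier VALUES (1001, 'Globex');
INSERT INTO order_item VALUES (10, 100, 5);
INSERT INTO product_to_supplier_link VALUES (100, 1000);
INSERT INTO product_to_supplier_link VALUES (100, 1001);
""")

# Five joins are needed to bring the six tables together.
rows = con.execute("""
SELECT o.id, c.name, i.item_count, p.name, s.name
FROM orders o
JOIN customer c ON c.id = o.customer_id
JOIN order_item i ON i.order_id = o.id
JOIN product p ON p.id = i.product_id
JOIN product_to_supplier_link l ON l.product_id = p.id
JOIN supplier s ON s.id = l.supplier_id
ORDER BY s.name
""").fetchall()

for row in rows:
    print(row)
```

With one order, one item and two suppliers, the result has two rows that differ only in the supplier column; the order, customer, count and product values are repeated in each.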
If you use a NoSQL database, you need only four tables: Order, Product, Supplier and Customer.
The order items that belong to an order can be embedded directly in it, because they are completely dependent on the order.
The Product_To_Supplier_Link table is not required, because the Product object can embed a collection of references to its suppliers, and the Supplier object can embed a collection of references to the products it supplies. No indices are required, because the references can be looked up directly.
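The NoSQL variant of the example might look like the following sketch. A plain dictionary stands in for the database's lookup by global key. The keys, the field names, the sample data and the `load_order` helper are all assumptions made for illustration, not the API of a real product.

```python
# Hypothetical in-memory database: one mapping from system-wide key to object.
db = {
    "k1": {"type": "Customer", "name": "Alice"},
    "k2": {"type": "Supplier", "name": "Acme", "products": ["k4"]},
    "k3": {"type": "Supplier", "name": "Globex", "products": ["k4"]},
    "k4": {"type": "Product", "name": "Bolt", "suppliers": ["k2", "k3"]},
    "k5": {
        "type": "Order",
        "customer": "k1",
        # Order items are embedded; they have no key of their own.
        "items": [{"count": 5, "product": "k4"}],
    },
}


def load_order(order_key):
    """Resolve an order into a nested structure by following references."""
    order = db[order_key]
    items = []
    for item in order["items"]:
        product = db[item["product"]]
        items.append({
            "count": item["count"],
            "product": product["name"],
            # Each reference is resolved by a direct key lookup, not a join.
            "suppliers": [db[key]["name"] for key in product["suppliers"]],
        })
    return {"customer": db[order["customer"]]["name"], "items": items}


order = load_order("k5")
```

The result is nested rather than flat: the order appears once, each item appears once, and the suppliers hang off the product as a list.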
I think that, besides scaling and performing better, the NoSQL variant is easier to program and models reality more closely.