
A short explanation of the NoSQL data model.

The two properties that make NoSQL database systems more powerful than relational databases are:

1. The key of an object (data record) is not part of the object itself. This identity key is system-wide and immutable, not merely unique per table or type.

2. They are multivalued, i.e. an attribute of a type may itself be a complex type. In particular, an attribute may contain a collection (set, array, list, map) of other objects or of references to other objects (via the global key).

Simple enough, and at first sight not very exciting. If you come from an SQL background, you probably wonder what the hype is about. After all, you have working databases without these properties.

But these simple capabilities allow for a completely different design of database systems and a clearer modelling of the domain applications that use them.

They are in contrast to SQL, where
- the identity, and therefore the address, of a row is determined by its content,
- the scope of a primary key is only per table,
- a cell (a column in a row) must be atomic.

What are the consequences of these two simple properties?

Property (1) allows for distribution. The key can be managed by the database system, which can use it to identify the node where an object is stored. It also allows objects to be clustered by their closeness instead of by type (table). Having a system-wide key also allows for painless representation of inheritance.
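As a minimal sketch of how a system-wide key can identify the storage node, the following Python snippet hashes the key and picks a node from a fixed list. The node names, the use of UUIDs and the simple modulo placement are assumptions made for illustration only; a real system might use consistent hashing or encode the node in the key itself.

```python
import hashlib
import uuid

# Hypothetical storage nodes; the names are made up for this sketch.
NODES = ["node-a", "node-b", "node-c"]

def new_key() -> str:
    """Create a system-wide key that does not depend on the object's content."""
    return uuid.uuid4().hex

def node_for(key: str) -> str:
    """Map a key to a node with a stable hash (simple modulo placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]

key = new_key()
# The same key always resolves to the same node, whatever the object's type.
assert node_for(key) == node_for(key)
```

Because the key is not derived from the content, the object can change freely without ever having to move to a different address.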

Property (2) makes it easier to store really complex objects (documents). It also does away with many of the problems associated with normalisation and denormalisation.

The combination of both makes the join obsolete and massively reduces the need for indices. A collection of references to other objects can be stored directly in the object. Using this collection, the database can find the referenced objects and retrieve them directly from the right node in the storage network, without a join. It also allows graphs to be represented directly.

Let us look at a typical business example:

Orders have customers and contain items, each consisting of a count and a product. A product may have several suppliers, and a supplier supplies several products.

In the relational world this is six tables (Order, Order_Item, Product, Supplier, Product_To_Supplier_Link and Customer).
The SQL query is probably longer than the actual data returned, because you need five joins to retrieve the data.
You get a stupid flat list that repeats the Order, Item, Customer and Product information for every supplier that can supply a product in an order.
Scaling this is hard, because all six tables must be joined.

If you use a NoSQL database, you need only four tables: Order, Product, Supplier and Customer.

The order items that belong to an order can be embedded directly, because they are completely dependent on the order.

The product-to-supplier link table is not required, because the Product object can embed a collection of references to the suppliers for this product, and the Supplier object can embed a collection of references to the products it supplies. No indices are required, because the references can be looked up directly.
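The four-table design can be sketched in Python as follows. This is only an illustration: the class and field names, the sample data and the `store` dictionary (a stand-in for the database, mapping global keys to objects) are all assumptions, not the API of any particular NoSQL product.

```python
from dataclasses import dataclass, field

@dataclass
class Customer:
    name: str

@dataclass
class Supplier:
    name: str
    product_keys: list = field(default_factory=list)   # references to products

@dataclass
class Product:
    name: str
    supplier_keys: list = field(default_factory=list)  # references to suppliers

@dataclass
class OrderItem:
    count: int
    product_key: str  # reference to a product

@dataclass
class Order:
    customer_key: str                           # reference to the customer
    items: list = field(default_factory=list)   # order items are embedded

# Stand-in for the database: global key -> object, regardless of type.
store = {
    "c1": Customer("Alice"),
    "s1": Supplier("Acme", ["p1"]),
    "s2": Supplier("Bolt", ["p1"]),
    "p1": Product("Widget", ["s1", "s2"]),
    "o1": Order("c1", [OrderItem(3, "p1")]),
}

def describe(order_key):
    """Resolve an order into a nested structure by following references."""
    order = store[order_key]
    result = {"customer": store[order.customer_key].name, "items": []}
    for item in order.items:
        product = store[item.product_key]
        result["items"].append({
            "count": item.count,
            "product": product.name,
            "suppliers": [store[k].name for k in product.supplier_keys],
        })
    return result

print(describe("o1"))
```

Each reference is resolved with a direct key lookup, and the result stays nested: the order and the product appear once, with the suppliers as a list, instead of being repeated per supplier.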

I think that, besides offering better scalability and performance, the NoSQL variant is easier to program and models reality more closely.


 



Re: A short explanation of the NoSQL data model.

What you have described is called a "network model" database, and it was the hot new thing 40 years ago, shortly before "relational model" databases became all the rage. In reality, relational databases have become so popular over the past 30+ years because they are easier to program. That is, joins made network databases obsolete, not the other way around.

You're absolutely right that it's very easy to program things like "what products has this customer ordered" in a NoSQL database. You simply look up the customer and walk through its table of orders, building a set of products. But how about "what suppliers made us the most profit last month", "what products are the most popular", and "what other customers ordered the same products as this customer"? While the last three are fairly simple to express in SQL, they require a skilled programmer to write in NoSQL.

While many of the world's largest OLTP systems are still using the network model and even the hierarchical model (the predecessor to network), it's not hard to see why relational DBs have become so popular.

Re: A short explanation of the NoSQL data model.

Gabe,

thanks for your insightful comment. The questions you pose deserve another full posting. The reason they are hard to write in NoSQL is that there is no equivalent to SQL as a standardized query language. To do this kind of BI query, a network data model needs the flatten operator. It is implemented in ReportsAnywhere and it works well.
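To give a rough idea of what flattening means, here is a small Python sketch. It is not the ReportsAnywhere implementation, whose details are not described here; the `flatten` function, the record layout and the sample data are assumptions. It simply turns a nested record into one flat row per embedded element, after which a BI-style aggregate such as "which products are the most popular" becomes a plain loop.

```python
from collections import Counter

def flatten(record, path):
    """Yield one flat row per element of the nested list at record[path]."""
    parent = {k: v for k, v in record.items() if k != path}
    for child in record.get(path, []):
        # Parent fields are repeated on every row, as in a join result.
        yield {**parent, **child}

# Hypothetical nested order document.
order = {
    "order": "o1",
    "customer": "Alice",
    "items": [
        {"product": "Widget", "count": 3},
        {"product": "Gadget", "count": 1},
    ],
}

rows = list(flatten(order, "items"))

# Aggregate over the flat rows: total count ordered per product.
popularity = Counter()
for row in rows:
    popularity[row["product"]] += row["count"]

print(popularity.most_common(1))
```

The nested form stays the storage format; the flat rows are produced only when a report needs them.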

