Data Modelling – Rise and Fall of NoSQL

The Rise and Fall of NoSQL & the Data Lake

Around 2010, a set of ideas took hold: “no need for a database”, “no need for data modelling” and “no need for transformations”. These ideas gave rise to the Data Lake.

The database haters decided to just load the data they were given “as is” and write their own code to find the data they wanted and transform it as they needed it (see “Schema-on-Read”). Whilst this was a good idea for some data sets, it took hold of too many people’s mindsets, and systems were developed on NoSQL databases that shouldn’t have been.
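To make “Schema-on-Read” concrete, here is a minimal Python sketch. The records, the field names (customer_id, custId, name, fullName) and the read_customers helper are all hypothetical, made up purely for illustration. The data is stored as received, and the structure is only imposed by the code that reads it.

```python
import json

# Raw records stored exactly as received. The field names are hypothetical.
RAW_LINES = [
    '{"customer_id": 1, "name": "Ada", "joined": "2010-03-01"}',
    '{"custId": "2", "fullName": "Grace"}',
    '{"customer_id": 3, "name": null}',
]

def read_customers(lines):
    """Apply a schema at read time: extract id and name, coping with variants."""
    customers = []
    for line in lines:
        doc = json.loads(line)
        # Every reader must know every spelling a producer has ever used.
        raw_id = doc.get("customer_id", doc.get("custId"))
        name = doc.get("name") or doc.get("fullName")
        if raw_id is None or name is None:
            # Nothing validated the data on write, so bad rows only surface here.
            continue
        customers.append({"id": int(raw_id), "name": name})
    return customers

customers = read_customers(RAW_LINES)
print(customers)
```

Every program that reads this data has to repeat the same guessing and cleaning, which is manageable for one data set and painful for hundreds.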

Data was often stored as JSON objects with no data checks, no transformations and, more importantly, no homogeneity between different data sets.
Data couldn’t easily be searched. Different data sources stored the same attribute under similar but ultimately different names.
Data couldn’t easily be joined. Developers had to write the joins themselves.
Data couldn’t easily be updated. Even a simple update to a record required “pulling” a large record from the store, updating it and “pushing” it back.
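As a rough illustration of the join and update problems, the sketch below uses plain Python dictionaries to stand in for a document store. The data, the attribute names (cust_id versus customerId) and both helper functions are hypothetical, and a real store would add network round trips on top of this.

```python
import copy

# Two hypothetical sources naming the same attribute differently.
orders = [
    {"order_id": 10, "cust_id": 1, "total": 25.0},
    {"order_id": 11, "cust_id": 2, "total": 40.0},
    {"order_id": 12, "cust_id": 1, "total": 5.0},
]
customers = [
    {"customerId": 1, "name": "Ada"},
    {"customerId": 2, "name": "Grace"},
]

def join_orders_to_customers(orders, customers):
    """Hand-written hash join: index one side, then probe it with the other."""
    by_id = {c["customerId"]: c for c in customers}
    joined = []
    for o in orders:
        c = by_id.get(o["cust_id"])
        if c is not None:
            joined.append(
                {"order_id": o["order_id"], "name": c["name"], "total": o["total"]}
            )
    return joined

def update_order_total(store, key, order_id, new_total):
    """Pull the whole document, change one field, push the whole document back."""
    doc = copy.deepcopy(store[key])  # "pull"
    for o in doc["orders"]:
        if o["order_id"] == order_id:
            o["total"] = new_total
    store[key] = doc  # "push"

joined = join_orders_to_customers(orders, customers)

# A dict standing in for a key-value document store.
store = {
    "customer:1": {"name": "Ada", "orders": [o for o in orders if o["cust_id"] == 1]}
}
update_order_total(store, "customer:1", 12, 7.5)
```

The join strategy, the key mapping and the update logic all live in application code, so each new query means more of this code to write and maintain.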

Searches and joins had to be written by developers. The more data that was loaded, the worse the problem became, until the system collapsed under its own weight. The Data Lake was now a Data Swamp.

The database haters hadn’t realised at least 3 important details:
1. Structuring data makes accessing data efficient and easy.
2. A database isn’t a simple storage device; it’s a complex query and processing engine. Optimising a query covering just a few entities is a complex undertaking. The big database vendors have many expensive people working on optimising queries for a reason.
3. Databases had sophisticated transactions and consistency guarantees.
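The three points above can be seen in a small sketch using Python’s built-in sqlite3 module. The customer and orders tables and their contents are hypothetical, and SQLite is only a convenient stand-in for a relational database here. The schema rejects bad data on write, the join is a declarative query that the engine plans, and a failed transaction is rolled back as a whole.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        total REAL NOT NULL CHECK (total >= 0)
    );
    INSERT INTO customer VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, 40.0), (12, 1, 5.0);
""")

# The join is declared, not hand-coded; the engine decides how to execute it.
rows = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customer AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# Both updates succeed together or not at all.
try:
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE orders SET total = 7.5 WHERE id = 12")
        conn.execute("UPDATE orders SET total = -1 WHERE id = 10")  # violates CHECK
except sqlite3.IntegrityError:
    pass

totals = conn.execute("SELECT id, total FROM orders ORDER BY id").fetchall()
print(rows)
print(totals)
```

Because the second update violates the CHECK constraint, the first update is undone as well, so the totals are unchanged after the failed transaction.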