Data Modelling

    Data Modelling – Rise and Fall of NoSQL

    Data Modelling: Basics

    Data Modelling is the foundation of any data system, whether an EDM or a Data Platform. Its main role is to homogenise all incoming data so that it can be used easily and effectively. It gives the data structure, defines the relationships between data items and optimises the data for management and access.

    Data Modelling is a compromise between many factors: redundancy, performance and manageability. There is no perfect model, only a least-bad model that balances all the needs of an organisation.

    Data Modelling is usually carried out in three stages:
    Conceptual – a high-level view of the core business entities, attributes and relationships between them
    Logical – a platform-independent view of the conceptual data model, adding more detail such as Primary Keys and Cardinality
    Physical – a detailed, database-specific model identifying the tables, indexes and relationships that actually get created.
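    As a rough illustration of the physical layer, the sketch below uses Python's built-in sqlite3 module to turn a small, hypothetical logical model (a Customer entity with many Addresses) into tables, primary keys, a foreign key that enforces the one-to-many cardinality, and an index. The table, column and index names are assumptions chosen for illustration only; a real physical model would also make platform-specific choices such as data types and storage options.

```python
import sqlite3


def create_physical_model(conn: sqlite3.Connection) -> None:
    """Create a hypothetical customer/address physical model in SQLite."""
    conn.executescript(
        """
        -- SQLite only enforces foreign keys when this pragma is on
        PRAGMA foreign_keys = ON;

        -- Entity: Customer, with its primary key from the logical model
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            full_name   TEXT NOT NULL
        );

        -- Entity: Address, one customer to many addresses (1:N cardinality)
        CREATE TABLE address (
            address_id  INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
            line_1      TEXT NOT NULL,
            postcode    TEXT NOT NULL
        );

        -- Physical-only detail: an index to speed up lookups by customer
        CREATE INDEX ix_address_customer ON address(customer_id);
        """
    )


conn = sqlite3.connect(":memory:")
create_physical_model(conn)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['address', 'customer']
```

    Note that the index appears only at this level: the conceptual and logical models have no notion of it, which is one reason the physical model is where platform and performance concerns enter.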

    All three models are most commonly drawn as Entity Relationship Diagrams (ERDs), and many tools are available to help.

    In reality, the boundaries between the layers are often blurred.

    Even the Physical data model will not necessarily match the data model that is actually implemented.

    “Data Model-First”

    For years the mantra has been “API-First”: the design of the API is treated as the top priority, ahead of building the rest of the application. As a data architect, modeller and data engineer of many years, I believe this is a very short-sighted approach. Designing your APIs without also understanding the model of the input data and how you want to store it is liable to paint you into a corner.

    It is better to consider the input, storage and output models (i.e. the APIs) holistically. You need to ingest and transform data quickly, and serve it from your storage layer quickly. Selecting an output model without considering how the storage layer is modelled is liable to lead to poor performance and data that is difficult to manage.

    Data Model Types

    Relational

    Star Schema

    Snowflake Schema

    Data Vault

    Too many data engineers see this as a binary choice, selecting one type and crowbarring everything into that schema. More experienced engineers mix these types into one consistent model.
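    To make one of these types concrete, here is a minimal star schema built with Python's built-in sqlite3 module: a central fact table holding sales measures, with foreign keys to two dimension tables that describe them. The table names, columns and rows are invented for illustration. A snowflake schema would normalise the dimensions further, for example by moving category out of dim_product into its own table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive attributes, one row per member
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);

    -- Fact table: one row per sale, keys to each dimension plus numeric measures
    CREATE TABLE fact_sales (
        date_key    INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity    INTEGER,
        amount      REAL
    );

    INSERT INTO dim_date    VALUES (20240115, 2024, 1), (20240210, 2024, 2);
    INSERT INTO dim_product VALUES (1, 'Kettle', 'Kitchen'), (2, 'Lamp', 'Lighting');
    INSERT INTO fact_sales  VALUES
        (20240115, 1, 2, 40.0),
        (20240115, 2, 1, 25.0),
        (20240210, 1, 1, 20.0);
    """
)

# Typical star-schema query: join the fact table to its dimensions, then aggregate.
rows = conn.execute(
    """
    SELECT d.month, p.category, SUM(f.amount) AS total
    FROM fact_sales AS f
    JOIN dim_date    AS d ON d.date_key = f.date_key
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY d.month, p.category
    ORDER BY d.month, p.category
    """
).fetchall()
print(rows)  # [(1, 'Kitchen', 40.0), (1, 'Lighting', 25.0), (2, 'Kitchen', 20.0)]
```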

    Temporal Modelling

    Temporal data models allow data to be stored as it changes through time, whether the changes are slow-moving or fast-moving. Your address is an example of a slow-moving change, typically changing only every few years. Fuel price is an example of a fast-moving change.

    You can choose to:
    – store the latest or current value – no temporality
    – store the dates the value applies to – uni-temporal
    – store the dates the value applies to, plus when each value was recorded so that corrections can be tracked – bi-temporal
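    A minimal sketch of the bi-temporal option, in plain Python with no database. Every name here (AddressVersion, address_as_of, the field names, the addresses and dates) is hypothetical. Each row records the period the address applies to (valid_from/valid_to) and the date it was recorded (recorded_at). Rows are only ever appended, and where rows overlap the most recently recorded one is treated as the correction. That "latest recording wins" rule is a simplification; many bi-temporal designs instead close superseded rows with an explicit recorded-to date.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class AddressVersion:
    """One append-only row of a hypothetical bi-temporal address history."""
    value: str
    valid_from: date          # valid time: when the address started to apply
    valid_to: Optional[date]  # valid time: exclusive end, None means "still applies"
    recorded_at: date         # record time: when the row was written to the system


def address_as_of(history: List[AddressVersion], valid_on: date, known_on: date) -> Optional[str]:
    """Address that applied on `valid_on`, as the system knew it on `known_on`."""
    candidates = [
        row for row in history
        if row.recorded_at <= known_on  # ignore rows recorded after `known_on`
        and row.valid_from <= valid_on
        and (row.valid_to is None or valid_on < row.valid_to)
    ]
    if not candidates:
        return None
    # Simplification: where rows overlap, the most recently recorded one wins.
    return max(candidates, key=lambda row: row.recorded_at).value


history = [
    AddressVersion("1 Old Road", date(2015, 3, 1), None, recorded_at=date(2015, 3, 1)),
    # The move is recorded late, with the wrong start date...
    AddressVersion("2 New Street", date(2020, 6, 1), None, recorded_at=date(2020, 6, 20)),
    # ...and later corrected: the move actually happened on 1 May 2020.
    AddressVersion("2 New Street", date(2020, 5, 1), None, recorded_at=date(2021, 1, 10)),
]

# Where did they live on 15 May 2020, according to what we knew at the end of 2020?
print(address_as_of(history, date(2020, 5, 15), known_on=date(2020, 12, 31)))  # 1 Old Road
# The same question, asked after the correction was recorded.
print(address_as_of(history, date(2020, 5, 15), known_on=date(2021, 2, 1)))    # 2 New Street
```

    Passing known_on=date.max collapses this to a uni-temporal view based on the best current knowledge, and keeping only the most recently recorded row gives the no-temporality option.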