Different industries and different use-cases will require different data platform designs and architectures.
The Data Lake Platform
The focus of a Data Lake based Data Platform is the storage of data “as is”. There may be no need to transform the data into an internal data model, or the data model may be applied as “schema on read”, allowing different users and systems to pick the data they need from the “raw” data.
Components Required : Storage, Orchestration, Ingestion
Optional : Data Quality, Data Lineage
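A minimal sketch of “schema on read”, assuming a hypothetical in-memory list standing in for the lake's raw storage. Payloads are stored exactly as received, and each consumer supplies its own schema (field names plus cast functions) only at read time. The names `raw_zone`, `ingest` and `read_with_schema` are illustrative, not part of any particular product.

```python
from __future__ import annotations

import json
from typing import Any, Callable

# Hypothetical raw storage: payloads are kept exactly as they arrived.
raw_zone: list[str] = []


def ingest(payload: str) -> None:
    """Store the payload as is; no parsing or validation at write time."""
    raw_zone.append(payload)


def read_with_schema(schema: dict[str, Callable[[Any], Any]]) -> list[dict[str, Any]]:
    """Apply a consumer-chosen schema at read time (schema on read)."""
    rows = []
    for payload in raw_zone:
        record = json.loads(payload)
        # Keep only the fields this consumer asked for; missing fields become None.
        rows.append(
            {field: cast(record[field]) if field in record else None
             for field, cast in schema.items()}
        )
    return rows


ingest('{"id": "1", "amount": "9.50", "region": "EU"}')
ingest('{"id": "2", "amount": "3.25", "note": "extra field kept in raw zone"}')

# Two consumers read the same raw data through different schemas.
finance_view = read_with_schema({"id": int, "amount": float})
ops_view = read_with_schema({"id": str, "region": str})
```

Because nothing is parsed at write time, the second payload's unexpected `note` field is kept in the raw zone, and a consumer that later needs it can simply add it to its own schema.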
The Distribution Platform
The focus of a Distribution based Data Platform is to act as a centralised data distribution point for an organisation. Organisations where data is shared between multiple departments or teams would benefit from having a single storage unit acting like a “data bus” rather than using point-to-point data transfers.
An Ingestion layer is optional, as it could be left up to the individual feeder systems to load the data themselves; each may have its own preferred ingestion method. However, as the Data Platform grows in use, the natural tendency is to build one loading mechanism used by all.
Components Required : Storage, Dissemination
Optional : Ingestion, Data Quality, Data Lineage
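The “data bus” idea can be sketched as a single shared store that producers publish named datasets to and consumers read from. This is a minimal illustration, assuming a hypothetical `DataBus` class held in memory; the `feeds_needed` helper shows the worst case where every consumer needs data from every producer, so point-to-point transfers grow as producers × consumers while a bus needs only producers + consumers connections.

```python
from __future__ import annotations

from collections import defaultdict


class DataBus:
    """Hypothetical central store: producers publish, consumers read."""

    def __init__(self) -> None:
        self._datasets: dict[str, list[dict]] = defaultdict(list)

    def publish(self, dataset: str, records: list[dict]) -> None:
        # Feeder systems load their own data here (no shared Ingestion layer).
        self._datasets[dataset].extend(records)

    def read(self, dataset: str) -> list[dict]:
        # Dissemination: any team can read any published dataset.
        # Return a copy so readers cannot mutate the shared store.
        return list(self._datasets.get(dataset, []))


def feeds_needed(producers: int, consumers: int) -> tuple[int, int]:
    """Worst case: every consumer needs data from every producer."""
    point_to_point = producers * consumers
    via_bus = producers + consumers
    return point_to_point, via_bus


bus = DataBus()
bus.publish("sales.orders", [{"order_id": 1, "total": 120.0}])
bus.publish("hr.headcount", [{"dept": "finance", "count": 12}])

finance_orders = bus.read("sales.orders")
feeds = feeds_needed(producers=4, consumers=5)  # (20, 9)
```

With only one producer and one consumer the bus adds a hop rather than saving one; the benefit appears as the number of teams sharing data grows.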
The “Two Pillars” Platform
The “Two Pillars” Platform is a combination of a Data Lake Platform and a Distribution Platform. The transformation layer can be left out, as each data consumer can perform its own transformation into its own data model.
Components Required : Storage, Orchestration, Ingestion, Dissemination
Optional : Data Quality, Data Lineage
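Leaving out the central transformation layer means each consumer maps the raw data into its own model. A minimal sketch, assuming a hypothetical `RAW_CUSTOMERS` list as the shared raw storage and two illustrative consumer functions; `disseminate` only hands over a copy of the raw records and does no modelling itself.

```python
from __future__ import annotations

from typing import Callable

Record = dict[str, str]

# Hypothetical shared storage holding raw, untransformed records (Data Lake pillar).
RAW_CUSTOMERS: list[Record] = [
    {"cust_id": "C1", "name": "ada lovelace", "country": "gb"},
    {"cust_id": "C2", "name": "alan turing", "country": "gb"},
]


def disseminate(consumer_transform: Callable[[list[Record]], object]) -> object:
    """Distribution pillar: hand a copy of the raw data to a consumer,
    which maps it into its own data model."""
    return consumer_transform([dict(r) for r in RAW_CUSTOMERS])


# Each consumer owns its transformation, so no central Transformation layer is needed.
def marketing_model(rows: list[Record]) -> list[str]:
    return [r["name"].title() for r in rows]


def compliance_model(rows: list[Record]) -> dict[str, str]:
    return {r["cust_id"]: r["country"].upper() for r in rows}


marketing = disseminate(marketing_model)
compliance = disseminate(compliance_model)
```

The trade-off is that similar transformations may be duplicated across consumers; once several teams need the same model, a shared Transformation layer (as in the “Full Monty” Platform) starts to pay off.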
The “Full Monty” Platform
The full Data Platform suits organisations that pull in 3rd party data, re-model it, disseminate it across teams and also store internally produced data.
Components Required : Storage, Orchestration, Ingestion, Transformation, Dissemination
Optional : Data Quality, Data Lineage
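The Required and Optional lists for the four platform types can be encoded as data, which makes it easy to check which type a proposed design corresponds to. A minimal sketch: `PLATFORMS` transcribes the lists above (with Ingestion treated as required for the “Two Pillars” Platform, since it appears in that platform's Required list), and `matching_platforms` is a hypothetical helper that accepts a design only if every required component is present and every other component is listed as optional.

```python
from __future__ import annotations

# Component lists transcribed from the platform descriptions.
PLATFORMS: dict[str, dict[str, set[str]]] = {
    "Data Lake": {
        "required": {"Storage", "Orchestration", "Ingestion"},
        "optional": {"Data Quality", "Data Lineage"},
    },
    "Distribution": {
        "required": {"Storage", "Dissemination"},
        "optional": {"Ingestion", "Data Quality", "Data Lineage"},
    },
    "Two Pillars": {
        "required": {"Storage", "Orchestration", "Ingestion", "Dissemination"},
        "optional": {"Data Quality", "Data Lineage"},
    },
    "Full Monty": {
        "required": {"Storage", "Orchestration", "Ingestion",
                     "Transformation", "Dissemination"},
        "optional": {"Data Quality", "Data Lineage"},
    },
}


def matching_platforms(components: set[str]) -> list[str]:
    """Return platform types whose required components are all present
    and whose remaining components are all listed as optional."""
    matches = []
    for name, spec in PLATFORMS.items():
        extras = components - spec["required"]
        if spec["required"] <= components and extras <= spec["optional"]:
            matches.append(name)
    return matches


design = {"Storage", "Orchestration", "Ingestion", "Dissemination", "Data Lineage"}
print(matching_platforms(design))  # ['Two Pillars']
```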
