This wiki page is a guide to the components and functions of the Data Ocean Architecture.
Following the structure proposed in the "Reference Architecture," it describes how the architecture supports a scalable, organized, and secure system for handling data needs across the organization.
This document is intended for several audiences within the organization, including:
Data Engineers: To understand the workflow and where they contribute to the architecture.
Data Scientists: To understand how to access and work with the data for analysis.
Business Intelligence Analysts: To learn how the data flows and where to extract the information they need for reports and dashboards.
Data Architects: To maintain the overall structure and integrity of the data environment, for which they are responsible.
Technical Business Users: To learn what data is available, how to access it, and under what conditions.
Data Governance Teams: To ensure that the organization of data aligns with company policies and standards.
Data Organization is a core element of the Reference Architecture, designed to align data processes across business units.
The structure follows a Domain-Driven approach that supports Solvay Atom's transformation goals and fosters data culture and accountability.
This chapter outlines the core components, the layered architecture, and the nine specific business Domains.
The Data Ocean architecture includes several layers to manage the complexity and demands of a data-driven organization:
The Raw layer is the foundational layer, where data is ingested into the architecture in its native format. No transformations occur at this stage, so the layer can serve as a point-in-time archive. It is organized hierarchically by subject area, data source, and time of ingestion. Access is restricted to prevent unauthorized or incorrect use.
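The hierarchy described above (subject area, then data source, then time of ingestion) can be sketched as a small path builder. This is a minimal illustration that assumes a folder-style layout; the function name `raw_path`, the `raw` root folder, and the example subject area, source, and file names are hypothetical and are not taken from the Data Ocean specification.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def raw_path(subject_area: str, source: str, ingested_at: datetime, filename: str) -> PurePosixPath:
    """Build a landing path ordered by subject area, data source, and ingestion time."""
    # Normalize to UTC so the same instant always maps to the same folder.
    ts = ingested_at.astimezone(timezone.utc)
    return PurePosixPath(
        "raw",            # hypothetical root folder for this layer
        subject_area,
        source,
        f"{ts:%Y}",       # year of ingestion
        f"{ts:%m}",       # month of ingestion
        f"{ts:%d}",       # day of ingestion
        filename,         # the file is stored unchanged, in its native format
    )


# Example with made-up values:
path = raw_path("finance", "erp", datetime(2024, 3, 5, 14, 30, tzinfo=timezone.utc), "invoices.csv")
print(path)  # raw/finance/erp/2024/03/05/invoices.csv
```

Because the path encodes the ingestion date and the file itself is never modified, each folder can act as the point-in-time archive the layer is meant to provide.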
This optional layer sits between the Raw layer and the Curated layer to improve data transfer performance. It stores data in an optimized format suited to cleansing and, where useful, partitions it at a finer granularity.
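As a rough illustration of partitioning to a more granular level, the sketch below groups records by day. The field name `event_date` and the `year=/month=/day=` key naming are assumptions made for this example; the source does not specify a partitioning scheme or storage format.

```python
from collections import defaultdict
from datetime import date


def partition_by_day(records, date_field="event_date"):
    """Group records into partitions keyed by year, month, and day."""
    partitions = defaultdict(list)
    for record in records:
        day = record[date_field]
        # Hypothetical key naming; any consistent scheme would work.
        key = f"year={day.year}/month={day.month:02d}/day={day.day:02d}"
        partitions[key].append(record)
    return dict(partitions)


# Example with made-up records:
records = [
    {"id": 1, "event_date": date(2024, 3, 5)},
    {"id": 2, "event_date": date(2024, 3, 5)},
    {"id": 3, "event_date": date(2024, 3, 6)},
]
for key, rows in partition_by_day(records).items():
    print(key, [r["id"] for r in rows])
```

Smaller partitions let downstream cleansing jobs read only the days they need rather than scanning a whole source.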
In this layer, data is transformed into consumable datasets, available as files or tables. Before reaching it, the data passes through a series of cleansing and transformation steps. This is also the most complex part of the architecture, because data is denormalized and different objects may be consolidated.
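The cleansing and denormalization steps mentioned above might look like the following in miniature. This is only a sketch under stated assumptions: the `orders` and `customers` objects, their field names, and the specific cleansing rules (trimming whitespace, dropping rows without a `customer_id`) are invented for illustration and do not describe the actual Data Ocean pipelines.

```python
def cleanse(rows):
    """Trim string values and drop rows that lack a customer_id."""
    cleaned = []
    for row in rows:
        row = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        if row.get("customer_id"):
            cleaned.append(row)
    return cleaned


def denormalize(orders, customers):
    """Attach the customer name to each order, giving one flat record per order."""
    by_id = {c["customer_id"]: c for c in customers}
    flat = []
    for order in orders:
        customer = by_id.get(order["customer_id"], {})
        flat.append({**order, "customer_name": customer.get("name")})
    return flat


# Example with made-up data: two source objects consolidated into one dataset.
orders = [
    {"order_id": 1, "customer_id": " C1 ", "amount": 10.0},
    {"order_id": 2, "customer_id": "", "amount": 5.0},  # dropped: no customer_id
]
customers = [{"customer_id": "C1", "name": "Acme "}]

dataset = denormalize(cleanse(orders), cleanse(customers))
print(dataset)
```

Consumers then read the flat dataset directly, without having to repeat the join or the cleansing rules themselves.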
These specialized layers apply additional business logic or machine learning models to the data. They source data from the Cleansed layer and enforce any required business logic and security measures. Data models that address business analytical requirements are also created here.
This optional layer lets advanced analysts and data scientists run experiments, such as testing for patterns and correlations or validating machine learning models.
The data organization within the architecture is designed around nine business domains:
The architecture is equipped with:
Together, the layers and the domain-driven approach allow the Data Ocean Architecture to manage, secure, and use data effectively across the organization.