
Introduction

Purpose of the Document

This wiki page describes the components and functionality of the Data Ocean Architecture.

Following the structure proposed in the "Reference Architecture," this document explains how the architecture supports a scalable, organized, and secure system for handling a variety of data needs across the organization.

Target Audience

This document is intended for multiple audiences within the organization, including but not limited to:

Data Engineers: To understand the workflow and where they contribute to the architecture.
Data Scientists: To understand how to access and interact with the data for analytical purposes.
Business Intelligence Analysts: To learn how the data flows and where they can extract the information they need for reports and dashboards.
Data Architects: To maintain the overall structure and integrity of the data environment, for which they are responsible.
Technical Business Users: To gain insight into what data is available, how to access it, and under what conditions.
Data Governance Teams: To ensure that the organization of data aligns with company policies and standards.

2: Data Organization

Data Organization is a core element of the Reference Architecture, designed to align data processes across business units.

The architecture's structure adheres to a Domain-Driven approach that aligns well with Solvay Atom's transformation goals and fosters data culture and accountability.

This chapter outlines the core components, the layered architecture, and the business domains.

Figure: Data Ocean Architecture (slide screenshot, 2023-08-29)

2.1 Architectural Layers

The Data Ocean architecture includes several layers to manage the complexity and demands of a data-driven organization:

2.1.1 Raw Data Layer (Ingestion Layer)

This is the foundational layer where data in its native format is ingested into the architecture. No data transformations occur at this stage, which ensures that the data can serve as a point-in-time archive. The layer is hierarchically organized based on subject areas, data sources, and time of ingestion. Access to this layer is restricted to prevent unauthorized or incorrect usage.
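The hierarchical organization described above can be illustrated with a small path-building sketch. The `raw` root folder, the year/month/day breakdown of the ingestion time, and the function name `raw_layer_path` are assumptions made for illustration; the actual folder convention of the Data Ocean may differ.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def raw_layer_path(
    subject_area: str, source: str, ingested_at: datetime, filename: str
) -> PurePosixPath:
    """Build a path of the form raw/<subject area>/<source>/<yyyy>/<mm>/<dd>/<file>."""
    # Convert to UTC so the time partition does not depend on the caller's time zone.
    # Pass a timezone-aware datetime; a naive one would be interpreted as local time.
    ts = ingested_at.astimezone(timezone.utc)
    return PurePosixPath(
        "raw",          # hypothetical root folder for the Raw Data Layer
        subject_area,
        source,
        f"{ts:%Y}",
        f"{ts:%m}",
        f"{ts:%d}",
        filename,       # the file keeps its native name and format, untransformed
    )


example = raw_layer_path(
    "finance", "erp", datetime(2023, 8, 29, 8, 43, tzinfo=timezone.utc), "invoices.csv"
)
print(example)  # raw/finance/erp/2023/08/29/invoices.csv
```

Keeping the ingestion time in the path is one way to let each folder act as a point-in-time archive: a later load lands in a new folder instead of overwriting an earlier one.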

2.1.2 Normalized Data Layer (Staging)

This optional layer serves as an intermediary to enhance performance in transferring data from the Raw layer to the Curated layer. It stores data in an optimized format suitable for data cleansing and possible partitioning to a more granular level.

2.1.3 Cleansed Data Layer (ODS / Curated)

Here, data is transformed into consumable datasets, available as files or tables. Before reaching this layer, the data undergoes a series of cleansing and transformation activities. This is also the most complex part of the architecture, because data is denormalized and different objects may be consolidated here.
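As a rough illustration of the kind of cleansing step that precedes this layer, the sketch below trims string values, drops rows without an identifier, and removes duplicates. The function name `cleanse`, the `vendor_id` field, and the sample rows are hypothetical; real cleansing rules are domain-specific, and this sketch does not cover denormalization or object consolidation.

```python
from typing import Iterable


def cleanse(records: Iterable[dict], key: str) -> list[dict]:
    """Trim string values, drop rows without a key, and keep the first row per key."""
    seen = set()
    cleaned = []
    for record in records:
        # Strip surrounding whitespace from string values only.
        row = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
        key_value = row.get(key)
        if key_value in (None, ""):
            continue  # no usable identifier, so the row is dropped
        if key_value in seen:
            continue  # duplicate of a row already kept
        seen.add(key_value)
        cleaned.append(row)
    return cleaned


raw_rows = [
    {"vendor_id": " V001 ", "name": "Acme "},
    {"vendor_id": "V001", "name": "Acme"},
    {"vendor_id": "", "name": "Unknown"},
]
print(cleanse(raw_rows, key="vendor_id"))
# [{'vendor_id': 'V001', 'name': 'Acme'}]
```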

2.1.4 Use-Case Oriented Layers (Domain/Product)

These specialized layers apply additional business logic or machine learning models to the data. They source data from the Cleansed layer and enforce any required business logic and security measures. Data models that address business analytical requirements are also created here.

2.1.5 Sandbox

This optional layer lets advanced analysts and data scientists conduct experiments. Here, they can run tests to find patterns and correlations or to validate machine learning models.
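The layer sequence described in section 2.1 can be summarized in a minimal sketch. It assumes that data flows Raw, Normalized, Cleansed, then Use-Case, that only the Normalized layer can be skipped in that flow, and that the Sandbox sits outside the main flow as an experimentation area; the names `Layer`, `LAYERS`, and `layer_flow` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    optional: bool


# Main data flow, in order. The Sandbox is intentionally left out: it is
# treated here as a side area for experiments, not as a pipeline step.
LAYERS = [
    Layer("raw", optional=False),
    Layer("normalized", optional=True),   # described as optional (staging)
    Layer("cleansed", optional=False),
    Layer("use_case", optional=False),
]


def layer_flow(skip_optional: bool) -> list[str]:
    """Return the ordered layer names, optionally skipping optional layers."""
    return [
        layer.name
        for layer in LAYERS
        if not (skip_optional and layer.optional)
    ]


print(layer_flow(skip_optional=False))  # ['raw', 'normalized', 'cleansed', 'use_case']
print(layer_flow(skip_optional=True))   # ['raw', 'cleansed', 'use_case']
```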

2.2 Domain-Driven Architecture

Data in the architecture is organized around the business domains listed in 2.2.1, supplemented by the additional domains in 2.2.2.

2.2.1 Business Domains

HR (Human Resources): Focuses on employee data, covering aspects like recruitment, payroll, and performance metrics.
Procurement: Centralizes data related to vendor management, contracts, and procurement cycles.
Finance: Manages financial records, including budgets, income, expenses, and other fiscal reports.
Marketing & Sales: Addresses customer interaction data, sales metrics, and market analysis.
Supply Chain: Deals with logistical data concerning supply chain management, inventories, and distribution.
Structure & Shared Domain: Contains data shared across various business units and aspects related to the organizational structure.
Industrial: Houses data related to manufacturing, equipment health, and quality control.
R&I (Research & Innovation): Maintains data on R&D projects, patents, and scientific research.

2.2.2 Additional Domains

Technical Domain: This is where system metadata, context, and technical details are stored.
Common Domain: For data that is shared across all business units, such as common referential information.
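One way to make the domain list usable in code, for example to tag datasets or build folder names, is an enumeration. The sketch below mirrors the domains exactly as listed in 2.2.1 and 2.2.2; the identifier values (such as `marketing_sales`) and the helper `business_domains` are assumptions, not an established naming standard.

```python
from enum import Enum


class Domain(Enum):
    # Business domains (section 2.2.1)
    HR = "hr"
    PROCUREMENT = "procurement"
    FINANCE = "finance"
    MARKETING_SALES = "marketing_sales"
    SUPPLY_CHAIN = "supply_chain"
    STRUCTURE_SHARED = "structure_shared"
    INDUSTRIAL = "industrial"
    RI = "ri"
    # Additional domains (section 2.2.2)
    TECHNICAL = "technical"
    COMMON = "common"


ADDITIONAL_DOMAINS = {Domain.TECHNICAL, Domain.COMMON}


def business_domains() -> list[Domain]:
    """Return the domains from 2.2.1, in declaration order."""
    return [d for d in Domain if d not in ADDITIONAL_DOMAINS]


print([d.value for d in business_domains()])
```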

2.2.3 Domain Responsibilities

Domains are responsible for creating and maintaining quality datasets.
Each domain must ensure that its data meets specified standards, such as being discoverable, addressable, and trustworthy.
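A minimal sketch of how such standards could be checked per dataset is shown below. The mapping is an assumption made for illustration: "discoverable" is read as having a description, "addressable" as having a stable location, and "trustworthy" as having a named owner and a passed quality check. The class `DatasetDescriptor`, its fields, and `unmet_standards` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DatasetDescriptor:
    name: str
    domain: str
    description: str = ""          # assumed proxy for "discoverable"
    address: str = ""              # assumed proxy for "addressable"
    owner: str = ""                # assumed proxy for "trustworthy" (accountability)
    quality_checked: bool = False  # assumed proxy for "trustworthy" (quality)


def unmet_standards(ds: DatasetDescriptor) -> list[str]:
    """Return the names of the standards this dataset does not yet meet."""
    unmet = []
    if not ds.description.strip():
        unmet.append("discoverable")
    if not ds.address.strip():
        unmet.append("addressable")
    if not (ds.owner.strip() and ds.quality_checked):
        unmet.append("trustworthy")
    return unmet


vendors = DatasetDescriptor(
    name="vendors",
    domain="procurement",
    description="Active vendor master data",
    address="cleansed/procurement/vendors",
)
print(unmet_standards(vendors))  # ['trustworthy']
```

A check of this kind could run before a dataset is published, so that gaps are reported to the Data Product Owner instead of reaching consumers.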

2.2.4 Roles Within Domains

Data Product Owner: Responsible for consumer satisfaction, quality of the domain datasets, and overall data lifecycle management.
Data Team: Focused on platform enhancements, monitoring, automation, and alerting.

2.2.5 Value and Benefits

Centers data acquisition, processing, and serving on domain experts
Decreases common data pain points like data cleansing and orientation
Supports the emergence of a data product focus

2.2.6 Capabilities and Infrastructure

The architecture is equipped with:

Scalable, secure, and governed storage
Encryption standards
Metadata management
Data pipeline orchestration
Unified Access Control
Monitoring, alerting, and logging
Self-service capabilities

2.2.7 Governance and Team Structure

Aims to reduce duplication of effort across domains
Provides essential shared services and tools
Focuses on delivering value while adhering to security and governance protocols

By understanding the layers and the domain-driven approach, you can appreciate how the Data Ocean Architecture enables data to be effectively managed, secured, and leveraged for organizational success.