1: Introduction
Purpose of the Document
This wiki page is a guide to the components and functions of the Data Ocean Architecture.
Following the structure proposed in the "Reference Architecture," it describes how the architecture supports a scalable, organized, and secure system for handling the organization's varied data needs.
Target Audience
This document is intended for multiple audiences within the organization, including but not limited to:
- Data Engineers: To understand the workflow and where they contribute to the architecture.
- Data Scientists: To understand how to access and work with the data for analytical purposes.
- Business Intelligence Analysts: To understand how the data flows and where to extract the information they need for reports and dashboards.
- Data Architects: To maintain the overall structure and integrity of the data environment, for which they are responsible.
- Technical Business Users: To learn what data is available, how to access it, and under what conditions.
- Data Governance Teams: To ensure that the organization of data aligns with company policies and standards.
2: Data Organization
Data Organization is a core element of the Reference Architecture, designed to align data processes across business units.
The architecture follows a Domain-Driven approach that aligns with Solvay Atom's transformation goals and fosters a data culture and accountability.
This chapter outlines the core components, the layered architecture, and the business domains.
2.1 Architectural Layers
The Data Ocean architecture includes several layers to manage the complexity and demands of a data-driven organization:
2.1.1 Raw Data Layer (Ingestion Layer)
This is the foundational layer where data in its native format is ingested into the architecture. No data transformations occur at this stage, which ensures that the data can serve as a point-in-time archive. The layer is hierarchically organized based on subject areas, data sources, and time of ingestion. Access to this layer is restricted to prevent unauthorized or incorrect usage.
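The hierarchy described above (subject area, then data source, then time of ingestion) can be illustrated with a small path builder. This is a minimal sketch only: the `raw` root folder, the year/month/day split, and the function name `raw_path` are assumptions made for illustration, not conventions defined by the Reference Architecture.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def raw_path(subject_area: str, source: str, ingested_at: datetime, filename: str) -> PurePosixPath:
    """Build a hypothetical Raw-layer path: subject area / source / ingestion date / file."""
    # Normalize to UTC so the folder reflects a single, unambiguous ingestion time.
    ts = ingested_at.astimezone(timezone.utc)
    return PurePosixPath(
        "raw",          # assumed root folder for the Raw layer
        subject_area,
        source,
        f"{ts:%Y}",
        f"{ts:%m}",
        f"{ts:%d}",
        filename,       # file is stored as received, with no transformation
    )


example = raw_path(
    "finance", "erp", datetime(2024, 3, 5, 14, 30, tzinfo=timezone.utc), "invoices.csv"
)
print(example)  # raw/finance/erp/2024/03/05/invoices.csv
```

Because the ingestion date is part of the path, each load lands in its own folder, which is what lets the layer act as a point-in-time archive.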
2.1.2 Normalized Data Layer (Staging)
This optional layer is an intermediary that improves performance when transferring data from the Raw layer to the Curated layer. It stores data in a format optimized for cleansing and may partition it at a finer granularity.
2.1.3 Cleansed Data Layer (ODS / Curated)
Here, data is transformed into consumable datasets, available either in files or tables. Before reaching this layer, the data undergoes a series of cleansing and transformation activities. It's also the most complex part of the architecture, as data is denormalized and different objects may be consolidated here.
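As an illustration of the kind of cleansing that might precede this layer, the sketch below trims string values, drops records without an identifier, and removes duplicates. The rules and the `id` field are assumptions chosen for the example; the actual cleansing and transformation activities are defined per dataset and are not specified on this page.

```python
from typing import Iterable


def cleanse(rows: Iterable[dict]) -> list[dict]:
    """Trim strings, drop rows without an id, and de-duplicate on id (first row wins)."""
    seen = set()
    cleaned = []
    for row in rows:
        # Strip surrounding whitespace from every string value.
        normalized = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        record_id = normalized.get("id")
        # Skip rows with a missing or empty id, and rows whose id was already seen.
        if not record_id or record_id in seen:
            continue
        seen.add(record_id)
        cleaned.append(normalized)
    return cleaned


raw_rows = [
    {"id": " A1 ", "name": "Acme "},
    {"id": "A1", "name": "Acme duplicate"},
    {"id": "", "name": "missing id"},
]
curated = cleanse(raw_rows)
print(curated)  # [{'id': 'A1', 'name': 'Acme'}]
```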
2.1.4 Use-Case Oriented Layers (Domain/Product)
These specialized layers apply additional business logic or machine learning models to the data. They source data from the Cleansed layer and apply any required business logic and security measures. Data models that address business analytical requirements are also created here.
2.1.5 Sandbox
This optional layer is for advanced analysts and data scientists to conduct experiments. Here, they can run tests to find patterns and correlations or to validate machine learning models.
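The flow between the layers described in this section can be summarized in a few lines of code. This is a sketch under stated assumptions: the `Layer` enum and the `pipeline` function are hypothetical names, and the Sandbox is left out because this page does not state which layer it reads from.

```python
from enum import Enum


class Layer(Enum):
    RAW = "raw"                # 2.1.1 Ingestion
    NORMALIZED = "normalized"  # 2.1.2 Staging (optional)
    CLEANSED = "cleansed"      # 2.1.3 ODS / Curated
    USE_CASE = "use_case"      # 2.1.4 Domain / Product


def pipeline(use_staging: bool) -> list[Layer]:
    """Return the ordered layers a dataset passes through; staging is optional."""
    path = [Layer.RAW]
    if use_staging:
        path.append(Layer.NORMALIZED)
    path += [Layer.CLEANSED, Layer.USE_CASE]
    return path


print([layer.value for layer in pipeline(use_staging=False)])
# ['raw', 'cleansed', 'use_case']
```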
2.2 Domain-Driven Architecture
The data organization within the architecture is designed around the business domains listed below, complemented by two additional domains:
2.2.1 Business Domains
- HR (Human Resources): Focuses on employee data, covering aspects like recruitment, payroll, and performance metrics.
- Procurement: Centralizes data related to vendor management, contracts, and procurement cycles.
- Finance: Manages financial records, including budgets, income, expenses, and other fiscal reports.
- Marketing & Sales: Addresses customer interaction data, sales metrics, and market analysis.
- Supply Chain: Deals with logistical data concerning supply chain management, inventories, and distribution.
- Structure & Shared Domain: Contains data shared across various business units and aspects related to the organizational structure.
- Industrial: Houses data related to manufacturing, equipment health, and quality control.
- R&I (Research & Innovation): Maintains data on R&D projects, patents, and scientific research.
2.2.2 Additional Domains
- Technical Domain: This is where system metadata, context, and technical details are stored.
- Common Domain: For data that is shared across all business units, such as common referential information.
2.2.3 Domain Responsibilities
- Domains are responsible for creating and maintaining quality datasets.
- Each domain must ensure that its data meets the specified standards, such as being discoverable, addressable, and trustworthy.
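One way to make these standards checkable is to attach a small descriptor to each dataset and test it before publication. The sketch below is illustrative only: the `DatasetDescriptor` fields and the mapping from each standard to a concrete check are assumptions, not definitions taken from the Reference Architecture.

```python
from dataclasses import dataclass


@dataclass
class DatasetDescriptor:
    """Hypothetical metadata a domain could publish alongside a dataset."""
    name: str
    domain: str
    description: str = ""                # assumed to drive discoverability in a catalog
    address: str = ""                    # assumed unique location of the dataset
    owner: str = ""                      # assumed accountable Data Product Owner
    quality_checks_passed: bool = False  # assumed outcome of the domain's quality checks


def unmet_standards(ds: DatasetDescriptor) -> list[str]:
    """Return the standards the dataset does not yet meet (empty list means all are met)."""
    issues = []
    if not ds.description:
        issues.append("discoverable")
    if not ds.address:
        issues.append("addressable")
    if not (ds.owner and ds.quality_checks_passed):
        issues.append("trustworthy")
    return issues


draft = DatasetDescriptor(name="vendor_contracts", domain="Procurement", address="cleansed/procurement/vendor_contracts")
print(unmet_standards(draft))  # ['discoverable', 'trustworthy']
```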
2.2.4 Roles Within Domains
- Data Product Owner: Responsible for consumer satisfaction, quality of the domain datasets, and overall data lifecycle management.
- Data Team: Focused on platform enhancements, monitoring, automation, and alerting.
2.2.5 Value and Benefits
- Places data acquisition, processing, and serving with domain experts
- Reduces common data pain points such as data cleansing and orientation
- Supports the emergence of a data product focus
2.2.6 Capabilities and Infrastructure
The architecture is equipped with:
- Scalable, secure, and governed storage
- Encryption standards
- Metadata management
- Data pipeline orchestration
- Unified Access Control
- Monitoring, alerting, and logging
- Self-service capabilities
2.2.7 Governance and Team Structure
- Aims to reduce duplication of effort across domains
- Provides essential shared services and tools
- Focuses on delivering value while adhering to security and governance protocols
By understanding the layers and the domain-driven approach, you can appreciate how the Data Ocean Architecture enables data to be effectively managed, secured, and leveraged for organizational success.
