...

This chapter describes the data flow within the Data Ocean Architecture, based on the Reference Architecture.

2.1. Overview

The architecture is composed of multiple interconnected blocks, each serving a distinct purpose in the data processing pipeline.

...

The data flow within the Reference Architecture involves several key components:

  • Data Sources: Both structured and unstructured data from various domains are extracted for processing.
  • Data Capturing: Data is captured through batch and streaming processes.
  • Lake House Architecture: Data undergoes storage, curation, and provisioning.
  • Data Science and Machine Learning: Analytical processes are conducted on the data.
    • Self-service ETL: The architecture accommodates these capabilities; however, it does not explicitly endorse them due to their unique characteristics, which are closely intertwined with established best practices in Software Development and Architecture.
  • Data Management: Data is cataloged and validated in an orchestrated way.
  • Operations: Data security, workload management, environment management, and monitoring are applied.
  • Data Consumers: Data is accessed by BI tools and portals.

...

The data extraction process retrieves data from the various sources outlined in the Reference Architecture. Extraction jobs must respect the agreed time windows for accessing internal business applications, so that their impact on business systems stays minimal. These jobs should prioritize simplicity and speed, operating without dependencies beyond the agreed-upon timing and source system limitations.
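As a rough illustration of the timing constraint, the sketch below gates an extraction job on an agreed time window. The `ExtractionWindow` class, the `should_run_extraction` helper, and the 22:00 to 04:00 window are hypothetical names and values chosen for this example; the Reference Architecture does not prescribe them.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class ExtractionWindow:
    """Agreed period during which a source system may be queried (hypothetical)."""
    start: time
    end: time

    def allows(self, now: time) -> bool:
        # A window such as 22:00-04:00 wraps past midnight, so it needs
        # a different comparison than a same-day window.
        if self.start <= self.end:
            return self.start <= now < self.end
        return now >= self.start or now < self.end


def should_run_extraction(window: ExtractionWindow, now: time) -> bool:
    """Return True only when the job would run inside the agreed window."""
    return window.allows(now)


# Assumed nightly window for an internal business application.
nightly = ExtractionWindow(start=time(22, 0), end=time(4, 0))
print(should_run_extraction(nightly, time(23, 30)))  # True
print(should_run_extraction(nightly, time(12, 0)))   # False
```

Keeping the check to a single comparison on the clock time reflects the goal that extraction jobs stay simple and depend on nothing beyond the agreed timing.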

...

For more in-depth information, please refer to the section on Data Curation.

Simple file formats such as CSV maintain a direct one-to-one correspondence with their source data. With more complex file formats, however, a one-to-many relationship can emerge between a source and its table formats.
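The difference between the two cases can be pictured with a small lookup. This is only a sketch: the file names, the table names, and the choice of a nested JSON file as the complex format are assumptions made for this example and do not come from the Reference Architecture.

```python
from collections import defaultdict

# Hypothetical catalog entries: (source file, resulting table).
SOURCE_TO_TABLE = [
    ("customers.csv", "customers"),    # simple format: one file, one table
    ("orders.json", "orders"),         # nested format: one file, several tables
    ("orders.json", "order_lines"),
    ("orders.json", "order_payments"),
]


def tables_by_source(pairs):
    """Group the table names under the source file they were derived from."""
    grouped = defaultdict(list)
    for source, table in pairs:
        grouped[source].append(table)
    return dict(grouped)


def cardinality(tables):
    """Label the relationship between one source and its tables."""
    return "one-to-one" if len(tables) == 1 else "one-to-many"


for source, tables in tables_by_source(SOURCE_TO_TABLE).items():
    print(source, cardinality(tables), tables)
```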

...

  1. Data Protection: Sensitive data is safeguarded from unauthorized access, reducing the risk of data breaches and leaks.

  2. Compliance: Data access controls align with industry regulations and compliance standards, ensuring that data handling practices meet legal requirements.

  3. Data Integrity: Security measures prevent unauthorized data modification, preserving data accuracy and integrity.

  4. Efficient Collaboration: Secure access controls enable collaboration between business and technical teams without compromising data security.

  5. Trust: Establishing robust security measures builds trust with stakeholders and customers, demonstrating a commitment to data privacy and confidentiality.
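The access controls behind these benefits can be pictured as a deny-by-default lookup from roles to the data they may read. The sketch below is a minimal illustration only: the role names, the `ROLE_GRANTS` table, and the `can_read` helper are hypothetical, and the actual mechanism depends on the platform in use.

```python
# Hypothetical role definitions: role -> set of (domain, layer) pairs it may read.
ROLE_GRANTS = {
    "hr_analyst": {("HR", "curated")},
    "finance_engineer": {("Finance", "raw"), ("Finance", "curated")},
}


def can_read(role: str, domain: str, layer: str) -> bool:
    """Deny by default: unknown roles and ungranted pairs get no access."""
    return (domain, layer) in ROLE_GRANTS.get(role, set())


print(can_read("hr_analyst", "HR", "curated"))       # True
print(can_read("hr_analyst", "Finance", "curated"))  # False
print(can_read("guest", "HR", "curated"))            # False
```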

7. Conclusion

The Data Ocean Reference Architecture lays the foundation for a robust and comprehensive data management framework that aligns seamlessly with Solvay Atom's transformation goals. Through its layered approach and Domain-Driven Design philosophy, this architecture addresses the complex challenges of data organization, processing, and utilization. By defining distinct stages for data flow – from Raw Data to Curated Data to Use-Case Oriented Layers – the architecture ensures that data is ingested, processed, and transformed in a controlled and structured manner.

The architecture's emphasis on business Domains fosters a culture of data ownership, accountability, and collaboration across different functional areas. The nine identified business Domains, ranging from HR to Finance to Marketing, reflect the diverse nature of Solvay Atom's operations and provide a tailored approach to data management for each domain. The Domain-Driven Architecture promotes scalability, maintainability, and governed data sets, resulting in higher data quality, consistency, and reliability.

By incorporating security by design and other security measures throughout the architecture, data integrity and confidentiality are upheld. Users are granted specific access rights based on their roles, ensuring that sensitive information remains protected and only accessible to authorized personnel. This comprehensive security approach mitigates risks, ensures compliance with regulatory standards, and builds trust among stakeholders.

In conclusion, the Data Ocean Reference Architecture establishes a solid framework that enables Solvay Atom to harness the full potential of its data assets. By aligning data processes, emphasizing Domain ownership, and implementing stringent security measures, this architecture provides a strategic advantage in leveraging data for informed decision-making, innovation, and sustainable growth.