...

This chapter describes the data flow within the Data Ocean Architecture, based on the Reference Architecture.

2.1. Overview

The architecture is composed of multiple interconnected blocks, each serving a distinct purpose in the data processing pipeline.

...

The data flow involves several key components, as described in the Reference Architecture, including:

  • Data Sources: Both structured and unstructured data from various domains are extracted for processing.
  • Data Capturing: Data is captured through batch and streaming processes.
  • Lake House Architecture: Data undergoes storage, curation, and provisioning.
  • Data Science and Machine Learning: Analytical processes are conducted on the data.
    • Self-service ETL: The architecture accommodates these capabilities but does not explicitly endorse them, because their unique characteristics are closely intertwined with established best practices in software development and architecture.
  • Data Management: Data undergoes cataloging and validation in an orchestrated way.
  • Operations: Data security, workload management, environment management, and monitoring are applied.
  • Data Consumers: Data is accessed by BI tools and portals.

...

The data extraction process retrieves data from the various sources outlined in the Reference Architecture. Access to internal business applications must stay within the agreed time constraints so that the impact on business systems is minimal. Extraction jobs should prioritize simplicity and speed, and should have no dependencies beyond the agreed-upon timing and source system limitations.
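
As a rough illustration of the timing constraint, the sketch below gates an extraction job on an agreed access window. It is a minimal example under stated assumptions, not part of the Reference Architecture: the names `ExtractionWindow` and `run_extraction`, the half-open window semantics, and the choice to skip rather than queue a job outside the window are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ExtractionWindow:
    """Agreed period in which a source system may be read (hypothetical type)."""

    start: time
    end: time

    def contains(self, moment: time) -> bool:
        # The window is half-open: start is included, end is excluded.
        if self.start <= self.end:
            return self.start <= moment < self.end
        # A window such as 22:00-04:00 wraps past midnight.
        return moment >= self.start or moment < self.end


def run_extraction(
    window: ExtractionWindow,
    extract: Callable[[], T],
    now: Optional[datetime] = None,
) -> Optional[T]:
    """Call `extract` only inside the agreed window; otherwise return None.

    The job depends on nothing except the window and the source itself,
    which keeps it simple and quick to run.
    """
    current = now if now is not None else datetime.now()
    if not window.contains(current.time()):
        return None  # Outside the agreed window: leave the source untouched.
    return extract()
```

A scheduler could call `run_extraction(ExtractionWindow(time(22, 0), time(4, 0)), job)` on each trigger, where `job` is whatever callable reads from the source. Whether a skipped run should be retried, logged, or escalated is a design decision this sketch leaves open.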

...