Architecture

Below is the high-level architecture for the Data Quality KPIs monitoring tool.


A master Talend job orchestrates the entire data processing pipeline:

1. Data Ingestion

The process begins with the ingestion of data for each domain from various SAP source systems:

Each table from each domain has its own dedicated Talend job responsible for ingesting and loading the data into the GCP BigQuery Data Ocean, specifically the following datasets:

2. Data Processing and Transformation

After ingestion, two routines are executed to populate the Data Model (DM) Dimension tables:

Also, views including only the necessary data are created in the following datasets:

These views are the sole source for the quality checks performed by Dataplex.

3. Data Quality Execution in Dataplex

Once the views are created, the data quality rules are executed using the GCP Dataplex service, and the validation results are stored in the following BigQuery table:

4. Data Model Population

A routine is executed to populate the DM Fact tables:

5. Failed Records Handling & Export

A final Talend job - PL_DQ_BQ_to_Gshet_Selfservice - handles failed records:

[env] is one of the following: dev, test, ppd, prod
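The [env] substitution can be illustrated with a minimal sketch. It assumes the placeholder appears literally as `[env]` inside a resource name. The function name `resolve_env` and the sample template are hypothetical and are not taken from the Talend job itself.

```python
# The four environment names listed above.
VALID_ENVS = ("dev", "test", "ppd", "prod")


def resolve_env(template: str, env: str) -> str:
    """Replace the [env] placeholder in `template` with a validated environment name."""
    if env not in VALID_ENVS:
        raise ValueError(f"unknown environment {env!r}; expected one of {VALID_ENVS}")
    return template.replace("[env]", env)


# Hypothetical resource name, used only to illustrate the substitution.
print(resolve_env("dq_failed_records_[env]", "ppd"))
```

Rejecting unknown values early keeps a typo in the environment name from silently pointing the job at a resource that does not exist.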

6. Visualization in Qlik Sense

The processed and validated data is available for visualization and analysis in Qlik Sense. 
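The six steps above run in a fixed order. The following is a rough sketch of that sequencing, not the Talend implementation: the stage names are copied from the step headings, while `run_pipeline`, the placeholder actions, and the stop-on-first-error behaviour are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# Stage names follow the six step headings above.
STAGES = [
    "Data Ingestion",
    "Data Processing and Transformation",
    "Data Quality Execution in Dataplex",
    "Data Model Population",
    "Failed Records Handling & Export",
    "Visualization in Qlik Sense",
]


def run_pipeline(stages: List[Tuple[str, Callable[[], None]]]) -> List[str]:
    """Run each stage in order and return the names of the stages that completed."""
    completed = []
    for name, action in stages:
        action()  # assumed behaviour: an exception here stops the whole run
        completed.append(name)
    return completed


# Placeholder actions that do nothing; a real stage would trigger a job or query.
print(run_pipeline([(name, lambda: None) for name in STAGES]))
```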

Data Model

Scheduling

Below are the scheduled processes.  

1. Talend Ingestion Jobs


2. Data Quality Scans

The scans are initially run "On Demand" for testing purposes and are then "Scheduled" to run every week within Dataplex.

3. Routines Execution

The routines are triggered using scheduled queries on a weekly basis within BigQuery.

4. Talend Report Job


5. QlikSense Refresh

The QlikSense refresh schedule is set by the Visualization Engineer within QlikSense.


Time of Runs and Duration Window


Process                               Duration Window              Average Duration Period
Talend Source Data Ingestion          Sunday > 21:00 CET
Dataplex Business Rules               Monday > 4:00 - 5:00 CET     1 min
BigQuery Routine DIM_Domain           Monday > 5:00 - 5:05 CET     1 min
BigQuery Routine DIM_KPI_Dimension    Monday > 5:05 - 5:10 CET     1 min
BigQuery Routine Data Model           Monday > 5:10 - 5:15 CET     1 min
Talend Failed Record process          Monday > 6:00 - 7:00 CET     5 min
QlikSense                             Monday > 8:00 CET            1 min
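The run windows above can be checked for overlaps with a small script. This is a sketch that assumes the processes are meant to run strictly one after another in the order listed. The helper names `at` and `overlaps` are hypothetical, and times are expressed as minutes after Sunday 00:00 CET.

```python
DAY = 24 * 60  # minutes per day


def at(day: int, hour: int, minute: int = 0) -> int:
    """Minutes after Sunday 00:00 CET (day 0 = Sunday, day 1 = Monday)."""
    return day * DAY + hour * 60 + minute


# (process, window start, window end); end is None where only a start time is given.
SCHEDULE = [
    ("Talend Source Data Ingestion", at(0, 21), None),
    ("Dataplex Business Rules", at(1, 4), at(1, 5)),
    ("BigQuery Routine DIM_Domain", at(1, 5), at(1, 5, 5)),
    ("BigQuery Routine DIM_KPI_Dimension", at(1, 5, 5), at(1, 5, 10)),
    ("BigQuery Routine Data Model", at(1, 5, 10), at(1, 5, 15)),
    ("Talend Failed Record process", at(1, 6), at(1, 7)),
    ("QlikSense", at(1, 8), None),
]


def overlaps(schedule):
    """Return consecutive pairs where the later process starts before the earlier window ends."""
    problems = []
    for (prev_name, prev_start, prev_end), (name, start, _) in zip(schedule, schedule[1:]):
        boundary = prev_end if prev_end is not None else prev_start
        if start < boundary:
            problems.append((prev_name, name))
    return problems


print(overlaps(SCHEDULE))  # prints [] because no window starts before the previous one ends
```

Rerunning the check after any change to a window gives a quick signal that a downstream step would start before its input is ready.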


Monitoring

GCP Monitoring tools:

Error Handling

Known Bugs

No identified bugs.

Roadmap

FSD

TSD