Below is the high level architecture for the Data Quality KPIs monitoring tool.

A Master Talend job orchestrates the entire data processing pipeline. The process begins with the ingestion of data for each domain from various SAP source systems into Google Cloud Platform (GCP).
HR: Data is sourced from SAP SuccessFactors.
SSR: Data is sourced from SAP WP1 and SAP PF1.
FIN: Data is sourced from SAP BW, WP1, and PF1.
MRK: Data is sourced from SAP BW, WP1, and PF1.
Each table from each domain has its own dedicated Talend job responsible for ingesting and loading the data into the GCP BigQuery Service - ODS Dataset.
After ingestion, several routines are executed to populate the Data Model (DM) tables:
Routine: DM.insert_DIM_Domain populates DM.DIM_domain table.
Routine: DM.insert_DIM_kpi_dimension populates DM.DIM_kpi_dimension table.
The data quality rules are executed using GCP Dataplex Service.
A routine is executed to populate the Data Model (DM) tables:
Routine: DM.RT_DPtoDMmapping_Datespecific populates the following tables:
DM.DIM_DATE
DM.DIM_quality_rule
DM.FACT_data_quality
DM.FACT_failed_records
A final Talend job - PL_DQ_BQ_to_Gshet_Selfservice - handles failed records:
Generates a CSV file with failed records.
Uploads the CSV file to a Google Drive folder.
Updates DM.FACT_failed_records with the URL to the CSV file.
The processed and validated data is available for visualization and analysis in Qlik Sense.
| Sources Definition | Identify different data sources and how to ingest either raw data, or dq metrics depending on the source |
| Identified Sources | SAP BW, SuccessFactors, Big Query (Data Ocean) |
| ETL and Preparation | Talend used to load and ingest tables in BigQuery, Leveraging automation and orchestration capabilities to automate the recurring jobs. |
| Source For DQ KPIs Monitoring Project | Big Query Data Ocean is the sole source for Quality checks after tables are loaded from their original sources. |
| Data Residing in Big Query | Data residing on Big Query brought in prj-data-dq-selfservice-dev project using the necessary views to be Scanned and checked by DataPlex, the generated metrics naturally reside in Big Query (Dataplex_quality) |
| Data Quality Metrics Ingestion | Populating the BigQuery fact table with DQ metrics obtained from the previously mentioned sources By Mapping the resulting table to the developed DM (Using a stored procedure) |
| Big Query Views | Creating the views necessary to answer Business requirements, providing them with degree of flexibility in a way that they can have more control over the data the need to query according to business changing requirements, to approach selfservice. |
| Visualization | Uses QlikSense that connects to bigQuery views to ingest data, Use Qlik Sense to Visualize, present, add alerting capabilities for different business domains Different KPIs and Different Business Rules. and Failed records |
| Google Drive | Used to store failed records Data, then URL for the sheet is inserted in FACT_failed_records associated with quality_rule_key. |

The out put of Dataplex quality checks is Dataplex_quality table, It is mapped to the Data Model "in the previous section", on Big query to later be used as the sole data source for QlikSense visualization.
Data Mapping is detailed in this document.
Dataplex to Data Model Mapping Stored Procedure:
Procedure Name:
Time of Runs and Duration Window:
| Dataplex | 4:00 - 5:00 CET |
| BigQuery Routine | 5:00 - 6:00 CET |
| Talend | 6:00 - 9:00 CET |
| QlikSense | 9:00 CET |
GCP Monitoring tools:
No Identified Bugs.