Architecture
Below is the high-level architecture for the Data Quality KPIs monitoring tool.
A Master Talend job orchestrates the entire data processing pipeline:
1. Data Ingestion
The process begins with the ingestion of data for each domain from various SAP source systems:
- HR data is sourced from SAP SuccessFactors
- SSR data is sourced from SAP WP1 and SAP PF1
- FIN data is sourced from SAP BW, SAP WP1, and SAP PF1
- MRK data is sourced from SAP BW, SAP WP1, and SAP PF1
Each table from each domain has its own dedicated Talend job responsible for ingesting and loading the data into the GCP BigQuery Data Ocean, specifically the following datasets:
- prj-data-dm-hr-[env].ODS
- prj-data-dm-structure-[env].ODS
- prj-data-dm-finance-[env].ODS
- prj-data-dm-marketing-[env].ODS
- prj-data-dq-selfservice-[env].ODS
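The domain-to-dataset mapping above can be sketched with a small helper. The project prefixes and the `ODS` dataset name are taken from the lists above; the helper function itself is purely illustrative and not part of the pipeline:

```python
# Illustrative helper mapping each domain to its BigQuery ODS dataset.
# Project prefixes are taken from the list above; [env] is one of
# dev, test, ppd, prod (see the note at the end of this section).
DOMAIN_PROJECTS = {
    "HR": "prj-data-dm-hr",
    "SSR": "prj-data-dm-structure",
    "FIN": "prj-data-dm-finance",
    "MRK": "prj-data-dm-marketing",
    "DQ": "prj-data-dq-selfservice",
}

VALID_ENVS = ("dev", "test", "ppd", "prod")

def ods_dataset(domain: str, env: str) -> str:
    """Return the fully qualified ODS dataset ID for a domain and environment."""
    if env not in VALID_ENVS:
        raise ValueError(f"unknown environment: {env!r}")
    return f"{DOMAIN_PROJECTS[domain]}-{env}.ODS"

print(ods_dataset("HR", "dev"))    # prj-data-dm-hr-dev.ODS
print(ods_dataset("FIN", "prod"))  # prj-data-dm-finance-prod.ODS
```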
2. Data Processing and Transformation
After ingestion, two routines are executed to populate the Data Model (DM) Dimension tables:
- Routine prj-data-dq-selfservice-[env].DM.insert_DIM_Domain populates the prj-data-dq-selfservice-[env].DM.DIM_domain table
- Routine prj-data-dq-selfservice-[env].DM.insert_DIM_kpi_dimension populates the prj-data-dq-selfservice-[env].DM.DIM_kpi_dimension table
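In BigQuery, stored-procedure routines like these are invoked with CALL statements. A minimal sketch of generating those statements, assuming the routines take no arguments (their actual signatures are not documented here):

```python
# Build the CALL statements for the two dimension-population routines.
# Assumption: the routines take no arguments.
DIM_ROUTINES = ("insert_DIM_Domain", "insert_DIM_kpi_dimension")

def call_statement(routine: str, env: str) -> str:
    """Return a BigQuery CALL statement for a routine in the DM dataset."""
    return f"CALL `prj-data-dq-selfservice-{env}.DM.{routine}`();"

for routine in DIM_ROUTINES:
    print(call_statement(routine, "dev"))
```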
In addition, views that include only the necessary data are created in the following datasets:
- prj-data-dm-hr-[env].DS_prj_dqkpi
- prj-data-dm-structure-[env].DS_prj_sls_dataquality_kpi
- prj-data-dm-finance-[env].DS_prj_sls_dataquality_kpi
- prj-data-dm-marketing-[env].DS_prj_sls_dataquality_kpi
- prj-data-dq-selfservice-[env].DS_prj_sls_dataquality_kpi
These views are the sole source for the quality checks performed by Dataplex.
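A hedged sketch of the kind of view DDL this step produces; only the project and dataset naming follows the lists above, while the source table, view name, and column names below are hypothetical placeholders:

```python
def view_ddl(project: str, env: str, view_dataset: str, view_name: str,
             source_table: str, columns: list[str]) -> str:
    """Build a CREATE OR REPLACE VIEW statement exposing only selected columns."""
    select_list = ",\n  ".join(columns)
    return (
        f"CREATE OR REPLACE VIEW `{project}-{env}.{view_dataset}.{view_name}` AS\n"
        f"SELECT\n  {select_list}\n"
        f"FROM `{project}-{env}.ODS.{source_table}`;"
    )

# Hypothetical example for the HR domain (table and columns are placeholders):
ddl = view_ddl("prj-data-dm-hr", "dev", "DS_prj_dqkpi",
               "v_employee_dq", "employee", ["employee_id", "hire_date"])
print(ddl)
```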
3. Data Quality Execution in Dataplex
Once the views are created, the data quality rules are executed using GCP Dataplex Service and the validation results are stored in the following BigQuery table:
- prj-data-dq-selfservice-[env].DM.Dataplex_quality
4. Data Model Population
A routine is then executed to populate the DM Fact tables, together with the date and quality-rule Dimension tables:
Routine prj-data-dq-selfservice-[env].DM.RT_DPtoDMmapping_Datespecific populates the following tables:
- prj-data-dq-selfservice-[env].DM.DIM_DATE
- prj-data-dq-selfservice-[env].DM.DIM_quality_rule
- prj-data-dq-selfservice-[env].DM.FACT_data_quality
- prj-data-dq-selfservice-[env].DM.FACT_failed_records
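This routine is triggered weekly via a BigQuery scheduled query (see the Scheduling section below), whose body is essentially a CALL statement. A minimal sketch, again assuming a no-argument signature for the routine (an assumption, not documented here):

```python
# Sketch of the scheduled-query body that triggers the fact-population routine.
# Assumption: RT_DPtoDMmapping_Datespecific takes no arguments.
def scheduled_query_body(env: str) -> str:
    return f"CALL `prj-data-dq-selfservice-{env}.DM.RT_DPtoDMmapping_Datespecific`();"

print(scheduled_query_body("prod"))
```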
5. Failed Records Handling & Export
A final Talend job - PL_DQ_BQ_to_Gshet_Selfservice - handles failed records:
- Generates a CSV file with failed records.
- Uploads the CSV file to a Google Drive folder.
- Updates prj-data-dq-selfservice-[env].DM.FACT_failed_records with the URL to the CSV file, associated with the quality_rule_key.
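A minimal sketch of the CSV-generation step of this job; the field names used here (quality_rule_key, record_id, failure_reason) are hypothetical placeholders, and the Google Drive upload and FACT_failed_records update steps are omitted:

```python
import csv
import io

def failed_records_csv(rows: list[dict]) -> str:
    """Serialize failed records to CSV text (field names are placeholders)."""
    fieldnames = ["quality_rule_key", "record_id", "failure_reason"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

sample = [{"quality_rule_key": "R001", "record_id": "42",
           "failure_reason": "null value in mandatory field"}]
print(failed_records_csv(sample))
```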
[env] is one of the following: dev, test, ppd, prod
6. Visualization in Qlik Sense
The processed and validated data is available for visualization and analysis in Qlik Sense.
Data Model
Data Mapping:
The output of the Dataplex quality checks is the Dataplex_quality table. It is mapped to the Data Model described in the previous section in BigQuery, which later serves as the sole data source for the Qlik Sense visualization.
Data Mapping is detailed in this document.
Procedures
Dataplex to Data Model Mapping Stored Procedure:
Procedure Name:
Scheduling:
- Scheduled Profile Scans: run on Dataplex; the schedule is defined when the scan is created.
- Scheduled Data Quality Rules Scan: runs on Dataplex; the schedule is defined when the scan is created.
- Scheduled Stored Procedure Run: the routine is triggered by a scheduled query on a weekly basis.
- Scheduled Qlik Sense Refresh: set by the visualization engineer in Qlik Sense.
Time of Runs and Duration Window:

| Stage | Window |
| --- | --- |
| Dataplex | 4:00 - 5:00 CET |
| BigQuery Routine | 5:00 - 6:00 CET |
| Talend | 6:00 - 9:00 CET |
| Qlik Sense | 9:00 CET |
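The windows above are intentionally back-to-back. A small sanity-check sketch over the table's values (times taken directly from the table; Qlik Sense has only a start time):

```python
# Run windows from the table above (CET).
WINDOWS = [
    ("Dataplex", "4:00", "5:00"),
    ("BigQuery Routine", "5:00", "6:00"),
    ("Talend", "6:00", "9:00"),
]

def minutes(t: str) -> int:
    """Convert an H:MM time string to minutes since midnight."""
    h, m = t.split(":")
    return int(h) * 60 + int(m)

# Check each stage ends exactly when the next one starts.
for (_, _, end), (name, start, _) in zip(WINDOWS, WINDOWS[1:]):
    assert minutes(end) == minutes(start), f"gap before {name}"

print("stages are contiguous; Talend window =",
      minutes(WINDOWS[2][2]) - minutes(WINDOWS[2][1]), "minutes")
```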
Monitoring
GCP Monitoring tools:
- Dataplex Logs
- BigQuery Logs
- Cloud Monitoring Dashboard
Error Handling
- Failure alerts are set during rule creation to alert stakeholders/users when a rule fails.
- A stored procedure scheduling failure alert is sent in case the scheduled routine does not run as intended.
Known Bugs
No Identified Bugs.

