Project Hierarchy in GCP (Org/Folder/Project Name):
- prj-data-dq-selfservice-test
- prj-data-dq-selfservice-dev
- prj-data-dq-selfservice-ppd
- prj-data-dq-selfservice-prod
GCP Project Overview:
- Project Name: prj-data-dq-selfservice-dev
- Project ID: prj-data-dq-selfservice-dev
- Project Description:
- Project Owner:
- Project purpose: Provide a Data Quality KPI dashboard, insights, and failed records for different scopes and domains across Solvay (Human Resources, Marketing and Sales, Finance, Structured and Shared, Procurement, Supply Chain).
- Project support teams (services scope):
- Ahmed Elsayed - Data Architect
- Maria Joao Pimenta - Data Engineer
- Ram Atirajyam - Data Engineer
- Ibrahim Mansey - Visualization Engineer
- Mohamed Hazem - Visualization Engineer
- Rawan Shehab - Functional Analyst
GCP Resources/Services – Used in project:
- Dataplex
- BigQuery
- GCP Buckets/storage
- Query Scheduler
Google Groups:
1. Google Groups having access to this project:
- Data Architects group name:
- Data Engineers group name:
- Data Analyst/Business Analyst group name:
- Other groups, if any (e.g. Data Developer group name):
2. Google Groups newly created for this project:
Service Accounts used by this project:
GCP Buckets:
- Bucket name, Folder, Objects – Path etc.
- Retention period
- Access policies
BigQuery:
- Datasets:
- DM
- DPL
- DataOcean_dataquality_kpi
- Dataplex_profiles_scans
- Tables:
- DM:
- DIM_date
- DIM_domain
- DIM_kpi_dimension
- DIM_quality_rule
- Dataplex_quality
- FACT_data_quality
- FACT_failed_records
- DPL: (Views)
- DataOcean_dataquality_kpi: (Views)
- Dataplex_profiles_scans:
- EmpBusiness-scan
- PositionJobInfo
- businessunit
- empJobPositionJoin
- empLocGroup_scan
- empcomp-scan
- empjob_profile
- emploc
- SQL Queries/views – Logic.
- DM: (Tables)
- DPL:
- V_DIM_DATE
- V_DIM_DOMAIN
- V_DIM_KPI_DIMENSION
- V_FACT_QUALITY
- V_RULE_QUALITY
- V_data_quality_metrics_dev
- DataOcean_dataquality_kpi:
- V_EmpJobRelationships
- V_EmpWorkPermit
- V_FOLocation
- V_LocationGroup
- V_PositionJobInfo
- V_User
- V_businessunit
- V_company
- V_costCenter
- V_empJobCC
- V_empLocGroup
- V_emp_compensation_job
- V_position
- V_ActiveEmployeeInActiveLegalEntity
- V_BusinessUnit
- V_CcHrFin
- V_Company
- V_EmpBusiness
- V_EmpCompPay
- V_EmpCompensation
- V_EmpJob
- V_EmpJobCompPay
- Dataplex_profiles_scans: (Partitioned Tables)
- Routines (Stored Procedures):
- DM.RT_DPtoDMmapping_Datespecific: (the main mapping routine; populates the model from the latest weekly runs)
-- Populate DIM_date table
-- Populate DIM_quality_rule table
-- Populate FACT_data_quality table
-- Populate FACT_failed_records table
- RT_DPtoDMmapping_specific: (maps a specific rule when there is an on-demand run)
-- Populate DIM_date table
-- Populate DIM_quality_rule table
-- Populate FACT_data_quality table
-- Populate FACT_failed_records table
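The objects listed above are usually referenced in queries by their fully qualified name. The following is a minimal Python sketch, not the project's own tooling, of how those references could be built. The project ID, the DM dataset, its table names, and the routine name come from this document; the helper names `qualified_name` and `dm_table_refs` are hypothetical. Backtick quoting is used because the project ID contains hyphens.

```python
from typing import List

# Table names copied from the DM dataset listing in this document.
DM_TABLES: List[str] = [
    "DIM_date",
    "DIM_domain",
    "DIM_kpi_dimension",
    "DIM_quality_rule",
    "Dataplex_quality",
    "FACT_data_quality",
    "FACT_failed_records",
]


def qualified_name(project: str, dataset: str, obj: str) -> str:
    """Return a backtick-quoted project.dataset.object reference (hypothetical helper)."""
    return f"`{project}.{dataset}.{obj}`"


def dm_table_refs(project: str) -> List[str]:
    """Return fully qualified references for every DM table in the given project."""
    return [qualified_name(project, "DM", table) for table in DM_TABLES]


if __name__ == "__main__":
    project_id = "prj-data-dq-selfservice-dev"
    for ref in dm_table_refs(project_id):
        print(ref)
    # The weekly mapping routine lives in the DM dataset according to this document.
    print(qualified_name(project_id, "DM", "RT_DPtoDMmapping_Datespecific"))
```

Passing a different project ID (test, ppd, prod) yields the same object names under that environment, assuming the datasets are laid out identically in each project.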
Data flow:
- Data Quality Check: Sources → Talend → Data Ocean → prj-data-dq-selfservice-*** → DataOcean_dataquality_kpi (Data Set) → DataOcean_dataquality_kpi.Views (source views) → Dataplex → Dataplex_quality (Table) → RT_DPtoDMmapping_specific (Stored Procedure) → DIM_date, DIM_domain, DIM_kpi_dimension, DIM_quality_rule, Dataplex_quality, FACT_data_quality, FACT_failed_records (DM tables) → (DPL Views) → QlikSense
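To make the order of the data flow above easier to reason about (for example, when assessing what is affected by a failure at a given stage), the chain can be written down as an ordered list. This is only an illustrative sketch: the stage labels are taken from the flow above, while the list representation and the `downstream_of` helper are assumptions introduced here.

```python
from typing import List

# Stages in the order given by the data flow; labels follow the document.
DATA_FLOW: List[str] = [
    "Sources",
    "Talend",
    "Data Ocean",
    "prj-data-dq-selfservice-***",
    "DataOcean_dataquality_kpi (dataset)",
    "DataOcean_dataquality_kpi views (source views)",
    "Dataplex",
    "Dataplex_quality (table)",
    "RT_DPtoDMmapping_specific (stored procedure)",
    "DM tables",
    "DPL views",
    "QlikSense",
]


def downstream_of(stage: str) -> List[str]:
    """Return every stage that comes after `stage`, in flow order (hypothetical helper)."""
    idx = DATA_FLOW.index(stage)  # raises ValueError if the stage is unknown
    return DATA_FLOW[idx + 1:]


if __name__ == "__main__":
    # Everything that depends on the Dataplex quality scan output.
    for name in downstream_of("Dataplex_quality (table)"):
        print(name)
```

A plain list is enough here because the documented flow is strictly linear; a graph structure would only be needed if stages branched.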
Project Name:
- prj-data-dq-selfservice-test
- prj-data-dq-selfservice-dev
- prj-data-dq-selfservice-ppd
- prj-data-dq-selfservice-prod
prj-data-dq-selfservice-prod:
STG Schemas:
STG Schema1:
STG Schema2:
STG Schema3:
Tablelist:
Data Ocean Schemas:
DS_xxx_yyy1
DS_xxx_yyy2
DS_xxx_yyy3
Reporting Schemas: