This document provides an overview of the two main Google Cloud Platform (GCP) services used in this project: BigQuery and Dataplex. Together, they cover data storage, data processing, and data quality monitoring.
- BigQuery is GCP's fully managed, serverless data warehouse designed for fast SQL-based analysis on large datasets. It is used to store, organize, and process data ingested from various source systems. BigQuery serves as the central data repository for this project.
- Dataplex is GCP's intelligent data fabric solution for unified data management, governance, and quality control across distributed data. In this project, Dataplex executes the data quality rules that check the consistency and reliability of the data processed and stored in BigQuery.
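To make "data quality rules" concrete, here is a minimal sketch of what a rule definition can look like when expressed as the JSON body of a Dataplex DataScan resource, built as a Python dictionary. It is not taken from this project: the dataset `hr_final`, the view `v_employees`, the columns `employee_id` and `salary`, and the thresholds are all hypothetical placeholders, and the project's actual rules are not documented here.

```python
import json

# Hypothetical target: a final view in the DQ project.
# Dataset, view, and column names are placeholders, not project facts.
PROJECT = "prj-data-dq-selfservice-dev"
DATASET = "hr_final"      # hypothetical
TABLE = "v_employees"     # hypothetical


def build_dq_scan_body(project: str, dataset: str, table: str) -> dict:
    """Return an illustrative DataScan body with two data quality rules."""
    return {
        "data": {
            # Full resource name of the BigQuery table the scan reads.
            "resource": f"//bigquery.googleapis.com/projects/{project}"
                        f"/datasets/{dataset}/tables/{table}"
        },
        "dataQualitySpec": {
            "rules": [
                {
                    # Every row must have a non-null employee_id (hypothetical column).
                    "column": "employee_id",
                    "dimension": "COMPLETENESS",
                    "nonNullExpectation": {},
                    "threshold": 1.0,
                },
                {
                    # At least 99% of non-null salaries must lie in the range.
                    "column": "salary",
                    "dimension": "VALIDITY",
                    "ignoreNull": True,
                    "rangeExpectation": {"minValue": "0", "maxValue": "1000000"},
                    "threshold": 0.99,
                },
            ]
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_dq_scan_body(PROJECT, DATASET, TABLE), indent=2))
```

Each rule pairs a column with one expectation type and a pass threshold (the fraction of rows that must satisfy it); Dataplex evaluates the rules when the scan runs and reports per-rule results.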
Project Hierarchy
The GCP projects are organized to keep domains and environments clearly separated, while supporting efficient data ingestion and data quality validation.
Domain-specific Projects
For each domain within the Data Quality Monitoring Tool (DQMT), a dedicated GCP project exists for every environment:
- Development (dev)
- Testing (test)
- Pre-Production (ppd)
- Production (prod)
Each project serves as the location where:
- The data is ingested from source systems.
- The initial views used for data processing and quality checks are created.
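As an illustration of the second point, the sketch below builds a BigQuery `CREATE OR REPLACE VIEW` statement for a view in a domain project. The dataset `hr_raw`, the view `v_employees`, and the query are hypothetical; the real views and their SQL are not described in this document. The fully qualified name is quoted with backticks, which is the safe way to reference project IDs that contain hyphens.

```python
def create_view_ddl(project: str, dataset: str, view: str, select_sql: str) -> str:
    """Build a BigQuery CREATE OR REPLACE VIEW statement (illustrative helper)."""
    for part in (project, dataset, view):
        # A backtick inside a name would break the quoted identifier.
        if "`" in part:
            raise ValueError(f"invalid identifier: {part!r}")
    # Backticks quote the whole project.dataset.view path.
    return (
        f"CREATE OR REPLACE VIEW `{project}.{dataset}.{view}` AS\n"
        f"{select_sql.strip()}"
    )


if __name__ == "__main__":
    # Hypothetical dataset, view, and source table names.
    print(create_view_ddl(
        "prj-data-dm-hr-dev",
        "hr_raw",
        "v_employees",
        "SELECT employee_id, hire_date FROM `prj-data-dm-hr-dev.hr_raw.employees`",
    ))
```

The resulting statement can be run in the BigQuery console or through any BigQuery client in the corresponding domain project.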
Domain Projects List
| Domain / Environment | Development | Testing | Pre-Production | Production |
|---|---|---|---|---|
| Human Resources | prj-data-dm-hr-dev | prj-data-dm-hr-test | prj-data-dm-hr-ppd | prj-data-dm-hr-prod |
| Structure & Shared | prj-data-dm-structure-dev | prj-data-dm-structure-test | prj-data-dm-structure-ppd | prj-data-dm-structure-prod |
| Finance | prj-data-dm-finance-dev | prj-data-dm-finance-test | prj-data-dm-finance-ppd | prj-data-dm-finance-prod |
| Marketing | prj-data-dm-marketing-dev | prj-data-dm-marketing-test | prj-data-dm-marketing-ppd | prj-data-dm-marketing-prod |
| Procurement | prj-data-dm-procurement-dev | prj-data-dm-procurement-test | prj-data-dm-procurement-ppd | prj-data-dm-procurement-prod |
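The project IDs in this table follow a regular pattern, `prj-data-dm-<domain>-<env>`. The following minimal sketch derives them from that pattern. The short domain codes are read off the table; the helper itself (`domain_project_id`) is hypothetical and not part of the project's tooling.

```python
# Short domain codes as they appear in the project IDs in the table.
DOMAINS = {
    "Human Resources": "hr",
    "Structure & Shared": "structure",
    "Finance": "finance",
    "Marketing": "marketing",
    "Procurement": "procurement",
}
ENVIRONMENTS = ("dev", "test", "ppd", "prod")


def domain_project_id(domain: str, env: str) -> str:
    """Return the GCP project ID for a DQMT domain in a given environment."""
    if domain not in DOMAINS:
        raise ValueError(f"unknown domain: {domain!r}")
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    return f"prj-data-dm-{DOMAINS[domain]}-{env}"


if __name__ == "__main__":
    # Reproduce the domain x environment grid row by row.
    for name in DOMAINS:
        print(name, [domain_project_id(name, e) for e in ENVIRONMENTS])
```

Validating both inputs means a misspelled domain or environment fails immediately instead of producing a project ID that does not exist.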
Data Quality & Final Views Projects
In addition to the domain-specific projects, a separate set of projects is used to:
- Import the final views generated by the domain projects.
- Define and execute data quality rules through Dataplex.
| Environment | Project |
|---|---|
| Development | prj-data-dq-selfservice-dev |
| Testing | prj-data-dq-selfservice-test |
| Pre-Production | prj-data-dq-selfservice-ppd |
| Production | prj-data-dq-selfservice-prod |
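Since there is one data quality project per environment, a domain project can be mapped to the data quality project that consumes its final views. The sketch below assumes that each domain project feeds the data quality project of the same environment (for example, `prj-data-dm-hr-dev` feeds `prj-data-dq-selfservice-dev`); the document lists both sets of projects but does not state this pairing explicitly. The function name `dq_project_for` is hypothetical.

```python
import re

# Matches domain project IDs such as prj-data-dm-hr-dev.
_DOMAIN_PROJECT = re.compile(r"^prj-data-dm-([a-z]+)-(dev|test|ppd|prod)$")


def dq_project_for(domain_project_id: str) -> str:
    """Return the DQ self-service project assumed to serve the same environment."""
    match = _DOMAIN_PROJECT.match(domain_project_id)
    if match is None:
        raise ValueError(f"not a DQMT domain project: {domain_project_id!r}")
    env = match.group(2)
    # Assumption: domain and DQ projects are paired by environment.
    return f"prj-data-dq-selfservice-{env}"


if __name__ == "__main__":
    print(dq_project_for("prj-data-dm-marketing-ppd"))
```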
Project Support Team
The DQMT solution is implemented and maintained by a dedicated team whose members contribute specialized skills across different areas of the project:
| Name | Role | Scope |
|---|---|---|
| Ahmed Elsayed | Data Architect | Architecture and design of data pipelines and models |
| Maria João Pimenta | Data Engineer | Data ingestion, transformation, and automation |
| Ram Atirajyam | Data Engineer | Data ingestion, transformation, and automation |
| Ibrahim Mansey | Visualization Engineer | Data visualization and dashboard development |
| Mohamed Hazem | Visualization Engineer | Data visualization and dashboard development |
| Rawan Shehab | Functional Analyst | Business analysis and functional requirements |
Project Access and Service Accounts
Google Groups
The following Google Groups have access to the DQMT GCP projects, organized by role:
| Group | Purpose | Email |
|---|---|---|
| Data Architects Group | Access for Data Architects | gcp-da-prj-data-dq-selfservice-nonprod@solvay.com |
| Data Engineers Group | Access for Data Engineers | gcp-de-prj-data-dq-selfservice-nonprod@solvay.com |
| Data Analysts / Business Analysts | No specific group | — |
| Data Developers Group | Access for Data Developers | gcp-dv-prj-data-dq-selfservice@solvay.com |
Note: No new Google Groups were created specifically for this project.
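In an IAM policy, Google Groups are referenced as members with the `group:` prefix, and each binding pairs one role with a list of members. The sketch below groups the email addresses from the table into bindings of that shape. The role assigned to each group is purely hypothetical, since this document does not list the roles the groups actually hold.

```python
import json

# Group emails as listed in the table, keyed by the role code in the address.
GROUPS = {
    "da": "gcp-da-prj-data-dq-selfservice-nonprod@solvay.com",
    "de": "gcp-de-prj-data-dq-selfservice-nonprod@solvay.com",
    "dv": "gcp-dv-prj-data-dq-selfservice@solvay.com",
}

# Hypothetical role assignments, for illustration only.
ROLES = {
    "da": "roles/bigquery.dataViewer",
    "de": "roles/bigquery.dataEditor",
    "dv": "roles/bigquery.dataViewer",
}


def build_bindings(groups: dict, roles: dict) -> list:
    """Collect group members per role into IAM policy bindings."""
    by_role = {}
    for key, email in groups.items():
        # IAM members for Google Groups use the "group:" prefix.
        by_role.setdefault(roles[key], []).append(f"group:{email}")
    # Sort roles and members so the output is deterministic.
    return [{"role": r, "members": sorted(m)} for r, m in sorted(by_role.items())]


if __name__ == "__main__":
    print(json.dumps({"bindings": build_bindings(GROUPS, ROLES)}, indent=2))
```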
Service Accounts
The following Service Accounts are used within the DQMT project for process automation and integration:
| Service Account | Description |
|---|---|
| sbs-is-appli-qlikview.support@solvay.com | QlikView integration and support |
| sa-talend@prj-data-dq-selfservice-dev.iam.gserviceaccount.com | Talend jobs execution |
| sa-cloudfunction@prj-data-dq-selfservice-dev.iam.gserviceaccount.com | Cloud Functions automation |
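User-managed GCP service accounts have emails of the form `NAME@PROJECT_ID.iam.gserviceaccount.com`, so the owning project can be read from the address: both `sa-talend` and `sa-cloudfunction` belong to `prj-data-dq-selfservice-dev`. The QlikView entry is a `solvay.com` address rather than a GCP service account email, so the sketch below assumes it is granted as a `user:` member; whether that matches how the account is actually configured is not stated here. The helper names are hypothetical.

```python
from typing import Optional

SA_SUFFIX = ".iam.gserviceaccount.com"


def iam_member(email: str) -> str:
    """Return the IAM member string for an account email."""
    if email.endswith(SA_SUFFIX):
        return f"serviceAccount:{email}"
    # Assumption: any other address is treated as a user account.
    return f"user:{email}"


def sa_project(email: str) -> Optional[str]:
    """Return the project ID owning a user-managed service account, else None."""
    if not email.endswith(SA_SUFFIX):
        return None
    # Domain part is PROJECT_ID followed by the service account suffix.
    return email.split("@", 1)[1][: -len(SA_SUFFIX)]


if __name__ == "__main__":
    for account in (
        "sbs-is-appli-qlikview.support@solvay.com",
        "sa-talend@prj-data-dq-selfservice-dev.iam.gserviceaccount.com",
        "sa-cloudfunction@prj-data-dq-selfservice-dev.iam.gserviceaccount.com",
    ):
        print(iam_member(account), sa_project(account))
```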