High-Level Design architecture (HLD)
Link.
Low-Level Design architecture (LLD)
Link.
Architecture Data Flow
Here is a suggested template for Data Model + Data Mapping:
DataPrep Flow
Schema showing the different STEPS of the application flow - with the data involved at each step
According to Data Ocean Blueprint
Accolade --> GCP Data Ocean DT (prj-data-dm-dt-[env]) --> GCP Product PMO (prj-data-pmo-dash-[env]) --> Output Tableau
Conceptual Data Model
Steps descriptions
Note: The tables are in project prj-data-dm-dt-[env].
DataSource 1 - Accolade DB
Description
Accolade is the application where users (mainly project managers) share project-related data, for example status and forecast.
Tools
Talend collects the data from the source and stores it on GCP in Google BigQuery.
Access rights
Only authorized users can connect to the Accolade application.
Only DA&AI Data Engineers and Data Architects can access the data on GCP/Google BigQuery.
Source
Location
Accolade DB via Talend generic account created for this purpose.
Format
MSSQL: direct table reads and procedure runs.
Destination
Location
Extracted data will be stored on GCP in Cloud Storage and Google BigQuery.
Format
The format of the data saved in the databank
Sizing
Expected data volume for:
- full process from source to staging (as of 6 Dec 2023)
- incremental process from ODS to DM (as of 6 Dec 2023)
Assessment
Check the GCP log tables log_tables and run_jobs to confirm that no error occurred while loading from source to staging/ODS.
Check that the surrogate key is unique in the data mart layer.
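The two assessment checks above can be sketched in a few lines of Python. This is only an illustration, assuming the rows of run_jobs and of a data mart table have already been fetched as dictionaries. The field names `status` and `project_sk`, and the status values `OK` / `ERROR`, are hypothetical and must be replaced with the real column names and values.

```python
from collections import Counter


def failed_runs(log_rows, status_field="status", ok_values=("OK",)):
    """Return the log rows whose status is not one of the accepted values.

    `status_field` and `ok_values` are assumptions, not the real schema.
    """
    return [row for row in log_rows if row.get(status_field) not in ok_values]


def duplicate_keys(rows, key_field):
    """Return the surrogate key values that appear more than once, sorted."""
    counts = Counter(row[key_field] for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical sample rows standing in for query results.
sample_log = [
    {"job": "load_stg", "status": "OK"},
    {"job": "load_ods", "status": "ERROR"},
]
sample_dm = [{"project_sk": 1}, {"project_sk": 2}, {"project_sk": 1}]

errors = failed_runs(sample_log)
duplicates = duplicate_keys(sample_dm, "project_sk")
```

An empty result from both functions means the assessment passes. Any returned row or key should be investigated before the data mart is used for reporting.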
Scheduling
Is there an automatic schedule? Yes
At what frequency? Data is collected 4 times/day (every 6 hours).
What is the trigger? TMC
Timing
The average time expected for :
- 4 times/day on working days (Monday to Friday), to be scheduled: 2:00, 8:00, 14:00, 20:00 CET (monitored by DataOps only at 8:00 and 14:00)
- full process (source to ODS)
- incremental process (ODS to DM)
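As an illustration of the schedule listed above, the following sketch computes the next planned run and whether DataOps monitors it. It is not the TMC configuration itself. It assumes naive datetimes already expressed in CET, and the function names are hypothetical.

```python
from datetime import datetime, time, timedelta

# Run times and monitored slots as listed in this section (CET).
RUN_TIMES = [time(2, 0), time(8, 0), time(14, 0), time(20, 0)]
MONITORED_TIMES = {time(8, 0), time(14, 0)}


def next_run(now):
    """Return the first scheduled run strictly after `now`, Monday to Friday only."""
    day = now.date()
    for _ in range(8):  # a week of look-ahead always contains a working day
        if day.weekday() < 5:  # 0 = Monday, 4 = Friday
            for run_time in RUN_TIMES:
                candidate = datetime.combine(day, run_time)
                if candidate > now:
                    return candidate
        day += timedelta(days=1)
    raise RuntimeError("no run found within a week")


def is_monitored(run):
    """Return True if DataOps monitors the run starting at this time."""
    return run.time() in MONITORED_TIMES


wednesday = next_run(datetime(2023, 12, 6, 9, 30))   # Wednesday morning
after_friday = next_run(datetime(2023, 12, 8, 21, 0))  # Friday evening
```

With these inputs, `wednesday` falls on the same day at 14:00, and `after_friday` skips the weekend and lands on Monday at 2:00, a slot that is not monitored.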
Criticality
High / Medium / Low
Logging
Tables log_tables, run_jobs, log_files, and reject_files in `prj-data-dm-dt-[environment].STG.[table]`
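The naming pattern above can be expanded per environment with a small helper. This is a minimal sketch: the function name is hypothetical, and `dev` is used as an example environment value on the assumption that this project follows the same suffix convention as the reporting project.

```python
# Logging tables named in this section.
LOG_TABLES = ("log_tables", "run_jobs", "log_files", "reject_files")


def log_table_refs(environment, dataset="STG"):
    """Build fully qualified names following prj-data-dm-dt-[environment].STG.[table]."""
    project = f"prj-data-dm-dt-{environment}"
    return [f"{project}.{dataset}.{table}" for table in LOG_TABLES]


refs = log_table_refs("dev")  # "dev" is an assumed environment value
```

The resulting strings can be pasted into a query editor or passed to whichever client is used to read the logs.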
Draft (Notes) for DataOps Understanding:
prj-data-dm-dt :
Source --> GCP Buckets --> prj-data-dm-dt.STG --> prj-data-dm-dt.ODS --> prj-data-dm-dt.DM --> prj-data-dm-dt.ds_pmo_dashboard
||
\/
Reporting is done from the prj-data-pmo-dash-dev project, on the DPL dataset (prj-data-pmo-dash-dev.DPL).
Tableau --> prj-data-pmo-dash-dev.DPL --> prj-data-pmo-dash-dev.DataOcean --> prj-data-dm-dt.ds_pmo_dashboard
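To help trace an issue seen in Tableau back to its origin, the two chains in these notes can be written as a single upstream map. This is a sketch of the notes as written, not an authoritative lineage: the dictionary and function names are hypothetical, and each entry reads "this layer is fed by that layer".

```python
# Each key reads from (is fed by) its value, following the notes above.
UPSTREAM = {
    "Tableau": "prj-data-pmo-dash-dev.DPL",
    "prj-data-pmo-dash-dev.DPL": "prj-data-pmo-dash-dev.DataOcean",
    "prj-data-pmo-dash-dev.DataOcean": "prj-data-dm-dt.ds_pmo_dashboard",
    "prj-data-dm-dt.ds_pmo_dashboard": "prj-data-dm-dt.DM",
    "prj-data-dm-dt.DM": "prj-data-dm-dt.ODS",
    "prj-data-dm-dt.ODS": "prj-data-dm-dt.STG",
    "prj-data-dm-dt.STG": "GCP Buckets",
    "GCP Buckets": "Source",
}


def lineage(node):
    """Walk upstream from `node` and return every layer down to the source."""
    path = [node]
    while path[-1] in UPSTREAM:
        path.append(UPSTREAM[path[-1]])
    return path


from_tableau = lineage("Tableau")
from_ods = lineage("prj-data-dm-dt.ODS")
```

Walking the list from the front shows which layer to check first when a dashboard value looks wrong. Walking it from the end follows the load order.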





