High-Level Design (HLD) architecture

Link.


Low-Level Design (LLD) architecture

Link.

Architecture Data Flow

Here is a suggested template for Data Model + Data Mapping:

DA&DA - Domain Mapping DT


DataPrep Flow

Diagram showing the different steps of the application flow, with the data involved at each step.

According to Data Ocean Blueprint

Accolade --> GCP Data Ocean DT (prj-data-dm-dt-[env]) --> GCP PMO product (prj-data-pmo-dash-[env]) --> Tableau output


Conceptual Data Model


Step descriptions

Note: Tables are in project prj-data-dm-dt-[env]

DataSource 1 - Accolade DB

Description

Accolade is the application where users (mainly project managers) share data related to a project (for example, status and forecasts).

Tools

Talend is used to collect the data from the source and store it on GCP (Google BigQuery).

Access rights

Only authorized users can connect to the Accolade application.

Only DA&AI Data Engineers and Data Architects can access the data on GCP (Google BigQuery).

Source

Location

Accolade DB, accessed via a generic Talend account created for this purpose.

Test = acew1twegodb01.nonprod.aws.cloud.solvay.com

Prod = acew1pwegodb01.prod.aws.cloud.solvay.com

Format

Direct MSSQL table access and stored-procedure execution.

Destination

Location

Extracted data will be stored on GCP (Cloud Storage and Google BigQuery).

Format

The format of the data saved in the databank

Sizing

Expected data volume for:

  • full process from source to staging (as of 6 Dec 2023)

  • incremental process from ODS to DM (as of 6 Dec 2023)

Assessment

Check the log tables log_tables and run_jobs in GCP to verify that there are no errors when loading from source to staging/ODS.

Check that the surrogate key is unique in the data mart layer.
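The uniqueness check above can be sketched as follows. This is a minimal illustration, not the actual DataOps tooling; the column name `surrogate_key` and the table path in the SQL string are placeholders, as in the rest of this page.

```python
from collections import Counter

def find_duplicate_keys(keys):
    """Return the surrogate keys that appear more than once."""
    counts = Counter(keys)
    return sorted(k for k, n in counts.items() if n > 1)

# Equivalent BigQuery check (column and table names are placeholders):
DUPLICATE_KEY_SQL = """
SELECT surrogate_key, COUNT(*) AS n
FROM `prj-data-dm-dt-[env].DM.[table]`
GROUP BY surrogate_key
HAVING n > 1
"""

if __name__ == "__main__":
    sample = [101, 102, 103, 102]
    # A non-empty result means the data mart check fails.
    print(find_duplicate_keys(sample))
```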

Scheduling

Is there an automatic schedule? Yes.

At what frequency? Data is collected 4 times/day (every 6 hours).

What is the trigger? TMC.

Timing

The average time expected for :

  • 4 times/day (working days, Monday to Friday): scheduled at 2:00, 8:00, 14:00, 20:00 CET (monitored by DataOps only at 8:00 and 14:00)
  • full process (source to ODS)
  • incremental process (ODS to DM)
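The run calendar above can be sketched as a simple predicate. This is an illustration only; it uses naive datetimes assumed to be in CET and does not model DST or the actual TMC trigger.

```python
from datetime import datetime

# Schedule described above: 4 runs/day (every 6 hours) on working days,
# at 2:00, 8:00, 14:00 and 20:00 CET.
RUN_HOURS_CET = (2, 8, 14, 20)
MONITORED_HOURS = (8, 14)  # slots monitored by DataOps

def is_scheduled_run(ts: datetime) -> bool:
    """True if ts (naive, assumed CET) falls on a scheduled run slot."""
    return ts.weekday() < 5 and ts.hour in RUN_HOURS_CET and ts.minute == 0

def is_monitored_run(ts: datetime) -> bool:
    """True only for the 8:00 and 14:00 slots on working days."""
    return is_scheduled_run(ts) and ts.hour in MONITORED_HOURS
```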

Criticality

High / Medium / Low

Logging

Tables log_tables, run_jobs, log_files, and reject_files in `prj-data-dm-dt-[environment].STG.[table]`
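The fully-qualified IDs of these logging tables can be built from the environment suffix, as sketched below. The actual suffix values (e.g. "dev") are an assumption of this sketch, not confirmed by this page.

```python
# Logging tables listed above, all in the STG dataset.
LOG_TABLES = ("log_tables", "run_jobs", "log_files", "reject_files")

def log_table_ids(environment: str) -> list[str]:
    """Return fully-qualified BigQuery IDs for the logging tables."""
    project = f"prj-data-dm-dt-{environment}"
    return [f"{project}.STG.{table}" for table in LOG_TABLES]
```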


Draft (Notes) for DataOps Understanding:

prj-data-dm-dt:

Source --> GCP Buckets --> prj-data-dm-dt.STG --> prj-data-dm-dt.ODS --> prj-data-dm-dt.DM --> prj-data-dm-dt.ds_pmo_dashboard

Reporting will be done from the PRJ-DATA-PMO-DASH-DEV project, on the DPL dataset (prj-data-pmo-dash-dev.DPL).

Tableau --> prj-data-pmo-dash-dev.DPL --> prj-data-pmo-dash-dev.DataOcean --> prj-data-dm-dt.ds_pmo_dashboard


