Provide link here for the architecture diagram?
GCP Project
GCP project name: prj-dashb-eln-dev / prj-dashb-eln-test
Google Group: gcp-sl-dashb-eln-non-prod@solvay.com
Admins (as of Dec 2022):
sebastien.ladouce@solvay.com (Data Architect Industrial / R&I)
matteo.menghetti@solvay.com (Data Engineer)
Here is a suggested template for Data Model + Data Mapping:
https://docs.google.com/spreadsheets/d/1bD8AIgsNUI2sgANoOEKTuHBlkxhsNVTD8cmOPYEloLw
Data Model
=> Document: https://app.genmymodel.com/api/projects/_k07o4IBOEe29ie0vpi-P5A/diagrams/_k07o4oBOEe29ie0vpi-P5A/svg
Data Mapping
=> Document: "DA&AI - Data Architecture & Engineering - Data Knowledge - 11839 ELN / Data Mapping to DM"
Schema showing the different steps of the application flow, with the data involved at each step
Data model schema : https://lucidapp.eu/lucidchart/f69303ec-ceee-479a-a02e-2ee4d64067de/edit?page=0_0&invitationId=inv_63cb8899-1c14-4e41-a8cf-ecfb081be65c#
Retrieve ELN and SuccessFactors data and upload them to GCS. More information on this step can be found here.
| INFORMATION | DESCRIPTION |
|---|---|
| Tool | Talend |
| Access rights | ELN db: connection info saved in KeePass (path: \\ACEW1DTLNDENG02\Keepass). API Management Platform: application ID and secret saved in KeePass (same path). |
| Source | ELN Oracle database and SuccessFactors data (retrieved via the SAP API Management platform) |
| Location | Data is uploaded to the GCS bucket |
| Format | Data is saved as CSV files in GCS |
| Sizing | Each extraction should upload less than 50 MB of data to GCS (files are temporarily stored on the remote engine and removed at the end of the run) |
| Assessment | The next job (GCS to Staging) validates that the data is consistent |
| Scheduling | Data is extracted every day |
| Criticality | Medium |
| Logging | Logs are stored in the Talend remote engine |
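The 50 MB sizing rule above can be sketched as a simple pre-upload check. This is illustrative only: the actual extraction and upload are done by the Talend job, and `check_extraction_size` is a hypothetical helper, not part of the real pipeline.

```python
import os
import tempfile

# The 50 MB per-extraction limit stated in the Sizing row above
MAX_EXTRACT_BYTES = 50 * 1024 * 1024

def check_extraction_size(csv_paths):
    """Return (total_bytes, ok) for one extraction run.

    Hypothetical helper illustrating the sizing rule; the real check
    happens inside the Talend job on the remote engine.
    """
    total = sum(os.path.getsize(p) for p in csv_paths)
    return total, total <= MAX_EXTRACT_BYTES

# A small temporary file stands in for an extracted CSV
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "eln_extract.csv")
    with open(sample, "w") as f:
        f.write("id,notebook,author\n1,NB-001,jdoe\n")
    total, ok = check_extraction_size([sample])
    print(total, ok)  # a two-line CSV is far below the 50 MB limit
```

As the Sizing row notes, the files only live temporarily on the remote engine; a check like this would run just before the GCS upload.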
Data that has been uploaded to buckets is then moved to staging tables. More information on this step can be found here.
| INFORMATION | DESCRIPTION |
|---|---|
| Tool | Talend |
| Access rights | GCP service account: |
| Source | GCS bucket |
| Location | GCS bucket |
| Format | Data is saved as CSV files in GCS and loaded into BigQuery tables |
| Sizing | Each run should load less than 50 MB of data into GBQ and take less than 5 minutes to process all tables |
| Assessment | A logs table records whether the upload succeeded |
| Scheduling | Data is loaded every day; this step runs right after the CSV files have been uploaded to GCS |
| Criticality | Medium |
| Logging | Logs are stored in the Talend remote engine |
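The GCS-to-staging load done by the Talend job could, in SQL terms, look like BigQuery's `LOAD DATA` statement. The sketch below only builds such a statement as a string; the dataset name `staging`, the bucket name, and the file layout are placeholders, not the project's actual configuration.

```python
def staging_load_sql(table: str, bucket: str = "eln-dev-bucket") -> str:
    """Build a BigQuery LOAD DATA statement for one staging table.

    Illustrative only: the real load is performed by the Talend job,
    and the bucket/dataset names here are placeholders.
    """
    return (
        f"LOAD DATA OVERWRITE staging.{table}\n"
        f"FROM FILES (format = 'CSV', skip_leading_rows = 1,\n"
        f"            uris = ['gs://{bucket}/{table}/*.csv']);"
    )

print(staging_load_sql("eln_experiments"))
```

`OVERWRITE` reflects that staging tables are recreated on each daily run rather than appended to.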
Some mappings and filters are configured via mapping tables, which are managed in this Google Sheet.
Information is then inserted into ODS tables. More information on this step can be found here.
| INFORMATION | DESCRIPTION |
|---|---|
| Tool | Talend |
| Access rights | GCP service account: |
| Source | GBQ Staging tables |
| Location | GBQ ODS tables |
| Format | Big Query tables |
| Sizing | Each run should load less than 50 MB of data into GBQ and take less than 5 minutes to process all tables |
| Assessment | A logs table records whether the upload succeeded |
| Scheduling | Data is loaded every day; this step runs right after the staging tables have been recreated |
| Criticality | Medium |
| Logging | Logs are stored in the Talend remote engine |
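The staging-to-ODS step is, at its core, an `INSERT ... SELECT` from one BigQuery dataset into another. The sketch below only builds that statement as a string; the dataset names `staging` and `ods` and the column list are assumptions, and the actual SQL lives inside the Talend job.

```python
def ods_insert_sql(table: str, columns) -> str:
    """Build the INSERT ... SELECT that moves one table from staging to ODS.

    A sketch with placeholder dataset names ('staging', 'ods'); the real
    transformation is implemented in the Talend job.
    """
    cols = ", ".join(columns)
    return (
        f"INSERT INTO ods.{table} ({cols})\n"
        f"SELECT {cols} FROM staging.{table};"
    )

print(ods_insert_sql("eln_experiments", ["id", "notebook", "author"]))
```

In practice this step also applies the mappings and filters described above, so the real query is richer than a plain column copy.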