<Provide all the ETL job details, mappings & complex transformations:
       ETL jobs loading to Google Cloud Storage (GCS)
       Detailed flow of ETLs from GCS --> STG
       Detailed flow of ETLs from STG --> ODS
       Detailed flow of ETLs from ODS --> DataOcean
       Detailed flow of ETLs from DataOcean --> DM
       Complex views and logic
       Views expected to be used by Business/Reporting tools>

Job Details

<Include all the flows (with screenshots) and their respective jobs, plus a description of each job, e.g. its purpose>

For example:

Each entry below records: Flow Name, Tasks, Description, Source table/query, Target table, and complex transformation logic (if any).

Flow: F1000_F_W_CO_MVSDTR01_0010
Tasks: <this is the main orchestration flow, which includes the Data Ocean ref jobs and the new jobs>

Description: Automated extraction of shipment details in order to calculate their CO2 emissions.

Step 1: Compute the year and previous month and store them in a variable (in the format YYYYMM).
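
For reference, a minimal BigQuery equivalent of this computation (the flow itself derives the value in Talend; this query is an illustration only):

-- Illustration: previous month in YYYYMM format
SELECT FORMAT_DATE('%Y%m', DATE_SUB(DATE_TRUNC(CURRENT_DATE(), MONTH), INTERVAL 1 MONTH)) AS prev_yyyymm;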

Step 2: Compute the meta run ID using the Data Ocean framework job and store it in a global variable.

Step 3: Build the BW query and pass it to Step 4.

Step 4: Extract the data via Xtract Universal and load the CSV into the Cloud Storage bucket.

Job: J1000_BW_Query_csv_to_GCS

Step 5: Using the Data Ocean reference jobs, process the file produced in Step 4 into the STG and ODS tables (an illustrative load sketch follows the ref jobs below).

Ref job (bucket to STG): 2 - Move data from bucket to staging

Ref job (STG to ODS): 3 - Move data from staging to ods
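
The actual load is handled by the Data Ocean reference jobs. As a sketch only, the bucket-to-STG step corresponds to a BigQuery load along these lines (the STG dataset and table names are placeholders, not the real configuration):

-- Sketch: load the Step 4 CSV from the bucket into the staging table
-- <STG_DATASET> and <STG_TABLE> are placeholders; the Data Ocean framework resolves the real names
LOAD DATA INTO `prj-data-dm-sust-<env>.<STG_DATASET>.<STG_TABLE>`
FROM FILES (
  format = 'CSV',
  skip_leading_rows = 1,
  uris = ['gs://cs-ew1-prj-data-dm-sust-<env>-staging/CO2_Emission/*.csv']
);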

Step 6: Extract the data from the ODS table for that batch execution.

Job: J1100_BW_Query_ods_to_Bucket

Step 7: Using the Data Ocean reference jobs, write the success logs into the log table.

Source: TALEND_<ENV>_DI_BW_QRY_MVSDTR01_0010

Target: cs-ew1-prj-data-dm-sust-<env>-staging/CO2_Emission/Final/<YYYYMM>_TALEND_PRE_PROD_CO2_BW_QRY_MVSDTR01_0010.xlsx
Flow: F1000_F_W_CO_MVSDTR01_0010
Job: J1000_BW_Query_csv_to_GCS

Description: Use the Xtract Universal tool to retrieve BW query results and upload them to GCS.

1. Extract the data via Xtract Universal and load the CSV into the Cloud Storage bucket. Once the CSV load to the bucket is complete, delete the file from the local Talend server.

Source (BW): DI_BW_QRY_MVSDTR01_0010

Source (Talend): TALEND_<env>_DI_BW_QRY_MVSDTR01_0010

Target: cs-ew1-prj-data-dm-sust-<env>-staging/CO2_Emission
Flow: F1000_F_W_CO_MVSDTR01_0010
Job: J1100_BW_Query_ods_to_Bucket

Description: Extract the data from the ODS table for that batch execution, based on Meta_Execution_ID.

Query (built in Talend):

"SELECT * FROM " + context.l_VAR_GCP_PROJECT_ID + "." + context.l_LOCAL_VAR_STAGING_TO_ODS_DATASET_ODS + "." + context.l_LOCAL_VAR_STAGING_TO_ODS_Target_TABLE + " where meta_execution_id='" + context.Meta_Execution_ID + "'"

The values for the above context variables can be found in the RDS database.
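
With the context variables resolved against the source/target below, the generated query takes this shape (Meta_Execution_ID shown as a placeholder):

-- Resolved form of the Talend-built query above; substitute the actual meta_execution_id
SELECT *
FROM `prj-data-dm-sust-<env>.ODS.ODS_BWH_0000_0000_F1000_F_W_CO_MVSDTR01_0010`
WHERE meta_execution_id = '<Meta_Execution_ID>';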

Source: prj-data-dm-sust-<env>.ODS.ODS_BWH_0000_0000_F1000_F_W_CO_MVSDTR01_0010

Target: cs-ew1-prj-data-dm-sust-test-staging/CO2_Emission/Final/YYYYMM_TALEND_DEV_CO2_BW_QRY_MVSDTR01_0010.xlsx



Data Validation/Troubleshooting queries

Data Validation:

<Provide the SQL queries to validate the data or the record count in the BQ target tables>
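
As a starting point, a record-count check against the ODS table from the flow above (adjust the dataset/table names for other flows):

-- Row counts per batch execution in the ODS target table
SELECT meta_execution_id, COUNT(*) AS record_count
FROM `prj-data-dm-sust-<env>.ODS.ODS_BWH_0000_0000_F1000_F_W_CO_MVSDTR01_0010`
GROUP BY meta_execution_id;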

Logging:

<Details about the log tables in BigQuery>
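
A sketch of a success-log lookup, assuming the Data Ocean log table can be filtered by meta_execution_id (the dataset and table names here are placeholders for the actual log table):

-- <LOG_DATASET> and <LOG_TABLE> are placeholders; check the Data Ocean framework for the real names
SELECT *
FROM `prj-data-dm-sust-<env>.<LOG_DATASET>.<LOG_TABLE>`
WHERE meta_execution_id = '<Meta_Execution_ID>';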

Troubleshooting steps:

<Provide the steps to debug the ETL flow in case of failure. For example, if an ETL job fails, how should it be re-triggered? Can the job be rerun directly, or must the data from the previous execution be deleted before rerunning? Should any change be made in the context table for date executions?>
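
If a rerun requires clearing the previous execution first, the cleanup would be along these lines (illustrative only; confirm the rerun policy and the affected tables before deleting anything):

-- Remove rows written by the failed execution so the flow can be rerun cleanly
DELETE FROM `prj-data-dm-sust-<env>.ODS.ODS_BWH_0000_0000_F1000_F_W_CO_MVSDTR01_0010`
WHERE meta_execution_id = '<Meta_Execution_ID>';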