Below is the list of Talend jobs developed as part of the Global Supply Chain CO2 Emission Shipment Costs detailed analysis TIERS (Core Query).

Talend Project :  DATA_OCEAN_DOMAIN_SUSTAINABILITY

GCP Project : prj-data-dm-sust-prod

Schedule : Jobs are scheduled to run on the 3rd of every month.

Talend Plan in TMC: PL_GSC_CO2Emission_Shipment

BQ Details

BQ_Dataset: STG
BQ_Table: STG_BWH_0000_0000_F1000_F_W_CO_MVSDTR01_0010
CloudStorage Bucket: cs-ew1-prj-data-dm-sust-ppd-staging/CO2_Emission/Processed

BQ_Dataset: ODS
BQ_Table: ODS_BWH_0000_0000_F1000_F_W_CO_MVSDTR01_0010
CloudStorage Bucket: cs-ew1-prj-data-dm-sust-ppd-staging/CO2_Emission/Processed

Job Details

<Include all the flows (with screenshots) and the respective jobs, along with a description of each job's purpose>

For example:

FlowName: F1000_F_W_CO_MVSDTR01_0010

Tasks: <this is the main orchestration flow, which includes the Data Ocean reference jobs and the new jobs>

Description: Automated extraction of shipment details in order to calculate their CO2 emissions.

Step 1: Compute the year and previous month (in the format YYYYMM) and store them in a variable.

Step 2: Compute the meta run ID using the Data Ocean framework job and store it in a global variable.

Step 3: Build the BW query and pass it to Step 4.

Step 4: Extract the data via Xtract Universal and load the CSV into the Cloud Storage bucket.

Job: J1000_BW_Query_csv_to_GCS

Step 5: Using the Data Ocean reference jobs, process the file from Step 4 into the STG and ODS tables.

Data Ocean reference job, bucket to STG: 2 - Move data from bucket to staging

STG to ODS: 3 - Move data from staging to ods

Step 6: Extract the data from the ODS table for that batch execution.

Job: J1100_BW_Query_ods_to_Bucket

Step 7: Using the Data Ocean reference jobs, write the success logs into the log table.

Source_table/query: TALEND_<ENV>_DI_BW_QRY_MVSDTR01_0010

Target_Table: cs-ew1-prj-data-dm-sust-<env>-staging/CO2_Emission/Final/<YYYYMM>_TALEND_PRE_PROD_CO2_BW_QRY_MVSDTR01_0010.xlsx
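Step 1's year-and-previous-month computation can be sketched in plain Java (Talend jobs generate Java code); the class and method names below are illustrative, not the actual job components:

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;

public class PrevMonth {
    // Returns the previous month relative to 'now', formatted as YYYYMM,
    // matching the value stored in the job's context variable
    // (e.g. a March 2024 run yields "202402").
    public static String previousMonth(YearMonth now) {
        return now.minusMonths(1).format(DateTimeFormatter.ofPattern("yyyyMM"));
    }

    public static void main(String[] args) {
        System.out.println(previousMonth(YearMonth.now()));
    }
}
```

Using `YearMonth.minusMonths(1)` also handles the January-to-December rollover, including the year decrement.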
FlowName: F1000_F_W_CO_MVSDTR01_0010

Tasks: J1000_BW_Query_csv_to_GCS

Description: Use the Xtract Universal tool to retrieve BW queries and upload the result to GCS.

1. Extract the data via Xtract Universal and load the CSV into the Cloud Storage bucket. Once the CSV has been loaded into the bucket, delete the file on the local Talend server.

Source_table/query:
BW: DI_BW_QRY_MVSDTR01_0010
Talend: TALEND_<env>_DI_BW_QRY_MVSDTR01_0010

Target_Table: cs-ew1-prj-data-dm-sust-<env>-staging/CO2_Emission
FlowName: F1000_F_W_CO_MVSDTR01_0010

Tasks: J1100_BW_Query_ods_to_Bucket

Description: Extract the data from the ODS table for that batch execution, based on Meta_Execution_ID.

Query:

"SELECT * FROM " + context.l_VAR_GCP_PROJECT_ID + "." + context.l_LOCAL_VAR_STAGING_TO_ODS_DATASET_ODS + "." + context.l_LOCAL_VAR_STAGING_TO_ODS_Target_TABLE + " where meta_execution_id='" + context.Meta_Execution_ID + "'"

The values for the above context variables can be found in the RDS database.

Source_table/query: prj-data-dm-sust-<env>.ODS.ODS_BWH_0000_0000_F1000_F_W_CO_MVSDTR01_0010

Target_Table: cs-ew1-prj-data-dm-sust-test-staging/CO2_Emission/Final/YYYYMM_TALEND_DEV_CO2_BW_QRY_MVSDTR01_0010.xlsx



Data Validation/Troubleshooting queries

Data Validation:

<Provide the SQL queries to validate the data or the record count in the BQ target tables>

Logging:

<Details about the log tables in Big query>

Troubleshooting steps:

<Provide the steps to debug the ETL flow in case of failure. For example, if an ETL job fails, how should it be re-triggered? Can the job be rerun directly, or must data from the previous execution be deleted first? Should any change be made in the context table for the date executions?>