<Provide all the ETL job details, mappings & complex transformations:
- ETL jobs loading to Google Cloud Storage (GCS)
- Detailed flow of ETLs from GCS to STG
- Detailed flow of ETLs from STG to ODS
- Detailed flow of ETLs from ODS to DataOcean
- Detailed flow of ETLs from DataOcean to DM
- Complex views and logic
- Views expected to be used by Business/Reporting tools>
<Include all the flows (with screenshots) and their respective jobs, along with a description of each job's purpose>
For example, as below:
| FlowName | Tasks | Description | Source_table/query | Target_Table | Complex transformation; if Yes, then logic |
| --- | --- | --- | --- | --- | --- |
| F1000_F_W_CO_MVSDTR01_0010 | <this is the main orchestration flow, which includes Data Ocean reference jobs and the new jobs> | Automated extraction of shipment details in order to calculate their CO2 emissions.<br>Step 1: Compute the year & previous month and store them in a variable (format YYYYMM).<br>Step 2: Compute the meta run ID using the DataOcean framework job and store it in a global variable.<br>Step 3: Build the BW query and pass it to Step 4.<br>Step 4: Extract the data via Xtract and load the CSV into the Cloud Storage bucket (Job: J1000_BW_Query_csv_to_GCS).<br>Step 5: Using the Data Ocean reference jobs, process the file from Step 4 into the STG & ODS tables (Data Ocean reference job: 2 - Move data from bucket to staging).<br>Step 6: Extract the data from the ODS table for that batch execution (Job: J1100_BW_Query_ods_to_Bucket).<br>Step 7: Using the Data Ocean reference jobs, write the success logs into the log table. | TALEND_<ENV>_DI_BW_QRY_MVSDTR01_0010 | cs-ew1-prj-data-dm-sust-<env>-staging/CO2_Emission/Final/<YYYYMM>_TALEND_PRE_PROD_CO2_BW_QRY_MVSDTR01_0010.xlsx | |
| F1000_F_W_CO_MVSDTR01_0010 | J1000_BW_Query_csv_to_GCS | Use the Xtract Universal tool to retrieve BW queries and upload the result to GCS.<br>1. Extract the data via Xtract and load the CSV into the Cloud Storage bucket.<br>2. Once the CSV load into the bucket is complete, delete the file from the local Talend server. | BW: DI_BW_QRY_MVSDTR01_0010<br>Talend: TALEND_<env>_DI_BW_QRY_MVSDTR01_0010 | cs-ew1-prj-data-dm-sust-<env>-staging/CO2_Emission | |
| F1000_F_W_CO_MVSDTR01_0010 | J1100_BW_Query_ods_to_Bucket | Extract the data from the ODS table for that batch execution based on Meta_Execution_ID.<br>Query: `"SELECT * FROM " + context.l_VAR_GCP_PROJECT_ID + "." + context.l_LOCAL_VAR_STAGING_TO_ODS_DATASET_ODS + "." + context.l_LOCAL_VAR_STAGING_TO_ODS_Target_TABLE + " where meta_execution_id='" + context.Meta_Execution_ID + "'"`<br>The values for the above context variables can be found in the RDS database. | prj-data-dm-sust-<env>.ODS.ODS_BWH_0000_0000_F1000_F_W_CO_MVSDTR01_0010 | cs-ew1-prj-data-dm-sust-test-staging/CO2_Emission/Final/YYYYMM_TALEND_DEV_CO2_BW_QRY_MVSDTR01_0010.xlsx | |
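Step 1 (previous-month YYYYMM) and the Step 6 extraction query from the flow above can be sketched in Python. The context variable values are taken from the table; the function names and the sample `meta_execution_id` are illustrative, not part of the actual Talend jobs.

```python
from datetime import date

def previous_month_yyyymm(today: date) -> str:
    """Compute the year & previous month in YYYYMM format (Step 1)."""
    year, month = (today.year, today.month - 1) if today.month > 1 else (today.year - 1, 12)
    return f"{year}{month:02d}"

def build_ods_extract_query(project_id: str, ods_dataset: str,
                            target_table: str, meta_execution_id: str) -> str:
    """Build the Step 6 extraction query, mirroring the Talend string expression."""
    return (f"SELECT * FROM {project_id}.{ods_dataset}.{target_table} "
            f"where meta_execution_id='{meta_execution_id}'")

# Illustrative values; the real context values live in the RDS database.
print(previous_month_yyyymm(date(2024, 1, 15)))  # 202312
print(build_ods_extract_query(
    "prj-data-dm-sust-dev", "ODS",
    "ODS_BWH_0000_0000_F1000_F_W_CO_MVSDTR01_0010", "12345"))
```

Note that January correctly rolls back to December of the previous year, which a naive `month - 1` computation would miss.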
Data Validation:
<Provide the SQL queries to validate the data or the record counts in the BigQuery target tables>
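As a starting point, a per-batch record-count check can be built against the ODS target table documented above. This is a minimal sketch: the `meta_execution_id` column comes from the flow description, while the helper name and sample project ID are assumptions.

```python
def build_count_validation_query(project_id: str, dataset: str,
                                 table: str, meta_execution_id: str) -> str:
    """SQL to count the rows loaded for one batch execution in BigQuery."""
    return (f"SELECT COUNT(*) AS record_count "
            f"FROM `{project_id}.{dataset}.{table}` "
            f"WHERE meta_execution_id = '{meta_execution_id}'")

print(build_count_validation_query(
    "prj-data-dm-sust-dev", "ODS",
    "ODS_BWH_0000_0000_F1000_F_W_CO_MVSDTR01_0010", "12345"))
```

The resulting count can then be compared against the source-side row count from the BW query extract.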
Logging:
<Details about the log tables in BigQuery>
Troubleshooting steps:
<Provide the steps to debug the ETL flow in case of failure. For example, if an ETL job fails, how should it be re-triggered? Can the job be rerun directly, or must the data from the previous execution be deleted before rerunning? Should any change be made in the context table for date executions?>
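If a rerun requires removing the failed execution's rows first, a cleanup statement keyed on `meta_execution_id` could look like the sketch below. This is a hypothetical helper: whether deletion is actually needed depends on whether the flow is idempotent, which this document should specify.

```python
def build_cleanup_statement(project_id: str, dataset: str,
                            table: str, meta_execution_id: str) -> str:
    """DELETE statement to remove a failed batch before rerunning the job.
    Hypothetical helper; run it only if the flow does not overwrite on rerun."""
    return (f"DELETE FROM `{project_id}.{dataset}.{table}` "
            f"WHERE meta_execution_id = '{meta_execution_id}'")

print(build_cleanup_statement(
    "prj-data-dm-sust-dev", "ODS",
    "ODS_BWH_0000_0000_F1000_F_W_CO_MVSDTR01_0010", "12345"))
```

The same statement pattern would apply to the STG table if the Data Ocean staging load also persists rows per execution.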