<Provide all the ETL job details, mappings & complex transformations:
ETL jobs loading to Google Cloud Storage (GCS)
Detailed flow of ETLs from GCS --> STG
Detailed flow of ETLs from STG --> ODS
Detailed flow of ETLs from ODS --> DataOcean
Detailed flow of ETLs from DataOcean --> DM
Complex views and logic
Views expected to be used by Business/Reporting tools>
<Include all the flows (with screenshots) and their respective jobs, along with a description of each job's purpose>
For example:
| FlowName | Tasks | Description | Source_table/query | Target_Table | Complex transformation (if Yes, then logic) |
| --- | --- | --- | --- | --- | --- |
| F1000_F_W_CO_MVSDTR01_0010 | J1000_BW_Query_csv_to_GCS | Automated extraction of shipment details to calculate their CO2 emissions, using the XtractUniversal tool to retrieve BW queries and upload the results to GCS: 1. Extract the data from Xtract and load the CSV into the Cloud Storage bucket. 2. Once the CSV bucket load completes, delete the file on the local Talend server. | BW: DI_BW_QRY_MVSDTR01_0010 Talend: TALEND_<env>_DI_BW_QRY_MVSDTR01_0010 | cs-ew1-prj-data-dm-sust-<env>-staging/CO2_Emission | |
| F1000_F_W_CO_MVSDTR01_0010 | J1100_BW_Query_ods_to_Bucket | Extract the data from the ODS table for that batch execution, filtered on Meta_Execution_ID. Query: "SELECT * FROM " + context.l_VAR_GCP_PROJECT_ID + "." + context.l_LOCAL_VAR_STAGING_TO_ODS_DATASET_ODS + "." + context.l_LOCAL_VAR_STAGING_TO_ODS_Target_TABLE + " where meta_execution_id='" + context.Meta_Execution_ID + "'". The values for the above context variables can be found in the RDS database (a resolved example is shown below). | prj-data-dm-sust-<env>.ODS.ODS_BWH_0000_0000_F1000_F_W_CO_MVSDTR01_0010 | cs-ew1-prj-data-dm-sust-test-staging/CO2_Emission/Final/YYYYMM_TALEND_DEV_CO2_BW_QRY_MVSDTR01_0010.xlsx | |
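For reference, a minimal sketch of what the Talend query looks like once its context variables are resolved. The variable values below are assumptions based on the test-environment names used in the example row above; the authoritative values come from the RDS context tables:

```sql
-- Hedged sketch: the context variables are resolved with assumed
-- test-environment values; the real values live in the RDS database.
-- l_VAR_GCP_PROJECT_ID                    -> prj-data-dm-sust-test
-- l_LOCAL_VAR_STAGING_TO_ODS_DATASET_ODS  -> ODS
-- l_LOCAL_VAR_STAGING_TO_ODS_Target_TABLE -> ODS_BWH_0000_0000_F1000_F_W_CO_MVSDTR01_0010
SELECT *
FROM `prj-data-dm-sust-test.ODS.ODS_BWH_0000_0000_F1000_F_W_CO_MVSDTR01_0010`
WHERE meta_execution_id = '20240101_000000_abc123';  -- placeholder Meta_Execution_ID
```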
Data Validation:
<Provide the SQL queries to validate the data or the record count in the BQ target tables>
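As an illustration, a hedged record-count check for one batch execution; the table name is taken from the example flow above, and the Meta_Execution_ID is a placeholder:

```sql
-- Sketch of a record-count validation for a single batch execution
-- (substitute the actual target table and the failed/loaded execution ID).
SELECT meta_execution_id,
       COUNT(*) AS record_count
FROM `prj-data-dm-sust-test.ODS.ODS_BWH_0000_0000_F1000_F_W_CO_MVSDTR01_0010`
WHERE meta_execution_id = '20240101_000000_abc123'
GROUP BY meta_execution_id;
```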
Logging:
<Details about the log tables in BigQuery>
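The log table name and schema are project-specific; as an illustration only, a query against a hypothetical `LOGS.ETL_JOB_LOG` table to review the most recent runs of a flow might look like:

```sql
-- Illustrative only: LOGS.ETL_JOB_LOG and its columns are hypothetical
-- placeholders; substitute the actual BigQuery log dataset, table, and columns.
SELECT meta_execution_id,
       flow_name,
       job_name,
       status,
       start_time,
       end_time
FROM `prj-data-dm-sust-test.LOGS.ETL_JOB_LOG`
WHERE flow_name = 'F1000_F_W_CO_MVSDTR01_0010'
ORDER BY start_time DESC
LIMIT 10;
```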
Troubleshooting steps:
<Provide the steps to debug the ETL flow in case of failure. For example, if an ETL job fails, how should it be re-triggered? Can the job be rerun directly, or must the data from the previous execution be deleted before rerunning? Should any changes be made in the context table for date executions?>
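If the rerun procedure requires removing the previous batch before re-triggering (an assumption to confirm against each flow's actual rerun rules), the cleanup could be a delete keyed on the failed run's Meta_Execution_ID, for example:

```sql
-- Hedged sketch: remove the partially loaded batch before rerunning the job.
-- Whether a delete is required at all depends on the flow's rerun rules;
-- the execution ID below is a placeholder for the failed run's Meta_Execution_ID.
DELETE FROM `prj-data-dm-sust-test.ODS.ODS_BWH_0000_0000_F1000_F_W_CO_MVSDTR01_0010`
WHERE meta_execution_id = '20240101_000000_abc123';
```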