4.1 - Talend Integration
Source data integration with Talend ETL tool
Data Transformation and Loading to Google BigQuery:
- Once data from all sources is available in Google Cloud Storage as files, Talend proceeds with data transformation and loading.
- Talend performs data transformations as needed, including cleansing, mapping, and structuring the data for consistency.
- The transformed data is loaded into various stages, operational data stores (ODS), and data mart tables within Google BigQuery.
- These tables are organized to facilitate efficient querying and reporting for energy optimization purposes.
By using Talend for data extraction, transformation, and loading (ETL), the web app ensures that data from diverse sources is collected, processed, and structured for analysis and reporting within Google BigQuery. This enables users to make informed decisions based on up-to-date, accurate data.
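The cleansing and mapping stage can be illustrated with a minimal Python sketch. The column names and rules below are hypothetical, invented for illustration only; they are not taken from the actual Talend jobs:

```python
import csv
import io

# Hypothetical mapping: rename source columns to a target schema
# and normalize values before loading to BigQuery.
COLUMN_MAP = {"Plant Code": "plant_code", "Qty": "quantity"}

def transform_row(row: dict) -> dict:
    """Cleanse and map one CSV row (illustrative rules only)."""
    out = {COLUMN_MAP[k]: v.strip() for k, v in row.items() if k in COLUMN_MAP}
    out["quantity"] = int(out["quantity"] or 0)  # enforce a numeric type
    return out

def transform_csv(text: str) -> list[dict]:
    """Apply the transformation to every row of a CSV file's content."""
    return [transform_row(r) for r in csv.DictReader(io.StringIO(text))]

sample = "Plant Code,Qty\n P100 ,7\n"
print(transform_csv(sample))  # [{'plant_code': 'P100', 'quantity': 7}]
```

The real jobs do this inside Talend components (tMap and friends); the sketch only shows the shape of the cleanse-and-map step.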
| Main jobs for source extraction | J130_mainExtract_Files_Src_to_GCS<br>J120_GSC_mainExtract_BWQ_Data<br>J110_GSC_mainExtract_Gsheet_data |
|---|---|

| Job description by steps | Job design |
|---|---|
| In the job F100_PUR_DATA_EXTRACT, a parallel execution of all data sources is launched. The flow triggers the extraction of all sources needed for the Global Supply Chain dashboard. | |
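The parallel launch of the three extraction jobs can be sketched with `concurrent.futures`. The job functions here are placeholders standing in for the Talend subjobs J110 (Google Sheets), J120 (BW), and J130 (files):

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder extraction jobs; the real Talend jobs call external
# systems, so these just return a status string.
def extract_gsheet():  return "J110: ok"
def extract_bw():      return "J120: ok"
def extract_files():   return "J130: ok"

def run_parallel_extract() -> list[str]:
    """Launch all source extractions in parallel and wait for completion."""
    jobs = [extract_gsheet, extract_bw, extract_files]
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = [pool.submit(job) for job in jobs]
        return [f.result() for f in futures]  # preserves submit order

print(run_parallel_extract())  # ['J110: ok', 'J120: ok', 'J130: ok']
```

In Talend the same effect is achieved with parallel triggers in the orchestration job; this sketch only mirrors the fan-out-and-wait pattern.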
| Main jobs for source extraction | J110_GSC_mainExtract_Gsheet_data |
|---|---|

| Job description by steps | Job design |
|---|---|
| Parallel extraction of all Google Sheet sources:<br>- A parallel extraction of all Google Sheets is launched<br>- Simple transformations are done to align with the specifications<br>Upload files to GCS:<br>- The files are uploaded into the GSC folder of the target bucket | |
| Main jobs for source extraction | J130_GSC_mainExtract_Files_Src_to_GCS |
|---|---|

| Job description by steps | Job design |
|---|---|
| Filename creation:<br>- A specific filename following our standard naming convention is created to properly upload files into Cloud Storage<br>Parallel GCS upload:<br>- Each Google Sheet table is read<br>- Simple transformations are done to align with the specifications, resulting in a CSV stored in the Tmp folder<br>- Each file is uploaded into the GSC folder of the target bucket | |
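The naming convention itself is not spelled out in this section, so the pattern below (source name plus business date) is an assumed stand-in, shown only to make the filename-creation step concrete:

```python
from datetime import date

def build_gcs_filename(source: str, business_date: date, ext: str = "csv") -> str:
    """Build a filename following a <source>_<YYYYMMDD>.<ext> pattern.

    The actual convention used by J130 is not documented here;
    this layout is a plausible illustration only.
    """
    return f"{source}_{business_date.strftime('%Y%m%d')}.{ext}"

print(build_gcs_filename("gsc_orders", date(2024, 3, 1)))
# gsc_orders_20240301.csv
```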
| Main jobs for source extraction | J120_GSC_mainExtract_BWQ_Data_Source |
|---|---|

| Job description by steps | Job design |
|---|---|
| Parallel execution of Xtract jobs:<br>- The job triggers, in parallel, all the jobs needed to extract data from BW. | |
4.2 - Load Staging and ODS
| Main jobs for loading Staging | F200_GSC_Gcs_To_Stg<br>J200_GSC_Gcs_To_Stg |
|---|---|

| Job description by steps | Job design |
|---|---|
| First, the job retrieves the value of the business date from the file business_date.csv, in order to target only the files of the current execution in GCS.<br>The flow is then used to call the job J200_GSC_Gcs_To_Stg. | |
| The job launches a parallel load per source type into the Staging tables:<br>- A list of the possible CSV files, with the associated Staging table name and separator, is used to create a sequence<br>- Each staging table is truncated before a new upload<br>- The job gets all the files starting with the desired prefix and, for each of them, calls the standard job J001_GCS_to_STAGING_LOCAL | |
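The business-date lookup and prefix filtering can be sketched as follows. The file layout of business_date.csv (a single value in the first column) and the prefix format are assumptions for illustration:

```python
import csv
import io

def read_business_date(csv_text: str) -> str:
    """Read the business date from business_date.csv (single value assumed)."""
    first_row = next(csv.reader(io.StringIO(csv_text)))
    return first_row[0]

def files_for_run(all_files: list[str], prefix: str) -> list[str]:
    """Keep only the GCS objects belonging to the current execution."""
    return [f for f in all_files if f.startswith(prefix)]

bdate = read_business_date("20240301\n")
listing = ["gsc_orders_20240301.csv", "gsc_orders_20240229.csv", "other.csv"]
print(files_for_run(listing, f"gsc_orders_{bdate}"))
# ['gsc_orders_20240301.csv']
```

The real job lists objects in the GCS bucket instead of taking a Python list, but the selection logic is the same: only files matching the current business date's prefix are loaded.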
| Main jobs for loading the ODS | F300_GSC_STG_TO_ODS<br>J300_GSC_STG_TO_ODS |
|---|---|

| Job description by steps | Job design |
|---|---|
| The flow is used to call the job J300_GSC_STG_TO_ODS. | |
| Parallel loading of the ODS tables by source type:<br>- A parallel loading of the ODS tables is done by source type (BW, Google Sheet, Salesforce)<br>Load in sequence all ODS tables of a source:<br>- Given a source type, the list of staging and ODS tables is hardcoded<br>- The ODS table is truncated<br>- The standard job J001_STAGING_TO_ODS is called to load the ODS table<br>Note: since the staging and ODS tables are identical and the tables are always truncated, this step is unnecessary. | |
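The truncate-then-copy pattern used by J001_STAGING_TO_ODS can be sketched in Python. The table names are invented, and sqlite3 is used here only so the sketch runs anywhere; BigQuery would use a real TRUNCATE TABLE statement, which sqlite3 emulates with DELETE:

```python
import sqlite3

def load_ods(conn: sqlite3.Connection, staging: str, ods: str) -> int:
    """Truncate the ODS table, then copy the staging table into it."""
    conn.execute(f"DELETE FROM {ods}")                      # truncate ODS
    conn.execute(f"INSERT INTO {ods} SELECT * FROM {staging}")
    conn.commit()
    return conn.execute(f"SELECT COUNT(*) FROM {ods}").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_demo (id INTEGER, val TEXT)")
conn.execute("CREATE TABLE ods_demo (id INTEGER, val TEXT)")
conn.executemany("INSERT INTO stg_demo VALUES (?, ?)", [(1, "a"), (2, "b")])
print(load_ods(conn, "stg_demo", "ods_demo"))  # 2
```

Because the load always starts from an empty target, rerunning it is idempotent, which matches the note above that the intermediate copy adds no value when staging and ODS schemas are identical.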
4.3 - Prepare DM files
| Main jobs for building DM | F400_GSC_ODS_to_Prep_DM_Files<br>F400_PURCHASING_STEP<br>F410_GSC_ODS_to_DTM<br>F420_GSC_ODS_to_DTM_Part_1<br>F420_GSC_ODS_to_DTM_Part_5<br>F420_GSC_ODS_to_DTM_Part_6<br>F430_GSC_ODS_to_DTM |
|---|---|

| Job description by steps | Job design |
|---|---|
| This job is used to call a series of subjobs that create the files needed to build the dimension and fact tables. | |
| Prepare DM file | |
|---|---|

| Job description by steps | Job design |
|---|---|
| The job computes steps 19-28 of the specification document:<br>- The job reads data from the ODS table and applies the necessary mappings<br>- The job outputs PIVOT and FACT files, which are stored on the remote engine and will be uploaded to the DM in the J500 jobs<br>- The job creates the table FACT_gsc_target | |
| Main jobs for source extraction | |
|---|---|

| Job description by steps | Job design |
|---|---|
| The job computes steps 41-49 of the specification document:<br>- The job reads data from the ODS table and applies the necessary mappings<br>- The job outputs PIVOT and FACT files, which are stored on the remote engine and will be uploaded to the DM in the J500 jobs<br>- The job also builds the files necessary to create the airfreight KPI following these specifications | |
| Main jobs for source extraction | F420_GSC_ODS_to_DTM_PART_1 |
|---|---|

| Job description by steps | Job design |
|---|---|
| The job computes steps 33, 34, and 35-49 of the specification document:<br>- The job reads data from the ODS table and applies the necessary mappings<br>- The job outputs PIVOT and FACT files, which are stored on the remote engine and will be uploaded to the DM in the J500 jobs | |
| Main jobs for source extraction | F420_GSC_ODS_to_DTM_PART_5 |
|---|---|

| Job description by steps | Job design |
|---|---|
| The job computes steps 33, 34, and 60-69 of the specification document:<br>- The job reads data from the ODS table and applies the necessary mappings<br>- The job outputs PIVOT and FACT files, which are stored on the remote engine and will be uploaded to the DM in the J500 jobs<br>The job uses the tUnpivotRow component. | |
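Talend's tUnpivotRow turns value columns into attribute/value rows. A minimal Python equivalent shows the transformation (the column names below are invented for illustration):

```python
def unpivot(rows: list[dict], id_cols: list[str]) -> list[dict]:
    """Turn value columns into (attribute, value) rows, as tUnpivotRow does."""
    out = []
    for row in rows:
        for col, val in row.items():
            if col not in id_cols:  # every non-key column becomes its own row
                out.append({**{k: row[k] for k in id_cols},
                            "attribute": col, "value": val})
    return out

wide = [{"site": "P100", "otd": 0.95, "otif": 0.91}]
print(unpivot(wide, ["site"]))
# [{'site': 'P100', 'attribute': 'otd', 'value': 0.95},
#  {'site': 'P100', 'attribute': 'otif', 'value': 0.91}]
```

One wide row per site becomes one long row per KPI, which is the shape the PIVOT files need downstream.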
4.4 - Compute Perfect Order Rate
| Main jobs for source extraction | F420_GSC_ODS_to_DTM_PART_6 |
|---|---|

| Job description by steps | Job design |
|---|---|
| The job creates the table FACT_POR, which is displayed by the Perfect Order Rate dashboard:<br>- The job reads data from the ODS tables and applies the necessary mappings. The main ODS tables are:<br>- ODS_0000_F025_F_W_GSC_MVSDSO45_SD0001<br>- ODS_0000_F026_F_W_GSC_Mvsdso51_0001<br>- ODS_0000_F020_F_W_GSC_Sf_Complaint_Case<br>- Each step outputs a FACT file, which is stored in the Tmp folder<br>- The files are stacked to create a unique output bulk file, which is used to create the FACT_POR table | |
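The stacking step (merging the per-step FACT files into one bulk file) can be sketched as a header-aware CSV concatenation; the sample columns are hypothetical:

```python
import csv
import io

def stack_fact_files(files: list[str]) -> str:
    """Concatenate per-step FACT CSVs into one bulk file, keeping one header."""
    out = io.StringIO()
    writer = csv.writer(out)
    header_written = False
    for text in files:
        reader = csv.reader(io.StringIO(text))
        header = next(reader)          # each file carries its own header
        if not header_written:
            writer.writerow(header)    # keep only the first one
            header_written = True
        writer.writerows(reader)
    return out.getvalue()

f1 = "order,score\nA,1\n"
f2 = "order,score\nB,0\n"
print(stack_fact_files([f1, f2]))
```

This assumes all step files share the same column layout, which matches the description: one bulk file feeds the FACT_POR table.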
| Main jobs for source extraction | F501_GSC_Files_To_FACT<br>J501_GSC_ODS_to_DTM_FACT |
|---|---|

| Job description by steps | Job design |
|---|---|
| This job is used to call the job J501_GSC_ODS_to_DTM_FACT. | |
| J501_GSC_ODS_to_DTM_FACT | First, the job truncates the table DM.FACT_gsc_tmp<br>- The job reads all files whose name starts with FACT_FILE inside the InOut folder and uploads them into the FACT_gsc_tmp table; if a file is successfully inserted, the local file is deleted from the remote engine<br>- The job truncates the table DM.FACT_gsc<br>- The job copies data from FACT_gsc_tmp into FACT_gsc<br>- The job truncates the table DM.FACT_gsc_tmp |
|---|---|
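The load-via-tmp-table pattern used by J501 can be sketched end to end. As before, sqlite3 stands in for BigQuery so the sketch is self-contained (DELETE emulates TRUNCATE), and the table and column names are invented:

```python
import sqlite3

def load_via_tmp(conn, tmp: str, final: str, batches: list) -> int:
    """Load rows into a _tmp table first, then swap into the final table.

    Mirrors the J501 pattern: truncate tmp, bulk-load each file into tmp,
    truncate the final table, copy tmp into it, then clear tmp again.
    """
    conn.execute(f"DELETE FROM {tmp}")                       # truncate tmp
    for batch in batches:                                    # one batch per file
        conn.executemany(f"INSERT INTO {tmp} VALUES (?, ?)", batch)
    conn.execute(f"DELETE FROM {final}")                     # truncate final
    conn.execute(f"INSERT INTO {final} SELECT * FROM {tmp}")
    conn.execute(f"DELETE FROM {tmp}")                       # clear tmp
    conn.commit()
    return conn.execute(f"SELECT COUNT(*) FROM {final}").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_tmp (k TEXT, v INTEGER)")
conn.execute("CREATE TABLE fact (k TEXT, v INTEGER)")
print(load_via_tmp(conn, "fact_tmp", "fact", [[("a", 1)], [("b", 2)]]))  # 2
```

Staging the load in a tmp table keeps the final table consistent: readers only ever see either the previous full load or the new one, never a half-loaded state.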
| Main jobs for source extraction | F502_GSC_Files_To_PIVOT<br>J502_GSC_ODS_to_DTM_PIVOT |
|---|---|

| Job description by steps | Job design |
|---|---|
| This job is used to call the job J502_GSC_ODS_to_DTM_PIVOT. | |
| J502_GSC_ODS_to_DTM_PIVOT | First, the job truncates the table DM.DIM_gsc_pivot_tmp<br>- The job gets the data used to perform lookups:<br>- The job executes the TALNED_PROD_QV_C_PLANT Xtract job<br>- Creates a mapping for a list of sites<br>- Reads data from DIM_subact2<br>- Creates a mapping for some old GBU codes<br>- The job reads all files whose name starts with PIVOT_ and TPIVOT_ inside the InOut folder and uploads them into the DIM_gsc_tmp table; if a file is successfully inserted, the local file is deleted from the remote engine. Some joins are done using the sources previously extracted<br>- The job truncates the table DM.DIM_gsc_pivot<br>- The job copies data from DIM_gsc_pivot_tmp into DIM_gsc_pivot<br>- The job truncates the table DM.DIM_gsc_pivot_tmp |
|---|---|
| Main jobs for source extraction | F503_GSC_Files_To_OTIF<br>J503_GSC_ODS_ |
|---|---|

| Job description by steps | Job design |
|---|---|
| The flow is used to build the table DIM_gsc_otif:<br>- The job reads data from the Step_60.csv file computed in the job and joins it with the files Step_{61..65}; these are the outputs of the job J420_GSC_ODS_to_DTM_PART_5<br>- The combined file is used to overwrite the content of the table DIM_gsc_otif | |
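The join of Step_60.csv with the Step_61..65 files can be sketched as a file-level left join. The join key and column names below are stand-ins; the actual key used by J503 is not stated in this section:

```python
import csv
import io

def left_join(base_text: str, lookups: list[str], key: str) -> list[dict]:
    """Left-join a base CSV with each lookup CSV on a shared key column.

    'key' is a hypothetical stand-in for the real join column.
    """
    rows = list(csv.DictReader(io.StringIO(base_text)))
    for text in lookups:
        index = {r[key]: r for r in csv.DictReader(io.StringIO(text))}
        for row in rows:
            match = index.get(row[key], {})
            row.update({k: v for k, v in match.items() if k != key})
    return rows

step60 = "site,orders\nP100,10\n"   # hypothetical Step_60 content
step61 = "site,otif\nP100,0.9\n"    # hypothetical Step_61 content
print(left_join(step60, [step61], "site"))
# [{'site': 'P100', 'orders': '10', 'otif': '0.9'}]
```

The combined rows then overwrite DIM_gsc_otif, as described above.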
4.5 - Scheduling and Automation
Scheduling is handled in TMC (Talend Management Console): PL_QV_TO_TABLEAU_GSC runs daily at 9 PM.
4.6 - Data Validation