4.1 - Talend Integration

Source data integration with Talend ETL tool

  • Xtract Server (WBP):

    • Talend connects to the Xtract server and triggers the extraction jobs using HTTP calls.
    • Xtract generates the results as CSV files stored in memory.
    • Talend retrieves the files, applies the required transformations, and then loads them into Google Cloud Storage.
  • Google Sheets (Hedges, wap solid fuels, CO2):

Data Transformation and Loading to Google BigQuery:

  • Once data from all four sources (FTP server, PostgreSQL database, Oracle database, Google Sheets) is available in Google Cloud Storage as files,
    Talend proceeds with data transformation and loading.
  • Talend performs data transformations as needed, including cleansing, mapping, and structuring the data for consistency.
  • The transformed data is loaded into various stages, operational data stores (ODS), and data mart tables within Google BigQuery.
  • These tables are organized to facilitate efficient querying and reporting for energy optimization purposes.

By utilizing Talend for data extraction, transformation, and loading (ETL), the web app ensures that data from diverse sources is collected, processed, and structured for analysis and reporting within Google BigQuery, enabling users to make informed decisions based on up-to-date and accurate data.
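The same flow can be illustrated with a minimal Python sketch. It is not the Talend implementation itself (Talend performs these steps with its own components); the Xtract endpoint, bucket, and table names below are placeholders.

```python
# Minimal sketch of the extract -> GCS -> BigQuery flow described above.
# All names (Xtract URL, bucket, dataset/table) are placeholders, not the
# actual project configuration.
import requests
from google.cloud import bigquery, storage

XTRACT_URL = "https://xtract.example.com/run"   # hypothetical endpoint
BUCKET = "purchasing-landing"                   # hypothetical bucket
TABLE = "my-project.staging.stg_pur_data"       # hypothetical table

# 1. Trigger the extraction job on the Xtract server over HTTP.
resp = requests.get(XTRACT_URL, params={"extraction": "PUR_DATA"}, timeout=600)
resp.raise_for_status()

# 2. Land the CSV result in Google Cloud Storage.
blob = storage.Client().bucket(BUCKET).blob("landing/pur_data.csv")
blob.upload_from_string(resp.content, content_type="text/csv")

# 3. Load the file from GCS into a BigQuery table.
bq = bigquery.Client()
job = bq.load_table_from_uri(
    f"gs://{BUCKET}/landing/pur_data.csv",
    TABLE,
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    ),
)
job.result()  # wait for the load to finish
```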

4.2 - Source Data Extraction

Main jobs for source extraction
  • F100_PUR_DATA_EXTRACT
  • J100_PURCHASING_EXTRACT_SOURCE_3
  • J100_PURCHASING_EXTRACT_SOURCE_6
  • J100_PURCHASING_EXTRACT_SOURCE_7
  • J100_PURCHASING_EXTRACT_SOURCE_11
  • J100_PURCHASING_EXTRACT_SOURCE_13


Job description by steps

  1. In the job F100_PUR_DATA_EXTRACT:
    1. A parallel execution of all data sources is launched

We will analyze one job in detail, but the logic in the others is similar.


  1. The job triggers the universal extract jobs. In some cases we loop over a list of parameters and trigger multiple extractions.
  2. We read the CSV file obtained by the extract and we:
    1. Remove duplicates
    2. Apply the mappings indicated in the specifications and create a file ready to be uploaded to staging
  3. We upload the file into the bucket (see the sketch below).
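A minimal sketch of steps 2 and 3, assuming a pandas-based de-duplication and an illustrative column mapping (the real mappings are defined in the specifications and implemented in the Talend job):

```python
# Sketch of steps 2-3: de-duplicate the extracted CSV, apply column mappings,
# and push the staging-ready file to the GCS bucket. File paths, column names,
# and the mapping are illustrative placeholders.
import pandas as pd
from google.cloud import storage

COLUMN_MAPPING = {            # hypothetical mapping from extract to staging
    "BUKRS": "Company_Code",
    "LIFNR": "Vendor_Key",
}

df = pd.read_csv("extract/pur_source_3.csv")   # file produced by the extract
df = df.drop_duplicates()                      # 2.1 remove duplicates
df = df.rename(columns=COLUMN_MAPPING)         # 2.2 apply mappings
df.to_csv("stg_file/pur_source_3_stg.csv", index=False)

# 3. upload the staging-ready file into the bucket (bucket name is a placeholder)
bucket = storage.Client().bucket("purchasing-landing")
bucket.blob("stg_file/pur_source_3_stg.csv").upload_from_filename(
    "stg_file/pur_source_3_stg.csv"
)
```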

Main jobs for loading Staging
  • F200_PUR_GCS_TO_STG
  • J200_PUR_GCS_TO_STG

Job description by steps

  1. The flow is used to call the job J200_PUR_GCS_TO_STG:

  2. Looping over the files:

    1. The previous step generated a list of files in the Stg_file folder. The job iterates over them and for each file it calls the job J200_PUR_CSV_TO_STAGING_LOCAL.
      1. The tFixedFlowInput component contains the list of all files with the corresponding staging table name.
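A minimal sketch of this loop, with the tFixedFlowInput file-to-table pairs modelled as a plain dictionary; file, bucket, and table names are placeholders:

```python
# Sketch of the loop performed by J200_PUR_GCS_TO_STG: each CSV in the bucket
# is loaded into its corresponding staging table. The pairs below stand in for
# the tFixedFlowInput content and are purely illustrative.
from google.cloud import bigquery

FILE_TO_STAGING = {
    "pur_source_3_stg.csv": "my-project.staging.STG_PUR_SOURCE_3",
    "pur_source_6_stg.csv": "my-project.staging.STG_PUR_SOURCE_6",
}

bq = bigquery.Client()
for file_name, table_id in FILE_TO_STAGING.items():
    job = bq.load_table_from_uri(
        f"gs://purchasing-landing/stg_file/{file_name}",   # hypothetical bucket
        table_id,
        job_config=bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            skip_leading_rows=1,
            write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
        ),
    )
    job.result()
```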


Main jobs for loading the ODS
  • F300_PUR_STG_TO_ODS
  • J300_PUR_STG_TO_ODS


Job description by steps


  1. The flow is used to call the job J300_PUR_STG_TO_ODS:

  2. Looping over the list of staging tables:

    1. Truncate the corresponding ODS table
    2. Call the job J001_STAGING_TO_ODS to insert data in the table
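A minimal sketch of the truncate-and-insert pattern applied to each table pair; table names are placeholders and the real list is driven by the Talend loop:

```python
# Sketch of the staging-to-ODS step: truncate the ODS table, then insert the
# staging content, as J001_STAGING_TO_ODS does. Table names are placeholders.
from google.cloud import bigquery

TABLE_PAIRS = [  # hypothetical (staging, ODS) pairs iterated by the loop
    ("my-project.staging.STG_PUR_SOURCE_3", "my-project.ods.ODS_PUR_SOURCE_3"),
]

bq = bigquery.Client()
for stg_table, ods_table in TABLE_PAIRS:
    bq.query(f"TRUNCATE TABLE `{ods_table}`").result()            # step 2.1
    bq.query(
        f"INSERT INTO `{ods_table}` SELECT * FROM `{stg_table}`"  # step 2.2
    ).result()
```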


Main jobs for building DM
  • F400_PUR_PREP_DATA

    • F400_PURCHASING_STEP
    • F410_PURCHASING_STEP
    • F420_PURCHASING_STEP

Job description by steps

  1. This Talend flow first deletes any files that remain on the remote engine from previous executions (see the sketch below).
  2. It calls in sequence three subjobs which read the different ODS tables needed to build the dimension and pivot tables.
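The initial cleanup of step 1 can be sketched as follows, assuming the leftover files are CSVs in a working directory on the remote engine (the path is a placeholder):

```python
# Sketch of the cleanup step: remove any files left on the remote engine by
# previous executions before the three subjobs run.
import glob
import os

WORK_DIR = "/opt/talend/remote_engine/work/purchasing"   # hypothetical path

for leftover in glob.glob(os.path.join(WORK_DIR, "*.csv")):
    os.remove(leftover)
```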

Prepare DM file

F400_PURCHASING_STEP


Job description by steps

  1. Read data from DATA_HR_TARGET:
    • The job reads data from the ODS table ODS_0000_F001_F_W_PUR_DATA_HR_TARGET and applies the necessary mapping.
  2. Calculation of rate cost:
    • The job reads the Google Sheet in order to extract the various currency rates per year and saves them in memory (see the sketch after this section).
  3. Compute Function cost target:
    • The job reads data from the ODS table ODS_0000_F012_F_W_PUR_FUNCTION_COST_TARGET and applies the mapping to build the FUNCTION_COST_TARGET.

This job builds the first pieces of data necessary to build the FUNCTION cost target
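A minimal sketch of the currency-rate lookup from step 2, assuming the Google Sheet is available as a CSV export with Year, Currency, and Rate columns (the column names and the direction of the conversion are assumptions):

```python
# Sketch of step 2: read the currency-rate sheet export and keep the yearly
# rates in memory for later conversions. The file name, column names, and the
# conversion direction are assumptions.
import pandas as pd

rates_df = pd.read_csv("currency_rates.csv")   # placeholder for the sheet export

# Keep rates in memory keyed by (year, currency), e.g. {(2024, "USD"): 1.08}.
rates = {
    (int(row.Year), row.Currency): float(row.Rate)
    for row in rates_df.itertuples(index=False)
}

def to_eur(amount: float, currency: str, year: int) -> float:
    """Convert an amount to EUR using the yearly rate held in memory."""
    return amount / rates[(year, currency)]
```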


Main jobs for building DM
  • F410_PURCHASING_STEP


Job description by steps

The job reads in sequence a list of ODS tables and applies the requested mappings:

  1. ODS_0000_F010_F_W_PUR_CPHRPANHR_0001

    1. The table is read twice:
      1. Reads the ODS data applying the filter Bus__Unit__Ent__Grp__Key='FPUR' and getting only the last two years of data.
      2. Reads the ODS data applying the filter SPLIT(Job_Function, '/')[OFFSET(0)] in ('PROC','SC ') and getting only the last two years of data (see the SQL sketch below).
  2. ODS_0000_F007_F_W_PUR_MPR_FC001_0001:

    1. We read the entire table and apply the requested mappings
  3. FUNCTION_COST_TARGET

    1. The file created in job J400_PURCHASING_STEP is parsed to adapt it to the dimension and fact schema
  4. ODS_0000_F009_F_W_PUR_WO_PRODUCT
    1. We read the table applying a filter on Creditor_Class__P__Key not like ('%S%' OR '%X%') and getting only the last two years of data.
  5. ODS_0000_F008_F_W_PUR_MVGSVQM_0003:
    1. We read the table excluding rows where Vendor_Domain_Key contains 9 and getting only the last two years of data.


This Talend job is the second step of building the Dimension and Fact tables. All data in the FACT tables is also assigned to ECO or SCO.
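As an illustration, the second read of ODS_0000_F010_F_W_PUR_CPHRPANHR_0001 (item 1.2 above) corresponds to BigQuery SQL along these lines; the date column used for the two-year window and the project/dataset names are assumptions:

```python
# Sketch of the filter described in item 1.2: first segment of Job_Function in
# ('PROC', 'SC ') and only the last two years of data. Period_Date and the
# project/dataset names are assumptions.
from google.cloud import bigquery

SQL = """
SELECT *
FROM `my-project.ods.ODS_0000_F010_F_W_PUR_CPHRPANHR_0001`
WHERE SPLIT(Job_Function, '/')[OFFSET(0)] IN ('PROC', 'SC ')
  AND EXTRACT(YEAR FROM Period_Date) >= EXTRACT(YEAR FROM CURRENT_DATE()) - 1
"""

rows = bigquery.Client().query(SQL).result()
```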

Main jobs for building DM
  • F420_PURCHASING_STEP


Job description by steps

The job reads in sequence a list of ODS tables and applies the requested mappings:

  1. ODS_0000_F011_F_W_PUR_MVPUPE01_001

    1. We read the table applying the filter VC_PROGRAM = 'Procurement' and getting only the last two years of data.
  2. ODS_0000_F005_F_W_PUR_Strategy_Target
    1. We read the table keeping only the last two years of data and only rows where TARGET_EUR is not empty.
  3. At the end of the step, a condition verifies whether the current date is the last day of the month; if true, it appends the content of the table ODS_0000_F013_F_W_PUR_MVPUPE01_001 to the table ODS_0000_F013_F_W_PUR_MVPUPE01_001_MMYYYY (see the sketch below).
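A minimal sketch of the end-of-month condition and append from step 3 (project and dataset names are placeholders):

```python
# Sketch of step 3: detect the last day of the month and, if so, append the
# content of ODS_0000_F013_F_W_PUR_MVPUPE01_001 to its MMYYYY snapshot table.
import calendar
from datetime import date

from google.cloud import bigquery

today = date.today()
last_day = calendar.monthrange(today.year, today.month)[1]

if today.day == last_day:
    suffix = today.strftime("%m%Y")          # e.g. 052024 for May 2024
    sql = f"""
    INSERT INTO `my-project.ods.ODS_0000_F013_F_W_PUR_MVPUPE01_001_{suffix}`
    SELECT * FROM `my-project.ods.ODS_0000_F013_F_W_PUR_MVPUPE01_001`
    """
    bigquery.Client().query(sql).result()
```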




4.4 - Load to DM (calculations and transformations)


Main jobs for loading the DM
  • F500_PUR_DATA_TO_DTM


Job description by steps

This job reads the files stored on the remote engine and computed in the previous steps, and uses them to populate the dimension and fact tables.

Main jobs for loading the DM
  • J500_PURCHASING_PIVOT_TO_DTM


Job description by steps

Metadata Calculation:

  1. Job Metadata Calculation:
    • The job initiates by calculating essential job metadata. This metadata includes information like job identifiers, timestamps, or job-specific configuration details.
    • This step sets the foundation for the subsequent data processing.

Slowly Changing Dimension (SCD2) Data Loading:

  1. SCD2 Logic for Dimension Data Loading:
    • In this step, the job applies Slowly Changing Dimension Type 2 (SCD2) logic to load data into a dimension table.
    • The source of the data is the Operational Data Store (ODS) energy price metadata table, which is a union of tables named "forwards" and "spot."
      The data is loaded into the Data Mart (DM) energy price table using the SCD2 methodology.
    • SCD2 logic involves comparing specific fields, such as name, commodity, contract, proprietary, timespan, rolling, and timezone, to identify changes and maintain historical records in the DM table.
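A minimal, self-contained sketch of this SCD2 comparison, using the tracked fields listed above; the business key, date handling, and row layout are illustrative assumptions, not the exact Talend implementation:

```python
# Sketch of the SCD2 logic: tracked attributes are compared between the
# incoming record and the current dimension row; on change the old row is
# closed and a new current row is appended.
from datetime import date

TRACKED = ("name", "commodity", "contract", "proprietary",
           "timespan", "rolling", "timezone")

def apply_scd2(dim_rows: list, incoming: dict, business_key: str = "name") -> None:
    """Update dim_rows in place with SCD2 semantics for one incoming record."""
    current = next(
        (r for r in dim_rows
         if r[business_key] == incoming[business_key] and r["is_current"]),
        None,
    )
    if current and all(current[f] == incoming[f] for f in TRACKED):
        return  # no change in any tracked field: keep the existing version
    if current:
        current["is_current"] = False           # close the historical record
        current["valid_to"] = date.today()
    dim_rows.append({                           # insert the new current version
        **{f: incoming[f] for f in TRACKED},
        "valid_from": date.today(),
        "valid_to": None,
        "is_current": True,
    })
```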

Logging and Cleanup:

  1. Log Generation:
    • Upon the completion of the data loading process, the job generates logs. These logs serve as a record of the job's activities and provide visibility into the data loading process.
  2. Temporary File Deletion:
    • As a final step, the job takes care of deleting any temporary files created during the data transfer process. This cleanup ensures the efficient management of storage resources.

Main jobs for loading the DM
  • J004_ODS_TO_DM_FACT_METEO_DATA


Job description by steps

Metadata Calculation:

  1. Job Metadata Calculation:
    • The job initiates by calculating essential job metadata. This metadata includes information like job identifiers, timestamps, or job-specific configuration details.
    • This step establishes the foundation for the subsequent data processing.

Data Enrichment and Loading:

  1. Data Enrichment from DIM_meteo_info:
    • In this step, the job enriches data from the Vendohm meteo table with attributes obtained from the DIM_meteo_info table.
  2. Loading into FACT_meteo_data:
    • The enriched data is then loaded into the FACT_meteo_data table.
    • This table serves as a fact table containing data relevant to meteorological information.
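A minimal sketch of this enrichment-and-load step as an insert-select with a join; the join key, column names, and project/dataset names are assumptions:

```python
# Sketch of the enrichment step: join the meteo measurements with attributes
# from DIM_meteo_info and insert the result into FACT_meteo_data. The join key
# and column names are assumptions; the real mapping lives in the Talend job.
from google.cloud import bigquery

SQL = """
INSERT INTO `my-project.dm.FACT_meteo_data` (meteo_info_id, measure_date, value)
SELECT d.meteo_info_id, s.measure_date, s.value
FROM `my-project.ods.ODS_METEO_DATA` AS s       -- source measurements
JOIN `my-project.dm.DIM_meteo_info` AS d        -- dimension attributes
  ON s.station_code = d.station_code
"""

bigquery.Client().query(SQL).result()
```

The J005_ODS_TO_DM_FACT_ENERGY_PRICE job described next follows the same enrichment-and-load pattern against DIM_energy_price_info and FACT_energy_price_data.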

Logging and Cleanup:

  1. Log Generation:
    • Upon the completion of the data enrichment and loading process, the job generates logs. These logs serve as a record of the job's activities and provide visibility into the data processing steps.
  2. Temporary File Deletion:
    • As a final step, the job takes care of deleting any temporary files created during the data transfer and enrichment process. This cleanup ensures efficient storage resource management.

Main jobs for loading the DM
  • J005_ODS_TO_DM_FACT_ENERGY_PRICE


Job description by steps

Metadata Calculation:

  1. Job Metadata Calculation:
    • The job commences with the calculation of essential job metadata. This metadata includes information like job identifiers, timestamps, or specific configuration details.
    • This initial step provides a foundation for subsequent data processing.

Data Enrichment and Loading:

  1. Data Enrichment from DIM_energy_price_info:
    • In this step, the job enriches data from the energy price table by incorporating attributes sourced from the DIM_energy_price_info table.
  2. Loading into FACT_energy_price_data:
    • The enriched data is subsequently loaded into the FACT_energy_price_data table.
    • The FACT_energy_price_data table is designed to store data relevant to energy prices.

Logging and Cleanup:

  1. Log Generation:
    • Following the completion of the data enrichment and loading process, the job generates logs. These logs serve as a record of the job's activities and provide visibility into the data processing steps.
  2. Temporary File Deletion:
    • As a final step, the job manages the deletion of any temporary files that were created during the data transfer and enrichment process. This cleanup ensures efficient storage resource management.






4.5 - Scheduling and Automation

TMC - PL_QV_TO_TABLEAU_PURCHASING - Daily at 5 AM.

4.6 - Data Validation


Responsible & contact points:

  • Alessandro Mainardi - Project Owner
  • Simon Bourguignon - Delivery Manager
  • Alba Carrero / Gaetan Frenoy - Product Owner
  • Rui Ferraz - Project Manager