4.1 - Talend Integration

Source data integration with Talend ETL tool

  • Xtract Server (WBP):

    • Talend connects to the Xtract server and triggers the extraction jobs using HTTP calls.
    • Xtract generates the results as CSV files stored in memory.
    • Talend retrieves the files, applies the required transformations, and then loads them into Google Cloud Storage.
  • Google Sheets (Hedges, wap solid fuels, CO2):

Data Transformation and Loading to Google BigQuery:

  • Once data from all four sources (FTP server, PostgreSQL database, Oracle database, Google Sheets) is available in Google Cloud Storage as files,
    Talend proceeds with data transformation and loading.
  • Talend performs data transformations as needed, including cleansing, mapping, and structuring the data for consistency.
  • The transformed data is loaded into various stages, operational data stores (ODS), and data mart tables within Google BigQuery.
  • These tables are organized to facilitate efficient querying and reporting for energy optimization purposes.

By utilizing Talend for data extraction, transformation, and loading (ETL), the web app ensures that data from diverse sources is collected, processed, and structured for analysis and reporting within Google BigQuery, enabling users to make informed decisions based on up-to-date and accurate data.
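The same flow can be illustrated with a minimal Python sketch. It is not the Talend implementation itself (Talend performs these steps with its own components); the Xtract endpoint, bucket, and table names below are placeholders.

```python
# Minimal sketch of the extract -> GCS -> BigQuery flow described above.
# All names (Xtract URL, bucket, dataset/table) are placeholders, not the
# actual project configuration.
import requests
from google.cloud import bigquery, storage

XTRACT_URL = "https://xtract.example.com/run"   # hypothetical endpoint
BUCKET = "purchasing-landing"                   # hypothetical bucket
TABLE = "my-project.staging.stg_pur_data"       # hypothetical table

# 1. Trigger the extraction job on the Xtract server over HTTP.
resp = requests.get(XTRACT_URL, params={"extraction": "PUR_DATA"}, timeout=600)
resp.raise_for_status()

# 2. Land the CSV result in Google Cloud Storage.
blob = storage.Client().bucket(BUCKET).blob("landing/pur_data.csv")
blob.upload_from_string(resp.content, content_type="text/csv")

# 3. Load the file from GCS into a BigQuery table.
bq = bigquery.Client()
job = bq.load_table_from_uri(
    f"gs://{BUCKET}/landing/pur_data.csv",
    TABLE,
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    ),
)
job.result()  # wait for the load to finish
```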

4.2 - Source Data Extraction

Main jobs for source extraction
  • F100_PUR_DATA_EXTRACT
  • J100_PURCHASING_EXTRACT_SOURCE_3
  • J100_PURCHASING_EXTRACT_SOURCE_6
  • J100_PURCHASING_EXTRACT_SOURCE_7
  • J100_PURCHASING_EXTRACT_SOURCE_11
  • J100_PURCHASING_EXTRACT_SOURCE_13


Job description by steps

  1. In the job F100_PUR_DATA_EXTRACT:
    1. A parallel execution of all data sources is launched

We will analyze one job in detail, but the logic in the others is similar.


  1. The job triggers the universal extract jobs. In some cases we loop over a list of parameters and trigger multiple extractions.
  2. We read the CSV file obtained by the extract and we:
    1. Remove duplicates
    2. Apply the mappings indicated in the specifications and create a file ready to be uploaded to staging
  3. We upload the file into the bucket (see the sketch below).
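A minimal sketch of steps 2 and 3, assuming a pandas-based de-duplication and an illustrative column mapping (the real mappings are defined in the specifications and implemented in the Talend job):

```python
# Sketch of steps 2-3: de-duplicate the extracted CSV, apply column mappings,
# and push the staging-ready file to the GCS bucket. File paths, column names,
# and the mapping are illustrative placeholders.
import pandas as pd
from google.cloud import storage

COLUMN_MAPPING = {            # hypothetical mapping from extract to staging
    "BUKRS": "Company_Code",
    "LIFNR": "Vendor_Key",
}

df = pd.read_csv("extract/pur_source_3.csv")   # file produced by the extract
df = df.drop_duplicates()                      # 2.1 remove duplicates
df = df.rename(columns=COLUMN_MAPPING)         # 2.2 apply mappings
df.to_csv("stg_file/pur_source_3_stg.csv", index=False)

# 3. upload the staging-ready file into the bucket (bucket name is a placeholder)
bucket = storage.Client().bucket("purchasing-landing")
bucket.blob("stg_file/pur_source_3_stg.csv").upload_from_filename(
    "stg_file/pur_source_3_stg.csv"
)
```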

Main jobs for loading Staging
  • F200_PUR_GCS_TO_STG
  • J200_PUR_GCS_TO_STG

Job description by steps

  1. The flow is used to call the job J200_PUR_GCS_TO_STG:

  2. Looping over the files:

    1. The previous step generated a list of files in the Stg_file folder. The job iterates over them and for each file it calls the job J200_PUR_CSV_TO_STAGING_LOCAL.
      1. The tFixedFlowInput component contains the list of all files with the corresponding staging table name.
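A minimal sketch of this loop, with the tFixedFlowInput file-to-table pairs modelled as a plain dictionary; file, bucket, and table names are placeholders:

```python
# Sketch of the loop performed by J200_PUR_GCS_TO_STG: each CSV in the bucket
# is loaded into its corresponding staging table. The pairs below stand in for
# the tFixedFlowInput content and are purely illustrative.
from google.cloud import bigquery

FILE_TO_STAGING = {
    "pur_source_3_stg.csv": "my-project.staging.STG_PUR_SOURCE_3",
    "pur_source_6_stg.csv": "my-project.staging.STG_PUR_SOURCE_6",
}

bq = bigquery.Client()
for file_name, table_id in FILE_TO_STAGING.items():
    job = bq.load_table_from_uri(
        f"gs://purchasing-landing/stg_file/{file_name}",   # hypothetical bucket
        table_id,
        job_config=bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            skip_leading_rows=1,
            write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
        ),
    )
    job.result()
```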


Main jobs for loading the ODS
  • F300_PUR_STG_TO_ODS
  • J300_PUR_STG_TO_ODS


Job description by steps


  1. The flow is used to call the job J300_PUR_STG_TO_ODS:

  2. Looping over the list of staging tables:

    1. Truncate the corresponding ODS table
    2. Call the job J001_STAGING_TO_ODS to insert data in the table
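A minimal sketch of the truncate-and-insert pattern applied to each table pair; table names are placeholders and the real list is driven by the Talend loop:

```python
# Sketch of the staging-to-ODS step: truncate the ODS table, then insert the
# staging content, as J001_STAGING_TO_ODS does. Table names are placeholders.
from google.cloud import bigquery

TABLE_PAIRS = [  # hypothetical (staging, ODS) pairs iterated by the loop
    ("my-project.staging.STG_PUR_SOURCE_3", "my-project.ods.ODS_PUR_SOURCE_3"),
]

bq = bigquery.Client()
for stg_table, ods_table in TABLE_PAIRS:
    bq.query(f"TRUNCATE TABLE `{ods_table}`").result()            # step 2.1
    bq.query(
        f"INSERT INTO `{ods_table}` SELECT * FROM `{stg_table}`"  # step 2.2
    ).result()
```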


Main jobs for building DM
  • F400_PUR_PREP_DATA

    • F400_PURCHASING_STEP
    • F410_PURCHASING_STEP
    • F420_PURCHASING_STEP

Job description by steps

  1. This Talend flow first deletes any files that remain on the remote engine from previous executions (see the sketch below).
  2. It calls in sequence three subjobs which read the different ODS tables needed to build the dimension and pivot tables.
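The initial cleanup of step 1 can be sketched as follows, assuming the leftover files are CSVs in a working directory on the remote engine (the path is a placeholder):

```python
# Sketch of the cleanup step: remove any files left on the remote engine by
# previous executions before the three subjobs run.
import glob
import os

WORK_DIR = "/opt/talend/remote_engine/work/purchasing"   # hypothetical path

for leftover in glob.glob(os.path.join(WORK_DIR, "*.csv")):
    os.remove(leftover)
```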

Prepare DM file

F400_PURCHASING_STEP


Job description by steps

  1. Read data from DATA_HR_TARGET:
    • The job reads data from the ODS table ODS_0000_F001_F_W_PUR_DATA_HR_TARGET and applies the necessary mapping.
  2. Calculation of rate cost:
    • The job reads the Google Sheet in order to extract the various currency rates per year and saves them in memory (see the sketch after this section).
  3. Compute Function cost target:
    • The job reads data from the ODS table ODS_0000_F012_F_W_PUR_FUNCTION_COST_TARGET and applies the mapping to build the FUNCTION_COST_TARGET.

This job builds the first pieces of data necessary to build the FUNCTION cost target
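A minimal sketch of the currency-rate lookup from step 2, assuming the Google Sheet is available as a CSV export with Year, Currency, and Rate columns (the column names and the direction of the conversion are assumptions):

```python
# Sketch of step 2: read the currency-rate sheet export and keep the yearly
# rates in memory for later conversions. The file name, column names, and the
# conversion direction are assumptions.
import pandas as pd

rates_df = pd.read_csv("currency_rates.csv")   # placeholder for the sheet export

# Keep rates in memory keyed by (year, currency), e.g. {(2024, "USD"): 1.08}.
rates = {
    (int(row.Year), row.Currency): float(row.Rate)
    for row in rates_df.itertuples(index=False)
}

def to_eur(amount: float, currency: str, year: int) -> float:
    """Convert an amount to EUR using the yearly rate held in memory."""
    return amount / rates[(year, currency)]
```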


Main jobs for building DM
  • F410_PURCHASING_STEP


Job description by steps

The job reads in sequence a list of ODS tables and applies the requested mappings:

  1. ODS_0000_F010_F_W_PUR_CPHRPANHR_0001

    1. The table is read twice:
      1. Reads the ODS data applying the filter Bus__Unit__Ent__Grp__Key='FPUR' and getting only the last two years of data.
      2. Reads the ODS data applying the filter SPLIT(Job_Function, '/')[OFFSET(0)] in ('PROC','SC ') and getting only the last two years of data (see the SQL sketch below).
  2. ODS_0000_F007_F_W_PUR_MPR_FC001_0001:

    1. We read the entire table and apply the requested mappings
  3. FUNCTION_COST_TARGET

    1. The file created in job J400_PURCHASING_STEP is parsed to adapt it to the dimension and fact schema
  4. ODS_0000_F009_F_W_PUR_WO_PRODUCT
    1. We read the table applying a filter on Creditor_Class__P__Key not like ('%S%' OR '%X%') and getting only the last two years of data.
  5. ODS_0000_F008_F_W_PUR_MVGSVQM_0003:
    1. We read the table excluding rows where Vendor_Domain_Key contains 9 and getting only the last two years of data.


This Talend job is the second step of building the Dimension and Fact tables. All data in the FACT tables is also assigned to ECO or SCO.
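As an illustration, the second read of ODS_0000_F010_F_W_PUR_CPHRPANHR_0001 (item 1.2 above) corresponds to BigQuery SQL along these lines; the date column used for the two-year window and the project/dataset names are assumptions:

```python
# Sketch of the filter described in item 1.2: first segment of Job_Function in
# ('PROC', 'SC ') and only the last two years of data. Period_Date and the
# project/dataset names are assumptions.
from google.cloud import bigquery

SQL = """
SELECT *
FROM `my-project.ods.ODS_0000_F010_F_W_PUR_CPHRPANHR_0001`
WHERE SPLIT(Job_Function, '/')[OFFSET(0)] IN ('PROC', 'SC ')
  AND EXTRACT(YEAR FROM Period_Date) >= EXTRACT(YEAR FROM CURRENT_DATE()) - 1
"""

rows = bigquery.Client().query(SQL).result()
```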

Main jobs for building DM
  • F420_PURCHASING_STEP


Job description by steps

The job reads in sequence a list of ODS tables and applies the requested mappings:

  1. ODS_0000_F011_F_W_PUR_MVPUPE01_001

    1. We read the table applying the filter VC_PROGRAM = 'Procurement' and getting only the last two years of data.
  2. ODS_0000_F005_F_W_PUR_Strategy_Target
    1. We read the table keeping only the last two years of data and only rows where TARGET_EUR is not empty.
  3. At the end of the step, a condition verifies whether the current date is the last day of the month; if true, it appends the content of the table ODS_0000_F013_F_W_PUR_MVPUPE01_001 to the table ODS_0000_F013_F_W_PUR_MVPUPE01_001_MMYYYY (see the sketch below).
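A minimal sketch of the end-of-month condition and append from step 3 (project and dataset names are placeholders):

```python
# Sketch of step 3: detect the last day of the month and, if so, append the
# content of ODS_0000_F013_F_W_PUR_MVPUPE01_001 to its MMYYYY snapshot table.
import calendar
from datetime import date

from google.cloud import bigquery

today = date.today()
last_day = calendar.monthrange(today.year, today.month)[1]

if today.day == last_day:
    suffix = today.strftime("%m%Y")          # e.g. 052024 for May 2024
    sql = f"""
    INSERT INTO `my-project.ods.ODS_0000_F013_F_W_PUR_MVPUPE01_001_{suffix}`
    SELECT * FROM `my-project.ods.ODS_0000_F013_F_W_PUR_MVPUPE01_001`
    """
    bigquery.Client().query(sql).result()
```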




4.4 - Load to DM (calculations and transformations)


Main jobs for loading the DM
  • F500_PUR_DATA_TO_DTM


Job description by steps

This job reads the files stored on the remote engine and computed in the previous steps, and uses them to populate the dimension and fact tables.

Main jobs for loading the DM
  • J500_PURCHASING_PIVOT_TO_DTM


Job description by steps

Metadata Calculation:

  1. Job Metadata Calculation:
    • The job initiates by calculating essential job metadata. This metadata includes information like job identifiers, timestamps, or job-specific configuration details.
    • This step sets the foundation for the subsequent data processing.

Slowly Changing Dimension (SCD2) Data Loading:

  1. SCD2 Logic for Dimension Data Loading:
    • In this step, the job applies Slowly Changing Dimension Type 2 (SCD2) logic to load data into a dimension table.
    • The source of the data is the Operational Data Store (ODS) energy price metadata table, which is a union of tables named "forwards" and "spot."
      The data is loaded into the Data Mart (DM) energy price table using the SCD2 methodology.
    • SCD2 logic involves comparing specific fields, such as name, commodity, contract, proprietary, timespan, rolling, and timezone, to identify changes and maintain historical records in the DM table.
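A minimal, self-contained sketch of this SCD2 comparison, using the tracked fields listed above; the business key, date handling, and row layout are illustrative assumptions, not the exact Talend implementation:

```python
# Sketch of the SCD2 logic: tracked attributes are compared between the
# incoming record and the current dimension row; on change the old row is
# closed and a new current row is appended.
from datetime import date

TRACKED = ("name", "commodity", "contract", "proprietary",
           "timespan", "rolling", "timezone")

def apply_scd2(dim_rows: list, incoming: dict, business_key: str = "name") -> None:
    """Update dim_rows in place with SCD2 semantics for one incoming record."""
    current = next(
        (r for r in dim_rows
         if r[business_key] == incoming[business_key] and r["is_current"]),
        None,
    )
    if current and all(current[f] == incoming[f] for f in TRACKED):
        return  # no change in any tracked field: keep the existing version
    if current:
        current["is_current"] = False           # close the historical record
        current["valid_to"] = date.today()
    dim_rows.append({                           # insert the new current version
        **{f: incoming[f] for f in TRACKED},
        "valid_from": date.today(),
        "valid_to": None,
        "is_current": True,
    })
```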

Logging and Cleanup:

  1. Log Generation:
    • Upon the completion of the data loading process, the job generates logs. These logs serve as a record of the job's activities and provide visibility into the data loading process.
  2. Temporary File Deletion:
    • As a final step, the job takes care of deleting any temporary files created during the data transfer process. This cleanup ensures the efficient management of storage resources.

Main jobs for loading the DM
  • J004_ODS_TO_DM_FACT_METEO_DATA


Job description by steps

Metadata Calculation:

  1. Job Metadata Calculation:
    • The job initiates by calculating essential job metadata. This metadata includes information like job identifiers, timestamps, or job-specific configuration details.
    • This step establishes the foundation for the subsequent data processing.

Data Enrichment and Loading:

  1. Data Enrichment from DIM_meteo_info:
    • In this step, the job enriches data from the Vendohm meteo table with attributes obtained from the DIM_meteo_info table.
  2. Loading into FACT_meteo_data:
    • The enriched data is then loaded into the FACT_meteo_data table.
    • This table serves as a fact table containing data relevant to meteorological information.
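A minimal sketch of this enrichment-and-load step as an insert-select with a join; the join key, column names, and project/dataset names are assumptions:

```python
# Sketch of the enrichment step: join the meteo measurements with attributes
# from DIM_meteo_info and insert the result into FACT_meteo_data. The join key
# and column names are assumptions; the real mapping lives in the Talend job.
from google.cloud import bigquery

SQL = """
INSERT INTO `my-project.dm.FACT_meteo_data` (meteo_info_id, measure_date, value)
SELECT d.meteo_info_id, s.measure_date, s.value
FROM `my-project.ods.ODS_METEO_DATA` AS s       -- source measurements
JOIN `my-project.dm.DIM_meteo_info` AS d        -- dimension attributes
  ON s.station_code = d.station_code
"""

bigquery.Client().query(SQL).result()
```

The J005_ODS_TO_DM_FACT_ENERGY_PRICE job described next follows the same enrichment-and-load pattern against DIM_energy_price_info and FACT_energy_price_data.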

Logging and Cleanup:

  1. Log Generation:
    • Upon the completion of the data enrichment and loading process, the job generates logs. These logs serve as a record of the job's activities and provide visibility into the data processing steps.
  2. Temporary File Deletion:
    • As a final step, the job takes care of deleting any temporary files created during the data transfer and enrichment process. This cleanup ensures efficient storage resource management.

Main jobs for loading the DM
  • J005_ODS_TO_DM_FACT_ENERGY_PRICE


Job description by steps

Metadata Calculation:

  1. Job Metadata Calculation:
    • The job commences with the calculation of essential job metadata. This metadata includes information like job identifiers, timestamps, or specific configuration details.
    • This initial step provides a foundation for subsequent data processing.

Data Enrichment and Loading:

  1. Data Enrichment from DIM_energy_price_info:
    • In this step, the job enriches data from the energy price table by incorporating attributes sourced from the DIM_energy_price_info table.
  2. Loading into FACT_energy_price_data:
    • The enriched data is subsequently loaded into the FACT_energy_price_data table.
    • The FACT_energy_price_data table is designed to store data relevant to energy prices.

Logging and Cleanup:

  1. Log Generation:
    • Following the completion of the data enrichment and loading process, the job generates logs. These logs serve as a record of the job's activities and provide visibility into the data processing steps.
  2. Temporary File Deletion:
    • As a final step, the job manages the deletion of any temporary files that were created during the data transfer and enrichment process. This cleanup ensures efficient storage resource management.






4.5 - Scheduling and Automation

TMC - PL_QV_TO_TABLEAU_PURCHASING - Daily at 5 AM.

4.6 - Data Validation


Responsible & contact points:

  • Alessandro Mainardi - Project Owner
  • Simon Bourguignon - Delivery Manager
  • Alba Carrero / Gaetan Frenoy - Product Owner
  • Rui Ferraz - Project Manager