
4.1 - Talend Integration

Source data integration with Talend ETL tool

  • FTP Server (Meteologica):

    • Talend connects to the FTP server where files containing current and future electricity prices for Spain, Italy, France, and Germany are located.
    • Talend retrieves these files from the FTP server.
    • The retrieved files are then loaded into Google Cloud Storage.
  • Postgres Database (Vendohm):

    • Talend establishes a direct connection with the Vendohm PostgreSQL database that stores sensor values identified by specific curve IDs.
    • It extracts relevant data from the PostgreSQL database.
    • This extracted data is loaded into Google Cloud Storage as files.
  • Oracle Database (IRM):
    • Talend establishes a direct connection with the IRM Oracle database that stores energy deals identified by specific deal IDs.
    • It extracts relevant data from the Oracle database.
    • This extracted data is loaded into Google Cloud Storage as files.
  • Google Sheets (Hedges, WAP solid fuels, CO2):

    • Talend connects to the Google Sheets documents that hold hedges, weighted average prices (WAP) for solid fuels, and CO2 emissions data.
    • It extracts the relevant data from the designated sheets.
    • This extracted data is loaded into Google Cloud Storage as files.

Data Transformation and Loading to Google BigQuery:

  • Once data from all four sources (FTP server, PostgreSQL database, Oracle database, Google Sheets) is available in Google Cloud Storage as files,
    Talend proceeds with data transformation and loading.
  • Talend performs data transformations as needed, including cleansing, mapping, and structuring the data for consistency.
  • The transformed data is loaded into various stages, operational data stores (ODS), and data mart tables within Google BigQuery.
  • These tables are organized to facilitate efficient querying and reporting for energy optimization purposes.

By utilizing Talend for data extraction, transformation, and loading (ETL), the web app ensures that data from diverse sources is collected, processed, and structured for analysis and reporting within Google BigQuery, enabling users to make informed decisions based on up-to-date and accurate data.

4.2 - Source Data Extraction

Main jobs for source extraction
  • F100_PUR_DATA_EXTRACT
  • J100_PURCHASING_EXTRACT_SOURCE_3
  • J100_PURCHASING_EXTRACT_SOURCE_6
  • J100_PURCHASING_EXTRACT_SOURCE_7
  • J100_PURCHASING_EXTRACT_SOURCE_11
  • J100_PURCHASING_EXTRACT_SOURCE_13


Job description by steps

  1. In the job F100_PUR_DATA_EXTRACT:
    1. A parallel execution of the extractions for all data sources is launched

We will analyze one job in detail, but the logic in the others is similar.


  1. The job triggers the universal extract jobs. In some cases it loops over a list of parameters and triggers multiple extractions.
  2. We read the CSV file obtained by the extract and we:
    1. Remove duplicates
    2. Apply the mappings indicated in the specifications and create a file ready to be uploaded to staging
  3. We upload the file into the bucket.
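
The same pattern can be sketched in a few lines of Python (a minimal illustration rather than the actual Talend job; the file names, bucket and column mapping below are made-up examples):

  import pandas as pd
  from google.cloud import storage

  # Read the extract produced by the universal extract job (hypothetical file name).
  df = pd.read_csv("purchasing_extract_source_3.csv")
  df = df.drop_duplicates()                                   # step 2.1: remove duplicates
  # Step 2.2: apply the mapping from the specifications (example column mapping).
  df = df.rename(columns={"src_price": "price_eur_mwh", "src_ts": "delivery_ts"})
  df.to_csv("purchasing_source_3_staging.csv", index=False)   # file ready for staging

  # Step 3: upload the staging file into the bucket (hypothetical bucket name).
  bucket = storage.Client().bucket("example-staging-bucket")
  bucket.blob("staging/purchasing_source_3_staging.csv").upload_from_filename(
      "purchasing_source_3_staging.csv")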

Main jobs for source extraction
  • J008_GSheet_to_GCS_WAP_SOLID_FUELS
  • J011_GSheet_to_GCS_VENDOHM_FORWARDS

  • J012_GSheet_to_GCS_VENDOHM_SPOT

  • J013_GSheet_to_GCS_VENDOHM_METEO

  • J015_GSheet_to_GCS_IRM_HUBS

  • J016_GSheet_to_GCS_IRM_COUNTER_PARTIES

  • J017_GSheet_to_GCS_IRM_SITES_HEDGES


Job description by steps

  1. Google Sheets Connection:

    1. The job initiates a connection to a specified Google Sheets file.
    2. It targets a specific sheet within the Google Sheets document, identified by its unique ID.
  2. Data Extraction and Formatting:

    1. The job extracts data from the designated Google Sheets sheet.
    2. The extracted data is converted into a CSV format with a predefined fixed schema.
    3. The schema format is consistent and predefined to ensure data uniformity and structure.
    4. The resulting CSV file includes a predefined file name that incorporates the date of extraction.
  3. Copying to Google Storage:

    1. After the successful extraction and formatting of the data, the job copies the CSV file to Google Cloud Storage.
    2. This step serves as a secure and reliable means of storing the processed data in the cloud.
  4. File Deletion:

    1. Following the successful copying of the CSV file to Google Cloud Storage, the original file located on the remote engine is deleted.
    2. This cleanup step helps manage storage resources efficiently and maintains data integrity.

This Talend job efficiently extracts, formats, and securely stores data from a Google Sheets document into Google Cloud Storage, ensuring that the data is readily available for further analysis and processing while adhering to a fixed schema and naming conventions.
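
A minimal Python sketch of the same flow is shown below; it uses gspread and google-cloud-storage instead of the Talend components, and the spreadsheet ID, sheet name, bucket and schema are illustrative assumptions:

  import csv, os
  from datetime import date
  import gspread
  from google.cloud import storage

  # Step 1: connect to the spreadsheet and target one sheet (IDs are made up).
  gc = gspread.service_account()                        # service-account key on the engine
  worksheet = gc.open_by_key("EXAMPLE_SPREADSHEET_ID").worksheet("WAP_SOLID_FUELS")

  # Step 2: extract the values and write a CSV whose name carries the extraction date.
  rows = worksheet.get_all_values()
  local_name = f"wap_solid_fuels_{date.today():%Y%m%d}.csv"
  with open(local_name, "w", newline="") as f:
      csv.writer(f).writerows(rows)                     # fixed, predefined column order assumed

  # Step 3: copy the CSV to Google Cloud Storage.
  storage.Client().bucket("example-bucket").blob(f"gsheet/{local_name}").upload_from_filename(local_name)

  # Step 4: delete the original file on the engine.
  os.remove(local_name)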

Main jobs for source extraction
  • J009_GSheet_to_GCS_CO2_EMISSIONS


Job description by steps


  1. Google Sheets Connection:

    1. The job establishes a connection to a specified Google Sheets file.
    2. It focuses on a particular sheet within the Google Sheets document, identified by its unique ID.
  2. Data Extraction and Formatting:

    1. The job extracts data from the designated Google Sheets sheet.
    2. The extracted data is transformed into a CSV format, adhering to a predefined, fixed schema.
    3. The schema format is consistent and predefined to ensure uniformity and data structure.
    4. The resulting CSV file follows a predefined naming convention, which incorporates the date of extraction.
  3. Data Reading:

    1. The job involves two distinct data reading steps.
      1. First Reading: The job reads only a single cell within the file, specifically the cell containing the year value. This is set to capture year-related data.
      2. Second Reading: The second read of the file is reserved for capturing the remaining data, defined by the schema and format. This step comprehensively reads the data specified in the schema.
  4. Copying to Google Storage:

    1. After successfully extracting and formatting the data, the job copies the CSV file to Google Cloud Storage (inside Sustainability domain).
    2. This step serves as a secure and reliable means of storing the processed data in the cloud.
  5. File Deletion:

    1. Following the successful copying of the CSV file to Google Cloud Storage, the original file located on the remote engine is deleted.
    2. This cleanup step helps manage storage resources efficiently and maintains data integrity.

This Talend job effectively extracts, formats, and securely stores data from a Google Sheets document into Google Cloud Storage. It features specialized data reading steps, ensuring the capture of year-related data separately, while adhering to predefined schemas and naming conventions. The deletion of the original file enhances data management and resource optimization.
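
The two-read pattern can be sketched as follows (gspread-based illustration; the spreadsheet ID, the cell holding the year and the data range are assumptions):

  import gspread

  gc = gspread.service_account()
  ws = gc.open_by_key("EXAMPLE_SPREADSHEET_ID").worksheet("CO2")

  year = ws.acell("B1").value            # first read: the single cell holding the year
  data = ws.get("A3:F400")               # second read: the remaining data block
  rows = [[year] + row for row in data]  # attach the year to every data row before writing the CSV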

Main jobs for source extraction
  • J010_GSheet_to_GCS_VENDOHM


Job description by steps

  1. PostgreSQL Database Connection:

    1. The job establishes a secure connection to a PostgreSQL database.
    2. This connection is secured using certificates and a client key provided in the remote engine.
  2. Data Extraction from PostgreSQL:

    1. The job retrieves new data (incremental by ID) from a specified table in the PostgreSQL database.
    2. The schema for this data extraction process is predefined and consistent.
  3. Data Formatting and CSV Output:

    1. The extracted data is formatted and transformed into CSV format.
    2. The resulting CSV file is named according to a predefined convention, which includes the table name and the date of extraction.
    3. This structured naming ensures clarity and organization.
  4. Storage in Google Cloud Storage (GCS):

    1. The formatted CSV file is securely stored in Google Cloud Storage.
    2. It is placed within a designated "Vendohm" folder in the storage location.
  5. File Deletion and Connection Closure:

    1. After a successful data extraction and storage process, the original file located on the remote engine is deleted.
    2. Additionally, the connection to the PostgreSQL database is closed, ensuring proper resource management.

This Talend job effectively connects to a PostgreSQL database, retrieves data from a specified table, and securely stores the formatted data in CSV format within Google Cloud Storage. It employs robust security measures and follows strict naming conventions for data organization, with an emphasis on data integrity and resource optimization.
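
A minimal Python sketch of this flow, assuming a hypothetical host, table, certificate paths and bucket (psycopg2 and google-cloud-storage stand in for the Talend components):

  import csv, os
  from datetime import date
  import psycopg2
  from google.cloud import storage

  # Step 1: secure connection using the certificates and client key on the engine.
  conn = psycopg2.connect(
      host="vendohm.example.internal", dbname="vendohm", user="etl",
      sslmode="verify-ca", sslrootcert="ca.pem", sslcert="client.crt", sslkey="client.key")

  # Step 2: incremental extract - only rows above the last loaded id (value is illustrative).
  last_id = 123456
  with conn.cursor() as cur:
      cur.execute("SELECT * FROM sensor_values WHERE id > %s ORDER BY id", (last_id,))
      header = [c[0] for c in cur.description]
      rows = cur.fetchall()
  conn.close()                                          # step 5: close the connection

  # Step 3: CSV named after the table and the extraction date.
  file_name = f"sensor_values_{date.today():%Y%m%d}.csv"
  with open(file_name, "w", newline="") as f:
      writer = csv.writer(f)
      writer.writerow(header)
      writer.writerows(rows)

  # Step 4: store the file in the "Vendohm" folder in GCS, then clean up locally.
  storage.Client().bucket("example-bucket").blob(f"Vendohm/{file_name}").upload_from_filename(file_name)
  os.remove(file_name)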

Main jobs for source extraction
  • J014_Oracle_to_GCS_IRM


Job description by steps

  1. Oracle Database Connection:
    • The job initiates a secure connection to an Oracle database, specifically to an IRM (Information Resource Management) database.
    • This connection ensures access to the relevant data.
  2. Data Extraction via Custom Query:
    • The job extracts data from the Oracle database using a custom SQL query provided as input.
    • The data is structured according to a predefined and fixed schema to maintain data consistency.
  3. Data Formatting and CSV Output:
    • The extracted data is formatted and transformed into CSV format.
    • Each resulting CSV file includes the table name and the date of extraction for reference and organization.
  4. Storage in Google Cloud Storage (GCS):
    • After the successful extraction and formatting of the data, the job securely stores the CSV file in Google Cloud Storage.
    • The files are placed within a designated "IRM" folder in the GCS location for efficient storage and accessibility.
  5. File Deletion and Connection Closure:
    • Following a successful data extraction and storage process, the original file located on the remote engine is deleted to manage storage resources efficiently.
    • The connection to the Oracle database is closed to ensure proper resource management and security.

This job supports data extraction and reporting processes within an IRM database, ensuring that relevant data is accessible, well-organized, and securely stored for future use.
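
The Oracle extract step can be sketched along the same lines (python-oracledb instead of the Talend component; the DSN, credentials and query text are illustrative assumptions):

  import csv
  from datetime import date
  import oracledb

  # The custom SQL query is provided as an input parameter (example query).
  custom_query = "SELECT deal_id, commodity, volume, price, trade_date FROM energy_deals"

  conn = oracledb.connect(user="etl", password="change-me", dsn="irm.example.internal/IRMPDB")
  with conn.cursor() as cur:
      cur.execute(custom_query)
      header = [c[0] for c in cur.description]
      rows = cur.fetchall()
  conn.close()                                          # step 5: close the connection

  # CSV named after the table and the extraction date; it would then be copied to
  # the "IRM" folder in GCS and deleted locally, as in the PostgreSQL sketch above.
  file_name = f"energy_deals_{date.today():%Y%m%d}.csv"
  with open(file_name, "w", newline="") as f:
      writer = csv.writer(f)
      writer.writerow(header)
      writer.writerows(rows)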

4.3 - Data Loading to Google BigQuery

Main jobs for data loading
  • J001_Extraction_til_ODS_METEOLOGICA_FRANCE

  • J002_Extraction_til_ODS_METEOLOGICA_ITALY

  • J003_Extraction_til_ODS_METEOLOGICA_GERMANY

  • J004_Extraction_til_ODS_METEOLOGICA_SPAIN

  • J005_Extraction_til_ODS_METEOLOGICA_SPAIN_ENS

  • J006_Extraction_til_ODS_METEOLOGICA_SPAIN_OBS


Job description by steps

Job Initialization and Logging:

  1. Job Run Initialization (Subjob 1):
    1. The job starts with the first subjob, where the job ID is retrieved, and the extraction date is calculated.
    2. This crucial step initializes the job and captures essential metadata.

Data Extraction and Processing:

  1. Files Extraction from FTP Server (Subjob 2):

    1. In the next subjob, data files are extracted from the Meteologica FTP server.
  2. Error Handling and Logging (Subjob 3 - On Failure):

    1. If the data extraction subjob (Subjob 2) encounters an issue or does not finish successfully, the job transitions to a subjob for writing error logs.
    2. Logging is essential for tracking and diagnosing issues in the data extraction process.
  3. Data Loading and Transformation (Subjob 4 - On Success):

    1. When the data extraction subjob (Subjob 2) successfully completes, the job proceeds with data loading and transformation.
    2. In this stage, each extracted file is processed one by one.
    3. For each file, a subjob is called to load data from the CSV file into a stage table, preparing it for further processing.
    4. Subsequently, data from the stage table is loaded into an operational data store (ODS) table.
    5. All necessary parameters, such as table names and connection parameters, are provided to ensure accurate data extraction and loading.

Logging and Reporting:

  1. Logging (Subjob 5):
    1. At the end of the data loading and transformation process, a subjob is called to write logs.
    2. Logging captures critical information about the data processing, ensuring transparency and traceability.

This Talend job orchestrates the data extraction, loading, and transformation process. It begins with initialization and metadata capture, proceeds to extract data from the FTP server, handles errors if they occur, and performs data loading and transformation for energy optimization purposes. Detailed logging and error handling mechanisms enhance job monitoring and maintain data integrity.
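
The stage/ODS loading pattern for one extracted file might look roughly like this (google-cloud-bigquery sketch; project, dataset, table and bucket names are assumptions, not the real ones):

  from google.cloud import bigquery

  client = bigquery.Client(project="example-project")

  # Load one extracted CSV from GCS into the stage table (the stage is reloaded each run).
  load_config = bigquery.LoadJobConfig(
      source_format=bigquery.SourceFormat.CSV,
      skip_leading_rows=1,
      autodetect=True,
      write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE)
  client.load_table_from_uri(
      "gs://example-bucket/meteologica/prices_france_20240101.csv",
      "example-project.stage.stg_meteologica_france",
      job_config=load_config).result()

  # Append the staged rows to the ODS table.
  client.query("""
      INSERT INTO `example-project.ods.ods_meteologica_france`
      SELECT * FROM `example-project.stage.stg_meteologica_france`
  """).result()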

Main jobs for data loading
  • J008_Extraction_til_ODS_GSHEET_WAP_SOLID_FUELS
  • J009_Extraction_til_ODS_GSHEET_CO2

  • J011_Extraction_till_ODS_GSHEET_VENDOHM_FORWARDS

  • J012_Extraction_till_ODS_GSHEET_VENDOHM_SPOT

  • J013_Extraction_till_ODS_GSHEET_VENDOHM_METEO

  • J014_Extraction_till_ODS_ORACLE_IRM

  • J015_Extraction_till_ODS_GSHEET_IRM_HUBS

  • J016_Extraction_till_ODS_GSHEET_IRM_COUNTER_PARTIES

  • J017_Extraction_till_ODS_GSHEET_IRM_SITES_HEDGES


Job description by steps

Job Initialization and Logging:

  1. File Name Configuration (Global Variable):

    1. The job begins by configuring the file name as a global variable.
    2. This variable is used throughout the job to ensure consistency in file handling.
  2. Job Run Initialization (Subjob 1):

    1. In the first subjob, the job ID is retrieved, and the extraction date is calculated.
    2. This step sets the stage for the data processing and captures essential metadata.

Data Extraction from Google Sheets / IRM database:

  1. Files Extraction from Google Sheets / IRM database (Subjob 2):

    1. The next subjob is responsible for extracting data from Google Sheets/ IRM database.
    2. It targets the specified file based on the global variable (file name) configured earlier.
  2. Error Handling and Logging (Subjob 3 - On Failure):

    1. In case the data extraction subjob (Subjob 2) encounters any issues or doesn't finish successfully, the job transitions to a subjob for writing error logs.
    2. Logging is crucial for tracking and diagnosing problems during the data extraction process.

Data Loading and Transformation:

  1. Data Loading and Transformation (Subjob 4 - On Success):
    1. When the data extraction subjob (Subjob 2) successfully completes for the provided file name, the job proceeds with data loading and transformation.
    2. A subjob is called to load data from the CSV file into a stage table, preparing it for further processing.
    3. Subsequently, data from the stage table is loaded into an operational data store (ODS) table.
    4. All necessary parameters, such as table names and connection parameters, are provided to ensure accurate data extraction and loading.

Logging and Reporting:

  1. Logging (Subjob 5):
    1. At the end of the data loading and transformation process, a subjob is called to write logs.
    2. Logging captures crucial information about the data processing, enhancing transparency and traceability.

This Talend job streamlines the data extraction, loading, and transformation process. It begins with file name configuration, proceeds with metadata capture, extracts data from Google Sheets, handles errors if they occur, and performs data loading and transformation, all while maintaining detailed logs for monitoring and maintaining data integrity.

Main jobs for data loading
  • J010_Extraction_til_ODS_VENDOHM


Job description by steps

Global Variable Configuration:

  1. File Names and Table Names Configuration (Global Variables):
    1. The job starts by configuring global variables for file names and table names.
    2. These variables are populated by reading a list of all tables and files that subjobs need to load.
    3. This dynamic configuration ensures that the job adapts to the specific files and tables required for processing.

Data Extraction and Processing in Iterations:

  1. Iteration Loop (One by One):

    1. The job processes tables and files iteratively, one by one, based on the configured global variables.
  2. Job Run Initialization (Subjob 1):

    1. Within each iteration, the first subjob is called to retrieve the job ID and calculate the extraction date.
    2. This initialization step captures vital metadata for the ongoing data processing.
  3. Files Extraction from PostgreSQL Database (Subjob 2):

    1. In the next subjob, data files are extracted from the PostgreSQL database.
    2. Each iteration targets a specific file based on the global variables (file names) configured earlier.
  4. Error Handling and Logging (Subjob 3 - On Failure):

    1. If the data extraction subjob (Subjob 2) encounters any issues or doesn't finish successfully, the job transitions to a subjob for writing error logs.
    2. Logging is crucial for tracking and diagnosing problems during the data extraction process.
  5. Data Loading and Transformation (Subjob 4 - On Success):

    1. When the data extraction subjob (Subjob 2) successfully completes for the provided file name, the job proceeds with data loading and transformation.
    2. A subjob is called to load data from the CSV file into a stage table, preparing it for further processing.
    3. Subsequently, data from the stage table is loaded into an operational data store (ODS) table.
    4. All necessary parameters, such as table names and connection parameters, are provided to ensure accurate data extraction and loading.

Logging and Reporting:

  1. Logging (Subjob 5):
    1. At the end of each iteration, a subjob is called to write logs.
    2. Logging captures critical information about the data processing, ensuring transparency and traceability for each processed file and table.

This Talend job is designed for dynamic data extraction and processing, adapting to different tables and files. It initializes each iteration with metadata capture, extracts data from the PostgreSQL database, handles errors when necessary, performs data loading and transformation with parameterized configurations, and maintains detailed logs for monitoring and data integrity across multiple files and tables.
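
The iteration logic can be sketched as a simple loop; the helper functions below are hypothetical stand-ins for the Talend subjobs, not real job names:

  from datetime import date

  # Hypothetical stand-ins for the Talend subjobs.
  def init_job_run(table):
      return f"run_{table}_{date.today():%Y%m%d}", date.today()   # Subjob 1: job id + extraction date

  def extract_from_postgres(table, file_name):
      print(f"extracting {table} -> {file_name}")                 # Subjob 2: extract to GCS

  def load_stage_and_ods(table, file_name):
      print(f"loading {file_name} into stage and ODS")            # Subjob 4: stage + ODS load

  def write_log(run_id, message):
      print(run_id, message)                                      # Subjobs 3/5: logging

  # Global configuration: the list of tables/files the iterations will process.
  tables_and_files = [
      ("sensor_values",  "sensor_values_20240101.csv"),
      ("curve_metadata", "curve_metadata_20240101.csv"),
  ]

  for table, file_name in tables_and_files:                       # one iteration per table/file
      run_id, extract_date = init_job_run(table)
      try:
          extract_from_postgres(table, file_name)
      except Exception as err:
          write_log(run_id, f"extract failed: {err}")             # on failure: error log, skip load
          continue
      load_stage_and_ods(table, file_name)
      write_log(run_id, f"loaded for {extract_date}")             # end of iteration: run log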


4.4 - Load to DM (calculations and transformations)

Main jobs for load to DM
  • J001_ODS_TO_DM_FACT_ENERGY_PRICE_FORECAST


Job description by steps

Metadata Calculation and Cache Initialization:

  1. Job Metadata Calculation:
    • The job begins by calculating essential job metadata, which includes information such as job identifiers, timestamps, and configuration details.
    • This step sets the stage for the subsequent data processing.
  2. Fact Table Keys Calculation and Caching:
    • Following metadata calculation, the job proceeds to compute and store keys associated with the current Fact table.
    • These keys are cached for efficient reference during data processing.

Data Extraction and Import:

  1. Energy Forecast Data Extraction:
    • The job extracts energy forecast data from four Operational Data Store (ODS) tables, each containing information for a specific region (Italy, Germany, Spain, and France).
  2. Data Comparison and Import:
    • The extracted energy forecast data is compared with the cached keys from the Fact table.
    • The primary purpose of this step is to identify and import only new rows of data for the latest forecasted date.
    • By importing only new data, the job ensures that the dataset remains up to date and avoids unnecessary duplication.

Logging and Cleanup:

  1. Log Generation:
    • At the conclusion of data import and processing, the job generates logs.
    • These logs provide a record of the job's activities, facilitating monitoring and troubleshooting.
  2. Temporary File Deletion:
    • As a final step, the job deletes any temporary files created during the data transfer process.
    • This cleanup helps manage storage resources and maintain data integrity.
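
The "import only new rows" step could be expressed roughly as the following BigQuery statement issued from Python (all project, table and column names are assumptions that only illustrate the key-comparison logic described above):

  from google.cloud import bigquery

  sql = """
  INSERT INTO `example-project.dm.FACT_ENERGY_PRICE_FORECAST`
    (country, commodity, delivery_ts, forecast_date, price)
  SELECT s.country, s.commodity, s.delivery_ts, s.forecast_date, s.price
  FROM `example-project.ods.ods_energy_price_forecast` AS s
  LEFT JOIN `example-project.dm.FACT_ENERGY_PRICE_FORECAST` AS f
    ON  f.country = s.country
    AND f.commodity = s.commodity
    AND f.delivery_ts = s.delivery_ts
    AND f.forecast_date = s.forecast_date
  WHERE f.country IS NULL                             -- key not present in the fact table yet
    AND s.forecast_date = (SELECT MAX(forecast_date)  -- latest forecasted date only
                           FROM `example-project.ods.ods_energy_price_forecast`)
  """
  bigquery.Client(project="example-project").query(sql).result()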

Main jobs for load to DM
  • J002_ODS_TO_DM_DIM_METEO_INFO


Job description by steps

Metadata Calculation:

  1. Job Metadata Calculation:
    • The job begins by calculating essential job metadata. This metadata includes information such as job identifiers, timestamps, and job-specific configuration details.
    • This step sets the foundation for the subsequent data processing.

Slowly Changing Dimension (SCD2) Data Loading:

  1. SCD2 Logic for Dimension Data Loading:
    • In this step, the job implements Slowly Changing Dimension Type 2 (SCD2) logic.
    • The purpose of this logic is to load data into a dimension table, specifically from the Operational Data Store (ODS) meteo table into the Data Mart (DM) meteo table.
    • The SCD2 logic is applied by comparing multiple fields across the two tables, such as name, database name, lc group, asset class, asset sub-class, asset type, capacity unit, commodity, curve publishing, curve type, hub, market, object type, owner, region, unit, and volume unit.

Logging and Cleanup:

  1. Log Generation:
    • Upon completion of the data loading process, the job generates logs. These logs serve as a record of the job's activities, providing visibility into the data loading process.
  2. Temporary File Deletion:
    • As a final step, the job takes care of deleting any temporary files created during the data transfer process. This cleanup ensures the efficient management of storage resources.
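
A rough sketch of the SCD2 handling as BigQuery statements issued from Python (table names and the compared columns are illustrative; only a subset of the attributes listed above is shown):

  from google.cloud import bigquery

  client = bigquery.Client(project="example-project")

  # Close the current version of any record whose tracked attributes changed.
  client.query("""
      MERGE `example-project.dm.DIM_METEO_INFO` AS d
      USING `example-project.ods.ods_vendohm_meteo_info` AS s
        ON d.name = s.name AND d.is_current = TRUE
      WHEN MATCHED AND (d.market <> s.market OR d.region <> s.region OR d.unit <> s.unit)
        THEN UPDATE SET is_current = FALSE, valid_to = CURRENT_TIMESTAMP()
  """).result()

  # Open a new current version for new records and for the ones just closed above.
  client.query("""
      INSERT INTO `example-project.dm.DIM_METEO_INFO`
        (name, market, region, unit, valid_from, valid_to, is_current)
      SELECT s.name, s.market, s.region, s.unit, CURRENT_TIMESTAMP(), NULL, TRUE
      FROM `example-project.ods.ods_vendohm_meteo_info` AS s
      LEFT JOIN `example-project.dm.DIM_METEO_INFO` AS d
        ON d.name = s.name AND d.is_current = TRUE
      WHERE d.name IS NULL
  """).result()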

Main jobs for load to DM
  • J003_ODS_TO_DM_DIM_ENERGY_PRICE_INFO


Job description by steps

Metadata Calculation:

  1. Job Metadata Calculation:
    • The job begins by calculating essential job metadata. This metadata includes information such as job identifiers, timestamps, and job-specific configuration details.
    • This step sets the foundation for the subsequent data processing.

Slowly Changing Dimension (SCD2) Data Loading:

  1. SCD2 Logic for Dimension Data Loading:
    • In this step, the job applies Slowly Changing Dimension Type 2 (SCD2) logic to load data into a dimension table.
    • The source of the data is the Operational Data Store (ODS) energy price metadata table, which is a union of tables named "forwards" and "spot."
      The data is loaded into the Data Mart (DM) energy price table using the SCD2 methodology.
    • SCD2 logic involves comparing specific fields, such as name, commodity, contract, proprietary, timespan, rolling, and timezone, to identify changes and maintain historical records in the DM table.

Logging and Cleanup:

  1. Log Generation:
    • Upon the completion of the data loading process, the job generates logs. These logs serve as a record of the job's activities and provide visibility into the data loading process.
  2. Temporary File Deletion:
    • As a final step, the job takes care of deleting any temporary files created during the data transfer process. This cleanup ensures the efficient management of storage resources.

Main jobs for load to DM
  • J004_ODS_TO_DM_FACT_METEO_DATA


Job description by steps

Metadata Calculation:

  1. Job Metadata Calculation:
    • The job begins by calculating essential job metadata. This metadata includes information such as job identifiers, timestamps, and job-specific configuration details.
    • This step establishes the foundation for the subsequent data processing.

Data Enrichment and Loading:

  1. Data Enrichment from DIM_meteo_info:
    • In this step, the job enriches data from the Vendohm meteo table with attributes obtained from the DIM_meteo_info table.
  2. Loading into FACT_meteo_data:
    • The enriched data is then loaded into the FACT_meteo_data table.
    • This table serves as a fact table containing data relevant to meteorological information.

Logging and Cleanup:

  1. Log Generation:
    • Upon the completion of the data enrichment and loading process, the job generates logs. These logs serve as a record of the job's activities and provide visibility into the data processing steps.
  2. Temporary File Deletion:
    • As a final step, the job takes care of deleting any temporary files created during the data transfer and enrichment process. This cleanup ensures efficient storage resource management.
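
The enrichment step could be expressed roughly as a single join, for example (project, table and column names are assumptions):

  from google.cloud import bigquery

  sql = """
  INSERT INTO `example-project.dm.FACT_METEO_DATA` (meteo_info_key, curve_id, value_ts, value)
  SELECT d.meteo_info_key, s.curve_id, s.value_ts, s.value
  FROM `example-project.ods.ods_vendohm_meteo` AS s
  JOIN `example-project.dm.DIM_METEO_INFO` AS d
    ON d.curve_id = s.curve_id AND d.is_current = TRUE   -- pick up the current dimension attributes
  """
  bigquery.Client(project="example-project").query(sql).result()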

Main jobs for load to DM
  • J005_ODS_TO_DM_FACT_ENERGY_PRICE


Job description by steps

Metadata Calculation:

  1. Job Metadata Calculation:
    • The job commences with the calculation of essential job metadata. This metadata includes information such as job identifiers, timestamps, and specific configuration details.
    • This initial step provides a foundation for subsequent data processing.

Data Enrichment and Loading:

  1. Data Enrichment from DIM_energy_price_info:
    • In this step, the job enriches data from the energy price table by incorporating attributes sourced from the DIM_energy_price_info table.
  2. Loading into FACT_energy_price_data:
    • The enriched data is subsequently loaded into the FACT_energy_price_data table.
    • The FACT_energy_price_data table is designed to store data relevant to energy prices.

Logging and Cleanup:

  1. Log Generation:
    • Following the completion of the data enrichment and loading process, the job generates logs. These logs serve as a record of the job's activities and provide visibility into the data processing steps.
  2. Temporary File Deletion:
    • As a final step, the job manages the deletion of any temporary files that were created during the data transfer and enrichment process. This cleanup ensures efficient storage resource management.

Main jobs for load to DM
  • J006_ODS_TO_DM_FACT_CO2_EMISSIONS


Job description by steps

Metadata Calculation:

  1. Job Metadata Calculation:
    • The job commences by calculating essential job metadata. This metadata includes information such as job identifiers, timestamps, or specific configuration details.
    • This initial step provides a foundation for subsequent data processing.

Latest CO2 Data Refresh:

  1. Refreshing/Providing Latest CO2 Data:
    • In this step, the job focuses on refreshing or providing the latest CO2 emissions data.
    • The source of this data is the Operational Data Store (ODS) table, which serves as a central repository for CO2-related information.

Logging and Cleanup:

  1. Log Generation:
    • Following the completion of the CO2 data refresh, the job generates logs. These logs serve as a record of the job's activities and provide visibility into the data refresh process.
  2. Temporary File Deletion
    • As a final step, the job manages the deletion of any temporary files that were created during the data transfer process. This cleanup ensures efficient storage resource management.

Main jobs for load to DM
  • J007_ODS_TO_DM_FACT_SOLID_FUEL_WAP


Job description by steps

Metadata Calculation:

  1. Job Metadata Calculation:
    • The job commences by calculating essential job metadata. This metadata includes information such as job identifiers, timestamps, or specific configuration details.
    • This initial step provides a foundation for subsequent data processing.

Latest WAP Solid Fuel Data Refresh:

  1. Refreshing/Providing Latest Cost Data:
    • In this step, the job focuses on refreshing or providing the latest weighted average price (WAP) data for solid fuels.
    • The source of this data is the Operational Data Store (ODS) table, which serves as a central repository for WAP solid fuel-related information.

Logging and Cleanup:

  1. Log Generation:
    • Following the completion of the WAP solid fuel data refresh, the job generates logs. These logs serve as a record of the job's activities and provide visibility into the data refresh process.
  2. Temporary File Deletion
    • As a final step, the job manages the deletion of any temporary files that were created during the data transfer process. This cleanup ensures efficient storage resource management.

Main jobs for load to DM
  • J009_ODS_TO_DM_FACT_ENERGY_DEALS


Job description by steps

Metadata Calculation:

  1. Job Metadata Calculation:
    • The job begins by calculating essential job metadata. This metadata includes information such as job identifiers, timestamps, and job-specific configuration details.
    • This step establishes the foundation for the subsequent data processing.

Data Enrichment and Loading:

  1. Data Enrichment from GSheet data (Hubs, Counter Parties and Sites Hedges):
    • In this step, the job enriches data from the Energy deals table with attributes obtained from the Hubs, Counter Parties and Sites Hedges (see section Objects for details).
  2. Loading into FACT_energy_deals and FACT_energy_deals_daily:
    • The enriched data is then loaded into the FACT_energy_deals table.
    • This table serves as a fact table containing data relevant to energy deals by site.
    • The process determines the daily values between the deal's start and end dates and loads them into FACT_energy_deals_daily.
    • Within this step, the job conducts data transformation operations.
    • For each day within the deal duration, specific data is generated and provided.
    • A crucial aspect of this transformation is handling prices differently based on the nature of the deal. If the deal is a "sell," a minus sign is assigned, effectively representing a negative value. Conversely, if the deal is a "buy," a positive numeric value is assigned.

Logging and Cleanup:

  1. Log Generation:
    • Upon the completion of the data enrichment and loading process, the job generates logs. These logs serve as a record of the job's activities and provide visibility into the data processing steps.
  2. Temporary File Deletion:
    • As a final step, the job takes care of deleting any temporary files created during the data transfer and enrichment process. This cleanup ensures efficient storage resource management.
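
The daily expansion and the buy/sell sign handling can be sketched in plain Python (the field names and the sample deal are made up for illustration):

  from datetime import date, timedelta

  deal = {"deal_id": "D-1001", "direction": "sell", "price": 85.0,
          "start": date(2024, 1, 1), "end": date(2024, 1, 3)}

  sign = -1 if deal["direction"].lower() == "sell" else 1   # "sell" becomes negative, "buy" stays positive
  daily_rows = []
  day = deal["start"]
  while day <= deal["end"]:                                  # one row per day of the deal duration
      daily_rows.append({"deal_id": deal["deal_id"], "value_date": day, "price": sign * deal["price"]})
      day += timedelta(days=1)
  # daily_rows would then be loaded into FACT_energy_deals_daily.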

Main jobs for load to DM
  • J001_DM_TO_DM_FACT_ENERGY_PRICE_HOURLY


Job description by steps

Data Extraction and Transformation:

  1. The code extracts data related to energy prices and contracts from a database.
    • It assigns priority levels based on the type of contract (e.g., daily, weekly, monthly).
    • It calculates the first and last delivery dates based on delivery date, contract type and commodity.
    • Delivery dates are ranked by published date (the latest published date for a given delivery date is used).

Explanation of First and Last Delivery Dates:

  1. The determination of first and last delivery dates is critical in the context of energy contracts, as these dates define when energy transactions are executed.
    • For Daily Contracts (Contract = 'D') - Spot source is considered:
      1. If the commodity is 'ELECTRICITY,' the first delivery date is set to the same date as the delivery date itself. This means the electricity price applies at the exact time specified in the contract.
      2. For other commodities under daily contracts, the first delivery date is set to the start of the day (midnight).
      3. The last delivery date for all daily contracts is set to the end of the day (one second before midnight).
    • For Weekly Contracts (Contract = 'W'):
      1. The first delivery date is set to the beginning of the week (typically Monday) specified in the delivery date, and the last delivery date is set to one second before the end of the week. Timespan is used to calculate number of weeks (forwards).
    • For Monthly Contracts (Contract = 'M'):
      1. The first delivery date is set to the start of the month specified in the delivery date, and the last delivery date is set to one second before the end of that month. Timespan is used to calculate number of months (forwards).
    • For Quarterly Contracts (Contract = 'Q'):
      1. The first delivery date is set to the start of the quarter specified in the delivery date, and the last delivery date is set to one second before the end of that quarter. Timespan is used to calculate number of months (forwards).
    • For Semi-Annual (2 season summer/winter) Contracts (Contract = 'S'):
      1. The first delivery date is set to the start of the season specified in the delivery date, and the last delivery date is set to one second before the end of that season. Timespan is used to calculate number of seasons (forwards).
    • For Yearly Contracts (Contract = 'Y'):
      1. The first delivery date is set to the start of the year specified in the delivery date, and the last delivery date is set to one second before the end of that year. Timespan is used to calculate number of years (forwards).

At the end, the data is pivoted by common names per country and commodity. Between the first and last delivery dates, hourly time series are generated.
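
The first/last delivery date rules can be sketched in Python for a few contract types (daily, weekly and monthly shown; quarters, seasons, years and the timespan multiplier follow the same idea; names and values are illustrative assumptions):

  import calendar
  from datetime import datetime, timedelta

  def delivery_window(delivery: datetime, contract: str, commodity: str):
      """Return (first_delivery_date, last_delivery_date) for one contract row."""
      if contract == "D":                      # daily (spot)
          first = delivery if commodity == "ELECTRICITY" else delivery.replace(
              hour=0, minute=0, second=0, microsecond=0)
          last = delivery.replace(hour=23, minute=59, second=59, microsecond=0)
      elif contract == "W":                    # weekly: Monday 00:00 to one second before week end
          first = (delivery - timedelta(days=delivery.weekday())).replace(
              hour=0, minute=0, second=0, microsecond=0)
          last = first + timedelta(days=7) - timedelta(seconds=1)
      elif contract == "M":                    # monthly: month start to one second before month end
          first = delivery.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
          days = calendar.monthrange(delivery.year, delivery.month)[1]
          last = first + timedelta(days=days) - timedelta(seconds=1)
      else:
          raise NotImplementedError("Q, S and Y follow the same pattern")
      return first, last

  # Example: a weekly contract with delivery date 2024-01-03 covers Mon 01 Jan to Sun 07 Jan.
  print(delivery_window(datetime(2024, 1, 3), "W", "ELECTRICITY"))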


4.5 - Scheduling and Automation

TMC (Talend Management Console) - Robustify load - runs daily at 8 AM.

4.6 - Data Validation


Responsible & contact points:

  • Alessandro Mainardi - Project Owner
  • Simon Bourguignon - Delivery Manager
  • Alba Carrero/ Gaetan Frenoy - Product Owner
  • Rui Ferraz - Project Manager

