4.1 - Talend Integration
Source data integration with Talend ETL tool
FTP Server (Meteologica):
- Talend connects to the FTP server where files containing current and future electricity prices for Spain, Italy, France, and Germany are located.
- Talend retrieves these files from the FTP server.
- The retrieved files are then loaded into Google Cloud Storage.
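As a rough illustration of this flow (the production pipeline is a Talend job, not this script), a minimal Python sketch, assuming placeholder host, credentials, file names and bucket:

```python
# Minimal sketch only: host, credentials, file names and bucket are placeholders.
from ftplib import FTP
from google.cloud import storage

FTP_HOST = "ftp.example.com"                     # hypothetical Meteologica FTP host
PRICE_FILES = ["prices_ES.csv", "prices_IT.csv",
               "prices_FR.csv", "prices_DE.csv"] # hypothetical price files per country
BUCKET = "example-landing-bucket"                # hypothetical GCS landing bucket

def ftp_to_gcs() -> None:
    bucket = storage.Client().bucket(BUCKET)
    with FTP(FTP_HOST) as ftp:
        ftp.login(user="user", passwd="secret")  # placeholder credentials
        for name in PRICE_FILES:
            local_path = f"/tmp/{name}"
            # Retrieve the price file from the FTP server...
            with open(local_path, "wb") as fh:
                ftp.retrbinary(f"RETR {name}", fh.write)
            # ...and load it into Google Cloud Storage.
            bucket.blob(f"meteologica/{name}").upload_from_filename(local_path)

if __name__ == "__main__":
    ftp_to_gcs()
```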
Postgres Database (Vendohm):
- Talend establishes a direct connection with the Vendohm PostgreSQL database that stores sensor values identified by specific curve IDs.
- It extracts relevant data from the PostgreSQL database.
- This extracted data is loaded into Google Cloud Storage as files.
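A comparable hedged sketch of the database-to-Cloud-Storage step, assuming hypothetical connection details, table and column names:

```python
# Minimal sketch only: connection details, table/column names and bucket are placeholders.
import csv
import psycopg2
from google.cloud import storage

def vendohm_to_gcs() -> None:
    conn = psycopg2.connect(host="vendohm-db.example.com", dbname="vendohm",
                            user="user", password="secret")
    try:
        with conn.cursor() as cur:
            # Extract sensor values identified by their curve IDs (illustrative query).
            cur.execute("SELECT curve_id, ts, value FROM sensor_values")
            rows = cur.fetchall()
    finally:
        conn.close()

    local_path = "/tmp/sensor_values.csv"
    with open(local_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["curve_id", "ts", "value"])
        writer.writerows(rows)

    # Load the extracted data into Google Cloud Storage as a file.
    storage.Client().bucket("example-landing-bucket") \
        .blob("vendohm/sensor_values.csv").upload_from_filename(local_path)
```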
Oracle Database (IRM):
- Talend establishes a direct connection with the IRM Oracle database that stores energy deals identified by specific deal IDs.
- It extracts relevant data from the Oracle database.
- This extracted data is loaded into Google Cloud Storage as files.
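The Oracle extraction follows the same pattern as the PostgreSQL sketch above; only the driver changes. A hedged fragment, with hypothetical DSN, credentials and table:

```python
# Same pattern as the PostgreSQL sketch; only the driver differs.
# DSN, credentials and table/column names are placeholders.
import oracledb

conn = oracledb.connect(user="user", password="secret",
                        dsn="irm-db.example.com/ORCLPDB1")
try:
    with conn.cursor() as cur:
        # Extract energy deals identified by their deal IDs (illustrative query).
        cur.execute("SELECT deal_id, commodity, volume, price FROM energy_deals")
        rows = cur.fetchall()
finally:
    conn.close()
# The rows would then be written to CSV and uploaded to Cloud Storage
# exactly as in the PostgreSQL example above.
```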
Google Sheets (Hedges, wap solid fuels, CO2):
- Talend integrates with Google Sheets, where solid fuel wap, CO2 emissions, and hedging information are stored.
- It retrieves this data from Google Sheets.
- Similar to the other sources, this data is also loaded into Google Cloud Storage as files.
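A minimal sketch of this step using the gspread client, assuming a hypothetical service-account file, spreadsheet key, worksheet name and bucket:

```python
# Minimal sketch only: service-account file, spreadsheet key, worksheet and bucket are placeholders.
import csv
import gspread
from google.cloud import storage

def sheets_to_gcs() -> None:
    gc = gspread.service_account(filename="service_account.json")
    worksheet = gc.open_by_key("SPREADSHEET_KEY").worksheet("hedges")
    rows = worksheet.get_all_values()          # hedging / wap / CO2 rows as lists of strings

    local_path = "/tmp/hedges.csv"
    with open(local_path, "w", newline="") as fh:
        csv.writer(fh).writerows(rows)

    # Load the extracted sheet into Google Cloud Storage as a file.
    storage.Client().bucket("example-landing-bucket") \
        .blob("sheets/hedges.csv").upload_from_filename(local_path)
```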
Data Transformation and Loading to Google BigQuery:
- Once data from all four sources (FTP server, PostgreSQL database, Oracle database, Google Sheets) is available in Google Cloud Storage as files, Talend proceeds with data transformation and loading.
- Talend performs data transformations as needed, including cleansing, mapping, and structuring the data for consistency.
- The transformed data is loaded into various stages, operational data stores (ODS), and data mart tables within Google BigQuery.
- These tables are organized to facilitate efficient querying and reporting for energy optimization purposes.
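For illustration, a hedged sketch of loading one staged file from Cloud Storage into a BigQuery stage table (project, dataset, table and bucket names are hypothetical; the real jobs are built in Talend):

```python
# Minimal sketch only: project, dataset, table and bucket names are placeholders.
from google.cloud import bigquery

def load_stage_table() -> None:
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,                     # first row holds column headers
        autodetect=True,                         # let BigQuery infer the schema
        write_disposition="WRITE_TRUNCATE",      # reload the stage table on each run
    )
    load_job = client.load_table_from_uri(
        "gs://example-landing-bucket/meteologica/prices_ES.csv",
        "example-project.stage.meteologica_prices",
        job_config=job_config,
    )
    load_job.result()   # wait for completion; ODS and data mart jobs then read this table
```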
By utilizing Talend for data extraction, transformation, and loading (ETL), the web app ensures that data from diverse sources is collected, processed, and structured for analysis and reporting within Google BigQuery, enabling users to make informed decisions based on up-to-date and accurate data.
4.2 - Source Data Extraction
Main jobs for source extraction
We will analyze one job in detail, but the logic in the others is similar.

Job description by steps:
This Talend job efficiently extracts, formats, and securely stores data from a Google Sheets document into Google Cloud Storage, ensuring that the data is readily available for further analysis and processing while adhering to a fixed schema and naming conventions.

Job description by steps:
This Talend job effectively extracts, formats, and securely stores data from a Google Sheets document into Google Cloud Storage. It features specialized data reading steps, ensuring the capture of year-related data separately, while adhering to predefined schemas and naming conventions. The deletion of the original file enhances data management and resource optimization.

Job description by steps:
This Talend job effectively connects to a PostgreSQL database, retrieves data from a specified table, and securely stores the formatted data in CSV format within Google Cloud Storage. It employs robust security measures and follows strict naming conventions for data organization, with an emphasis on data integrity and resource optimization.

Job description by steps:
This job supports data extraction and reporting processes within the IRM database, ensuring that relevant data is accessible, well-organized, and securely stored for future use.

4.3 - Data Loading to Google BigQuery
Job description by steps:
- Job Initialization and Logging
- Data Extraction and Processing
- Logging and Reporting
This Talend job orchestrates the data extraction, loading, and transformation process. It begins with initialization and metadata capture, proceeds to extract data from the FTP server, handles errors if they occur, and performs data loading and transformation for energy optimization purposes. Detailed logging and error handling mechanisms enhance job monitoring and maintain data integrity.

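The initialization / error-handling / logging pattern described above can be pictured with a small sketch (the real implementation is the Talend job itself; the run ID and step functions here are hypothetical):

```python
# Illustrative orchestration pattern only; the production job is built in Talend.
import logging
import uuid
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ftp_price_load")

def run_ftp_price_load(extract_step, load_step) -> None:
    run_id = uuid.uuid4().hex                   # metadata captured at initialization
    started = datetime.now(timezone.utc)
    log.info("run %s started at %s", run_id, started.isoformat())
    try:
        files = extract_step()                  # e.g. pull the price files from the FTP server
        load_step(files)                        # e.g. load and transform them in BigQuery
    except Exception:
        # Failures are logged with full context so the run can be investigated and replayed.
        log.exception("run %s failed", run_id)
        raise
    else:
        log.info("run %s finished in %s", run_id, datetime.now(timezone.utc) - started)
```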
Job description by steps:
- Job Initialization and Logging
- Data Extraction from Google Sheets / IRM database
- Data Loading and Transformation
- Logging and Reporting
This Talend job streamlines the data extraction, loading, and transformation process. It begins with file name configuration, proceeds with metadata capture, extracts data from Google Sheets, handles errors if they occur, and performs data loading and transformation, all while maintaining detailed logs for monitoring and maintaining data integrity.

Job description by steps:
- Global Variable Configuration
- Data Extraction and Processing in Iterations
- Logging and Reporting
This Talend job is designed for dynamic data extraction and processing, adapting to different tables and files. It initializes each iteration with metadata capture, extracts data from the PostgreSQL database, handles errors when necessary, performs data loading and transformation with parameterized configurations, and maintains detailed logs for monitoring and data integrity across multiple files and tables.

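The parameterized, per-table iteration can be sketched as a simple configuration-driven loop (table and file names below are hypothetical):

```python
# Illustrative configuration-driven loop; table and file names are placeholders.
TABLE_CONFIG = [
    {"table": "sensor_values",  "file": "vendohm/sensor_values.csv"},
    {"table": "curve_metadata", "file": "vendohm/curve_metadata.csv"},
]

def run_all(extract_table, load_file) -> None:
    for cfg in TABLE_CONFIG:
        # Each iteration captures its own metadata and runs with its own parameters,
        # so a failure on one table does not stop the remaining ones.
        try:
            rows = extract_table(cfg["table"])   # query one PostgreSQL table
            load_file(cfg["file"], rows)         # stage the result and load it to BigQuery
        except Exception as exc:
            print(f"table {cfg['table']} failed: {exc}")   # stand-in for the job's logging step
```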
4.4 - Load to DM (calculations and transformations)
Job description by steps:
- Metadata Calculation and Cache Initialization
- Data Extraction and Import
- Logging and Cleanup

Job description by steps:
- Metadata Calculation
- Slowly Changing Dimension (SCD2) Data Loading
- Logging and Cleanup

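The SCD2 loading step is not detailed on this page; one common way to implement it on BigQuery is an "expire then insert" pair of statements. A hedged sketch with hypothetical dataset, table and column names:

```python
# Illustrative SCD2 (type 2) load on BigQuery; all dataset, table and column
# names are placeholders, the production logic lives in the Talend job.
from google.cloud import bigquery

DIM = "example-project.dm.dim_curve"     # target dimension kept as SCD2
STG = "example-project.stage.curve"      # freshly staged source rows

EXPIRE_CHANGED_ROWS = f"""
UPDATE `{DIM}` d
SET valid_to = CURRENT_TIMESTAMP(), is_current = FALSE
WHERE d.is_current
  AND EXISTS (
    SELECT 1 FROM `{STG}` s
    WHERE s.curve_id = d.curve_id
      AND (s.name != d.name OR s.unit != d.unit)   -- attributes changed
  )
"""

INSERT_NEW_VERSIONS = f"""
INSERT INTO `{DIM}` (curve_id, name, unit, valid_from, valid_to, is_current)
SELECT s.curve_id, s.name, s.unit, CURRENT_TIMESTAMP(), TIMESTAMP '9999-12-31', TRUE
FROM `{STG}` s
LEFT JOIN `{DIM}` d
  ON d.curve_id = s.curve_id AND d.is_current
WHERE d.curve_id IS NULL                           -- brand-new key ...
   OR s.name != d.name OR s.unit != d.unit         -- ... or changed attributes
"""

def scd2_load() -> None:
    client = bigquery.Client()
    client.query(EXPIRE_CHANGED_ROWS).result()     # step 1: close out changed current rows
    client.query(INSERT_NEW_VERSIONS).result()     # step 2: insert the new current versions
```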
Job description by steps:
- Metadata Calculation
- Slowly Changing Dimension (SCD2) Data Loading
- Logging and Cleanup

Job description by steps:
- Metadata Calculation
- Data Enrichment and Loading
- Logging and Cleanup

Job description by steps:
- Metadata Calculation
- Data Enrichment and Loading
- Logging and Cleanup

Job description by steps:
- Metadata Calculation
- Latest CO2 Data Refresh
- Logging and Cleanup

Job description by steps:
- Metadata Calculation
- Latest wap solid fuel data Refresh
- Logging and Cleanup

Job description by steps:
- Metadata Calculation
- Data Enrichment and Loading
- Logging and Cleanup

Job description by steps:
- Data Extraction and Transformation
- Explanation of First and Last Delivery Dates
At the end, the data is pivoted by common names per country and commodity, and hourly time series are generated between the first and last delivery dates.

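A hedged pandas sketch of this last step, with an invented input frame and column names, just to make the hourly expansion and the pivot concrete:

```python
# Illustrative only: the input frame, column names and date handling are invented.
import pandas as pd

deals = pd.DataFrame({
    "country":        ["ES", "ES", "FR"],
    "commodity":      ["power", "gas", "power"],
    "first_delivery": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-02"]),
    "last_delivery":  pd.to_datetime(["2024-01-02", "2024-01-01", "2024-01-02"]),
    "price":          [85.0, 30.0, 90.0],
})

# Expand each row into one record per delivery hour between the first and last
# delivery dates (here the last delivery day is treated as inclusive).
deals["hour"] = deals.apply(
    lambda r: list(pd.date_range(r["first_delivery"],
                                 r["last_delivery"] + pd.Timedelta(hours=23),
                                 freq="h")),
    axis=1,
)
hourly = deals.explode("hour")

# Pivot by the common names: one column per country / commodity pair, indexed by hour.
pivoted = hourly.pivot_table(index="hour",
                             columns=["country", "commodity"],
                             values="price",
                             aggfunc="mean")
print(pivoted.head())
```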
4.5 - Scheduling and Automation
TMC (Talend Management Console) - Robustify load - Daily at 8 AM.
4.6 - Data Validation
Responsible & contact points:
- Alessandro Mainardi - Project Owner
- Simon Bourguignon - Delivery Manager
- Alba Carrero / Gaetan Frenoy - Product Owner
- Rui Ferraz - Project Manager


















