- Created by TRAJKOVIC-ext, Nikola, last modified on Oct 14, 2023
4.1 - Talend Integration
Source data integration with the Talend ETL tool
FTP Server (Meteologica):
- Talend connects to the FTP server where files containing current and future electricity prices for Spain, Italy, France, and Germany are located.
- Talend retrieves these files from the FTP server.
- The retrieved files are then loaded into Google Cloud Storage.
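A minimal Python sketch of this retrieval step, assuming a hypothetical file-naming pattern and FTP credentials (the actual Meteologica conventions are not shown on this page):

```python
import re
from ftplib import FTP
from pathlib import Path

# Illustrative pattern only; the real Meteologica file names may differ.
PRICE_FILE = re.compile(r"^prices_(ES|IT|FR|DE)_\d{8}\.csv$")

def is_price_file(name: str) -> bool:
    """True if a remote file name looks like a country price file (assumed pattern)."""
    return PRICE_FILE.match(name) is not None

def fetch_price_files(host: str, user: str, password: str,
                      remote_dir: str, local_dir: str) -> list:
    """Download all matching price files from the FTP server to a local
    staging folder; from there they would be uploaded to Cloud Storage."""
    downloaded = []
    with FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        ftp.cwd(remote_dir)
        for name in ftp.nlst():
            if not is_price_file(name):
                continue
            target = Path(local_dir) / name
            with open(target, "wb") as fh:
                ftp.retrbinary(f"RETR {name}", fh.write)
            downloaded.append(str(target))
    return downloaded
```

Filtering on a strict name pattern keeps unrelated files on the server out of the staging area.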
Postgres Database (Vendohm):
- Talend establishes a direct connection with the Vendohm PostgreSQL database that stores sensor values identified by specific curve IDs.
- It extracts relevant data from the PostgreSQL database.
- This extracted data is loaded into Google Cloud Storage as files.
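The curve-ID lookup could be expressed as a parameterized query; a sketch assuming hypothetical table and column names (`curve_id`, `ts`, `value`), not the real Vendohm schema:

```python
def curve_query(table: str, curve_ids):
    """Build a parameterized SELECT for the given curve IDs.
    Table and column names are assumptions for illustration."""
    if not curve_ids:
        raise ValueError("at least one curve ID is required")
    placeholders = ", ".join(["%s"] * len(curve_ids))
    sql = (f"SELECT curve_id, ts, value FROM {table} "
           f"WHERE curve_id IN ({placeholders}) ORDER BY curve_id, ts")
    return sql, list(curve_ids)
```

Keeping the IDs as bound parameters (rather than interpolating them into the SQL) avoids injection issues and lets the database cache the plan.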
Google Sheets (Hedges, Marginal cost, CO2):
- Talend integrates with Google Sheets, where marginal costs, CO2 costs, and hedging information are stored.
- It retrieves this data from Google Sheets.
- Similar to the other sources, this data is also loaded into Google Cloud Storage as files.
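Rows fetched from a sheet could be serialized to CSV text before upload; a small stdlib-only sketch with illustrative column names:

```python
import csv
import io

def rows_to_csv(header, rows) -> str:
    """Serialize rows fetched from a Google Sheets tab into CSV text
    suitable for upload to Cloud Storage. Column names are illustrative."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()
```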
Data Transformation and Loading to Google BigQuery:
- Once data from all three sources (FTP server, PostgreSQL database, Google Sheets) is available in Google Cloud Storage as files, Talend proceeds with data transformation and loading.
- Talend performs data transformations as needed, including cleansing, mapping, and structuring the data for consistency.
- The transformed data is loaded into various stages, operational data stores (ODS), and data mart tables within Google BigQuery.
- These tables are organized to facilitate efficient querying and reporting for energy optimization purposes.
By using Talend for data extraction, transformation, and loading (ETL), the web app ensures that data from diverse sources is collected, processed, and structured for analysis and reporting in Google BigQuery, enabling users to make informed decisions based on up-to-date, accurate data.
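The stage → ODS flow described above could be driven by generated upsert statements; a minimal sketch of a BigQuery MERGE builder, assuming placeholder dataset, key, and column names:

```python
def merge_statement(target: str, source: str, keys, columns) -> str:
    """Compose a BigQuery MERGE that upserts a stage table into an ODS table.
    Table and column names are placeholders, not the project's real datasets."""
    on = " AND ".join(f"T.{k} = S.{k}" for k in keys)
    updates = ", ".join(f"T.{c} = S.{c}" for c in columns)
    cols = ", ".join(list(keys) + list(columns))
    vals = ", ".join(f"S.{c}" for c in list(keys) + list(columns))
    return (f"MERGE `{target}` T USING `{source}` S ON {on} "
            f"WHEN MATCHED THEN UPDATE SET {updates} "
            f"WHEN NOT MATCHED THEN INSERT ({cols}) VALUES ({vals})")
```

A MERGE keyed on the natural key makes reloads idempotent: re-running a stage load updates existing ODS rows instead of duplicating them.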
4.2 - Source Data Extraction
| Main jobs for source extraction | |
|---|---|
| Job description by steps | Job design |
| Main jobs for source extraction | |
|---|---|
| Job description by steps | Job design |

This Talend job extracts, formats, and securely stores data from a Google Sheets document in Google Cloud Storage, making the data readily available for further analysis and processing while adhering to a fixed schema and naming conventions.
| Main jobs for source extraction | |
|---|---|
| Job description by steps | Job design |

This Talend job extracts, formats, and securely stores data from a Google Sheets document in Google Cloud Storage. Dedicated read steps capture year-related data separately, while predefined schemas and naming conventions are preserved. Deleting the original file after processing improves data management and resource usage.
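The separate capture of year-related data might be modeled as a row partition; the rule below (first cell is a four-digit year) is purely an assumption for illustration, not the job's actual logic:

```python
def split_year_rows(rows):
    """Partition sheet rows into year-level rows and regular data rows.
    The classification rule here is an illustrative assumption."""
    year_rows, data_rows = [], []
    for row in rows:
        first = str(row[0]) if row else ""
        if first.isdigit() and len(first) == 4:
            year_rows.append(row)
        else:
            data_rows.append(row)
    return year_rows, data_rows
```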
| Main jobs for source extraction | |
|---|---|
| Job description by steps | Job design |

This Talend job connects to a PostgreSQL database, retrieves data from a specified table, and securely stores the formatted data in CSV format in Google Cloud Storage. It applies robust security measures and strict naming conventions for data organization, with an emphasis on data integrity and resource optimization.
4.3 - Data Loading to Google BigQuery
| Main jobs for data loading | |
|---|---|
| Job description by steps | Job design |

- Job Initialization and Logging
- Data Extraction and Processing
- Logging and Reporting

This Talend job orchestrates the data extraction, loading, and transformation process. It begins with initialization and metadata capture, extracts data from the FTP server, handles any errors that occur, and performs data loading and transformation for energy optimization. Detailed logging and error-handling mechanisms support job monitoring and preserve data integrity.
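The logging-and-error-handling pattern described above could be sketched outside Talend as a small step wrapper; step and logger names are illustrative:

```python
import logging

logger = logging.getLogger("talend_style_job")

def run_step(name, step, *args, **kwargs):
    """Run one job step with start/end logging and error capture,
    mirroring the monitoring pattern described above (names illustrative)."""
    logger.info("step %s: started", name)
    try:
        result = step(*args, **kwargs)
    except Exception:
        logger.exception("step %s: failed", name)
        raise  # re-raise so the orchestrator can abort or retry
    logger.info("step %s: finished", name)
    return result
```

Re-raising after logging keeps failure visible to the caller while ensuring every step leaves a trace in the job log.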
| Main jobs for data loading | |
|---|---|
| Job description by steps | Job design |

- Job Initialization and Logging
- Data Extraction from Google Sheets
- Data Loading and Transformation
- Logging and Reporting

This Talend job streamlines the data extraction, loading, and transformation process. It begins with file name configuration and metadata capture, extracts data from Google Sheets, handles any errors that occur, and performs data loading and transformation, maintaining detailed logs for monitoring and data integrity.
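The initial file-name configuration and metadata capture could look roughly like the sketch below; the naming convention shown is an assumption, not the job's documented one:

```python
from datetime import datetime, timezone

def run_metadata(job_name: str, sheet_name: str, when=None) -> dict:
    """Capture per-run metadata before extraction starts.
    The file-naming convention is an illustrative assumption."""
    ts = when or datetime.now(timezone.utc)
    return {
        "job_name": job_name,
        "started_at": ts.isoformat(),
        "file_name": f"{sheet_name}_{ts:%Y%m%d_%H%M%S}.csv",
    }
```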
| Main jobs for data loading | |
|---|---|
| Job description by steps | Job design |

- Global Variable Configuration
- Data Extraction and Processing in Iterations
- Logging and Reporting

This Talend job performs dynamic data extraction and processing, adapting to different tables and files. Each iteration starts with metadata capture, extracts data from the PostgreSQL database, handles errors when necessary, performs data loading and transformation with parameterized configurations, and keeps detailed logs for monitoring and data integrity across multiple files and tables.
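The per-iteration parameterization described above could be sketched as a generator of contexts; the table names, file names, and bucket below are hypothetical placeholders:

```python
# Hypothetical iteration plan; real names come from the job's context variables.
EXTRACTIONS = [
    {"table": "sensor_values", "file": "sensor_values.csv"},
    {"table": "curve_catalog", "file": "curve_catalog.csv"},
]

def iteration_contexts(extractions, bucket: str):
    """Yield one parameterized context per iteration, as the job does when
    looping over tables and files with per-iteration global variables."""
    for idx, item in enumerate(extractions, start=1):
        yield {
            "iteration": idx,
            "source_table": item["table"],
            "gcs_uri": f"gs://{bucket}/{item['file']}",
        }
```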
4.4 - Load to DM (calculations and transformations)
4.5 - Error Handling
4.6 - Scheduling and Automation
4.7 - Performance Optimization
4.8 - Data Validation
4.9 - Dependencies and Order of Operations
4.10 - Versioning and Change Management
Responsible & contact points:
- Alessandro Mainardi - Project Owner
- Simon Bourguignon - Delivery Manager
- Alba Carrero / Gaetan Frenoy - Product Owner
- Rui Ferraz - Project Manager