Talend Data Flow Documentation
Project: Specialty Polymers
────────────────────────────────────────
Source: Labwer (Shared Folders on NAT) → Target: GCP BigQuery Tables
Data Engineering Team
1. Project Overview
This document provides the technical documentation for the Talend data pipeline implemented under the Specialty Polymers project. It covers the end-to-end data flow from the source system (Labwer shared folders deployed on a NAT environment) to the target GCP BigQuery tables. It also describes the compression strategy applied by the Product Owner to manage large data volumes efficiently.
2. Data Flow Architecture
2.1 High-Level Flow
The pipeline follows a standard Extract → Transform → Load (ETL) pattern, enriched with a compression/decompression layer to handle the large data volumes characteristic of this project:
Source:
| Category | Item / Source | Details / Target |
| Oracle Sources | labw-p-oracle-01.syensqo.com | |
| Oracle Sources | labw-q-oracle-01.syensqo.com | |
| Bollate File Paths | Source Directory | \\ITBOLVRS06T\Lab Booster |
| Bollate File Paths | Example File (note: .txt files may contain complex, irregular structures, which can make parsing challenging) | \\10.53.6.10\labo\W-524600\TGA\DA CANC\23-11194-6715351-tga-residuo da acque 965pi plx485- aria - sciarrillo.txt |
| Alpharetta File Paths | Test Files Directory | \\USALPACDv02\Test Files\LabBooster |
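The source locations above are Windows UNC paths. As a minimal sketch, and not part of the Talend jobs themselves, the snippet below shows one way to split such a path into server, share, and file name with Python's standard `pathlib`, which works on any operating system. The helper name `split_unc` is hypothetical and introduced only for illustration.

```python
from pathlib import PureWindowsPath


def split_unc(path: str) -> dict:
    """Split a UNC path into server, share, file name, and extension.

    Hypothetical helper for illustration; it only parses the string and
    never touches the network share.
    """
    p = PureWindowsPath(path)
    # For a UNC path, .drive holds "\\server\share".
    server, share = p.drive.lstrip("\\").split("\\", 1)
    return {
        "server": server,
        "share": share,
        "name": p.name,      # last path component
        "suffix": p.suffix,  # e.g. ".txt"
    }


example = (
    r"\\10.53.6.10\labo\W-524600\TGA\DA CANC"
    r"\23-11194-6715351-tga-residuo da acque 965pi plx485- aria - sciarrillo.txt"
)
info = split_unc(example)
print(info["server"], info["share"], info["suffix"])
```

Using `PureWindowsPath` rather than manual string splitting keeps spaces in folder and file names intact, which matters for paths such as `DA CANC` and `Lab Booster`.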
Talend Extraction & Temp Shared Folder:
| Category | Item / Source | Details / Target |
| Talend / Jobs | Orchestration Job | F730_Thermal_Data_Compression_Orch |
| Talend / Jobs | Sub-Job 1 | J125_instrument_Raw_Data_Compressed_To_Bigquery |
| Talend / Jobs | Sub-Job 2 | J125_Thermal_Raw_Data_Compressed_To_Bigquery |
| Talend / Jobs | SQL Queries Path | V:\PROD\RnI\ACN_Materials\SQLQueries |
| Local Storage | Temp Compression Folder | Z:\(ENV)\RnI\ACN_Materials\tmp\Working\data_compression |
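To illustrate the role of the temp compression folder, here is a minimal sketch of a compress-and-verify round trip. It rests on several assumptions: this document does not name the compression codec, so gzip is used purely as a stand-in; the `(ENV)` placeholder is assumed to be replaced by an environment name such as `PROD` (the value seen in the SQL queries path); and the function names are hypothetical. The actual logic lives in the Talend jobs listed above.

```python
import gzip
import shutil
from pathlib import Path

# Template copied from the table above; "(ENV)" is the environment placeholder.
TMP_TEMPLATE = r"Z:\(ENV)\RnI\ACN_Materials\tmp\Working\data_compression"


def resolve_tmp_folder(env: str) -> str:
    """Substitute an environment name (assumed, e.g. "PROD") into the template."""
    return TMP_TEMPLATE.replace("(ENV)", env)


def gzip_file(src: Path, dest_dir: Path) -> Path:
    """Write a gzip copy of src into dest_dir and return the new file path.

    gzip is an assumption; the codec used by the real jobs is not documented here.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / (src.name + ".gz")
    with src.open("rb") as fin, gzip.open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # stream, so large files are not held in memory
    return dest


def gunzip_bytes(path: Path) -> bytes:
    """Read a gzip file back into raw bytes (the decompression side)."""
    with gzip.open(path, "rb") as fin:
        return fin.read()
```

A caller would typically compress each raw instrument file into the working folder, confirm that `gunzip_bytes` returns the original content, and only then hand the compressed file to the load step.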
GCP (BigQuery & GCS):
| Category | Item / Source | Details / Target |
| Cloud Console | GCP Storage Link | https://console.cloud.google.com/storage/browser/cs-ew1-labboostermaterials-prod-accepted-files/instruments |
| GCP Targets | Staging (compressed) - Delta | gcp-sqo-labbooster-materials-d.Staging.compressed_thermal_raw_data_delta |
| GCP Targets | Staging (compressed) - Conso | gcp-sqo-labbooster-materials-d.Staging.compressed_thermal_raw_data_conso |
| GCP Targets | ODS (compressed) - Delta | gcp-sqo-labbooster-materials-d.ODS.compressed_thermal_raw_data_delta |
| GCP Targets | ODS (compressed) - Conso | gcp-sqo-labbooster-materials-d.ODS.compressed_thermal_raw_data_conso |
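The four target tables above follow a regular `project.dataset.table` pattern, with the dataset (`Staging` or `ODS`) and the load mode (`delta` or `conso`) as the only varying parts. The sketch below makes that pattern explicit. The helper names are hypothetical, and the code only builds and parses strings; it does not call any BigQuery API.

```python
from typing import NamedTuple

PROJECT = "gcp-sqo-labbooster-materials-d"  # project ID taken from the table above
DATASETS = ("Staging", "ODS")
MODES = ("delta", "conso")


class TableRef(NamedTuple):
    project: str
    dataset: str
    table: str


def parse_table_id(table_id: str) -> TableRef:
    """Split a fully qualified table ID into project, dataset, and table."""
    parts = table_id.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected project.dataset.table, got {table_id!r}")
    return TableRef(*parts)


def compressed_table_id(dataset: str, mode: str) -> str:
    """Build one of the four compressed thermal raw data table IDs."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if mode not in MODES:
        raise ValueError(f"unknown load mode: {mode!r}")
    return f"{PROJECT}.{dataset}.compressed_thermal_raw_data_{mode}"


# List all four targets in the same order as the table above.
for ds in DATASETS:
    for m in MODES:
        print(compressed_table_id(ds, m))
```

Validating the dataset and mode up front means a typo in a job parameter fails immediately instead of producing a load against a non-existent table.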
Reporting:
| Category | Item / Source | Details / Target |
| Reporting | Tableau Link | https://eu-west-1a.online.tableau.com/#/site/syensqo/views/MaterialsThermalv1_4DSC-TGA/MaterialThermalDSCdev |
