Talend Data Flow Documentation

Project: Specialty Polymers

────────────────────────────────────────

Source: Labwer (Shared Folders on NAT)   →   Target: GCP BigQuery Tables

Data Engineering Team  

1. Project Overview

This document describes the Talend data pipeline implemented for the Specialty Polymers project. It covers the end-to-end data flow from the source system (Labwer shared folders deployed in a NAT environment) to the target GCP BigQuery tables. It also describes the compression strategy applied by the Product Owner to manage large data volumes efficiently.

2. Data Flow Architecture

2.1 High-Level Flow

The pipeline follows a standard Extract → Transform → Load (ETL) pattern, enriched with a compression/decompression layer to handle the large data volumes characteristic of this project:

Source:

Category | Item / Source | Details / Target
Oracle Sources | labw-p-oracle-01.syensqo.com
Oracle Sources | labw-q-oracle-01.syensqo.com
Bollate File Paths | Source Directory | \\ITBOLVRS06T\Lab Booster
Bollate File Paths | Example File | \\10.53.6.10\labo\W-524600\TGA\DA CANC\23-11194-6715351-tga-residuo da acque 965pi plx485- aria - sciarrillo.txt
Alpharetta File Paths | Test Files Directory | \\USALPACDv02\Test Files\LabBooster

Note: .txt files may contain complex and irregular structures, which can make parsing challenging.
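
Given the note on irregular .txt files, a reader that never fails on decoding is a reasonable first step before any parsing. The following is a minimal Python sketch under stated assumptions, not the pipeline's actual implementation: the function name `read_text_tolerant` and the list of candidate encodings are hypothetical and chosen only for illustration.

```python
from pathlib import Path

# Hypothetical candidate encodings; the real instrument exports may use others.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252")


def read_text_tolerant(path):
    """Return the file's lines, trying each candidate encoding in turn.

    Falls back to latin-1, which maps every byte to a character and
    therefore cannot raise a decoding error.
    """
    raw = Path(path).read_bytes()
    for encoding in CANDIDATE_ENCODINGS:
        try:
            text = raw.decode(encoding)
            break
        except UnicodeDecodeError:
            continue
    else:
        # No candidate encoding worked; decode byte-for-byte instead.
        text = raw.decode("latin-1")
    # splitlines() treats \n, \r\n and \r as line breaks; rstrip() drops
    # trailing whitespace left by fixed-width instrument output.
    return [line.rstrip() for line in text.splitlines()]
```

Keeping decoding separate from parsing means a malformed file still yields lines that can be inspected or stored as-is, rather than aborting the job.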


Talend Extraction & Temp Shared Folder:

Category | Item / Source | Details / Target
Talend / Jobs | Orchestration Job | F730_Thermal_Data_Compression_Orch
Talend / Jobs | Sub-Job 1 | J125_instrument_Raw_Data_Compressed_To_Bigquery
Talend / Jobs | Sub-Job 2 | J125_Thermal_Raw_Data_Compressed_To_Bigquery
Talend / Jobs | SQL Queries Path | V:\PROD\RnI\ACN_Materials\SQLQueries
Local Storage | Temp Compression Folder | Z:\(ENV)\RnI\ACN_Materials\tmp\Working\data_compression
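
The compression format used by these jobs is not stated in this section. As an illustration only, the sketch below assumes gzip and shows a lossless round trip for one raw file staged in a working folder. The function names and the `.gz` suffix are hypothetical and are not taken from the Talend jobs.

```python
import gzip
import shutil
from pathlib import Path


def compress_file(source, work_dir):
    """Gzip `source` into `work_dir` and return the compressed file's path."""
    source = Path(source)
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    target = work_dir / (source.name + ".gz")  # assumed naming convention
    # Stream in chunks so large raw files are never fully held in memory.
    with source.open("rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


def decompress_bytes(compressed_path):
    """Return the original bytes stored in a gzip file."""
    with gzip.open(compressed_path, "rb") as src:
        return src.read()
```

Highly repetitive text exports, such as long numeric instrument traces, typically compress well, which is why a compression layer can reduce both transfer time and storage for this kind of data.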


GCP (BigQuery & GCS):

Category | Item / Source | Details / Target
Cloud Console | GCP Storage Link | https://console.cloud.google.com/storage/browser/cs-ew1-labboostermaterials-prod-accepted-files/instruments
GCP Targets | Staging (compressed) - Delta | gcp-sqo-labbooster-materials-d.Staging.compressed_thermal_raw_data_delta
GCP Targets | Staging (compressed) - Conso | gcp-sqo-labbooster-materials-d.Staging.compressed_thermal_raw_data_conso
GCP Targets | ODS (compressed) - Delta | gcp-sqo-labbooster-materials-d.ODS.compressed_thermal_raw_data_delta
GCP Targets | ODS (compressed) - Conso | gcp-sqo-labbooster-materials-d.ODS.compressed_thermal_raw_data_conso
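
The four GCP targets listed above share one naming pattern: project, then dataset (Staging or ODS), then a common table prefix with a `delta` or `conso` suffix. The short Python sketch below makes that pattern explicit. The helper `target_table` is hypothetical; only the project, dataset, and table names come from the entries above.

```python
PROJECT = "gcp-sqo-labbooster-materials-d"
TABLE_PREFIX = "compressed_thermal_raw_data"

LAYERS = ("Staging", "ODS")
MODES = ("delta", "conso")


def target_table(layer, mode):
    """Build a fully qualified table ID: <project>.<layer>.<prefix>_<mode>."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return f"{PROJECT}.{layer}.{TABLE_PREFIX}_{mode}"


# Enumerate all four targets in the order they are listed above.
ALL_TARGETS = [target_table(layer, mode) for layer in LAYERS for mode in MODES]
```

Validating the layer and mode up front keeps a typo from silently producing a table ID that does not exist.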


Reporting:

Category | Item / Source | Details / Target
Reporting | Tableau Link | https://eu-west-1a.online.tableau.com/#/site/syensqo/views/MaterialsThermalv1_4DSC-TGA/MaterialThermalDSCdev