ELN API • Oracle Views • Datalab → Talend → GCP BigQuery
Project Name | Each ELN product follows a distinct IDBS access rights workflow |
ETL | Talend |
Status | Deployed in PROD |
1. Project Overview
This document serves as the official technical handover wiki for the ETL pipeline that ingests data from multiple source systems.
The ELN API, Oracle Database Views, and the Datalab Platform loads the processed data into Google Cloud Platform (GCP) BigQuery tables using Talend as the ETL orchestration tool.
Objectives
The ELN system hosts multiple independent research projects. The Talend ETL flow enforces a project-level access policy: each collaborator is assigned to one or more specific ELN projects, and the pipeline only extracts and loads the data belonging to that collaborator's assigned projects.
Scope
- All active Talend jobs connecting to ELN API, Oracle Views, and DataLab.
- Access management rules governing ELN products assignment per collaborator.
- GCP BigQuery datasets and tables that serve as the final landing zone.
2. Architecture & Data Flow
The ETL pipeline follows a classic ETL pattern orchestrated by Talend. The high-level data flow is described below:
[ELN API] ─┐
[Oracle Views ] ──► [Talend ETL Jobs] ──► [GCP BigQuery Tables]
[Datalab Platform ]─┘
Flow Steps
- Extract: Talend connects to each source (ELN REST API, Oracle JDBC, Datalab (GCP) ) and extracts raw data according to the job configuration.
- Filter / Route: For ELN sources, the flow applies a project-based access filter, ensuring each collaborator's data is scoped to their assigned ELN project(s) only.
- Transform: Data is cleaned, typed, and mapped to the target GCP schema within the Talend job components.
- Load: Transformed data is written to the corresponding BigQuery dataset and table using the GCP BigQuery connector.
3. Source Systems
Source System | Type | Description |
ELN API | REST API | Electronic Lab Notebook – exposes project data per collaborator access level |
Oracle Views | Relational DB Views | Pre-aggregated relational data extracted via JDBC connections |
Datalab Platform | Internal Data Platform | Analytical datasets and processed outputs from the Datalab environment |
3.1 ELN API
Authentication:
- REST / HTTP: Protocol
- Authentication Type : Basic HTTP
- URL: https://mylab.syensqo.com:8443
- Login: (Contact Admin)
- Pasword: (Contact Admin)
Note : Get the containers IDs list from the Oracle views and use it as a parameter variable.
3.2 Oracle DB
Authentication:
(TEST)
- IDBS_DB_Database = ELNTEST.eua.solvay.com
- IDBS_DB_Host = ACEW1TELNBDB02.eua.solvay.com
- IDBS_DB_Port = 1521
- IDBS_DB_UserName = (Contact Admin)
- IDBS_DB_Password = (Contact Admin)
(PROD)
- IDBS_DB_Database = ELNPROD.eua.solvay.com
- IDBS_DB_Host = ACEW1TELNBDB02.eua.solvay.com
- IDBS_DB_Port = 1521
- IDBS_DB_UserName = (Contact Admin)
- IDBS_DB_Password = (Contact Admin)
View:
- IDBS_EWB_SEC.HISTORY_ENTRIES.TARGET_ENTITY_ID AS Experiment_ID
- IDBS_EWB_SEC.HISTORY_ENTRIES.ENTITY_PATH AS FOLDERPATH
- IDBS_EWB_SOLVAY_CUSTOM.DM_UserAccessPath_Full
- DBS_EWB_SOLVAY_CUSTOM.DM_EntityRolePermissions_Full
Note:
The Oracle database connection is enabled for Talend Remote Engines. However, when connecting to the database via a VDI, the connection may be disabled.
3.3 Datalab
- Source: GCP Bigquery table
- Project: datagrow
- Table and filter: application_use WHERE user_status='active' AND ROLE IN ('admin', 'lab_manager')
3.4 Target table:
Bigquery: DM.ELN_IDBS_AccessRights
Talend tasks:
(Master)
(Get Experiments)
(Cross experiments with collaborators and roles )
Error Handling Strategy
- All jobs implement a tDie component on the error branch to terminate gracefully and log a meaningful message.
- API connectivity failures trigger a tWarn followed by a retry loop.
- Data validation errors (null primary keys, type mismatches) are routed to a reject tLogRow component and written to a dedicated GCP error table.
- Job execution logs are visible in TMC.
Scheduling
- ELN → GCP job: Daily at 02:00 AM (server local time)
- Oracle → GCP job: Daily at 02:00 AM
- Datalab → GCP job: Daily at 02:00 AM
Contacts and responsibilities:
- prasanth.gnanasekar@syensqo.com / Data engineer


