ELN API  •  Oracle Views  •  Datalab  →  Talend  →  GCP BigQuery


Project Name: Each ELN product follows a distinct IDBS access rights workflow
ETL Tool: Talend
Status: Deployed in PROD

1.  Project Overview

This document is the official technical handover wiki for the ETL pipeline that ingests data from three source systems (the ELN API, Oracle Database Views, and the Datalab Platform) and loads the processed data into Google Cloud Platform (GCP) BigQuery tables, using Talend as the ETL orchestration tool.

Objectives

Scope

2.  Architecture & Data Flow

The pipeline follows a classic extract-transform-load (ETL) pattern orchestrated by Talend. The high-level data flow is shown below:

[ELN API]           ──┐
[Oracle Views]      ──┼──► [Talend ETL Jobs] ──► [GCP BigQuery Tables]
[Datalab Platform]  ──┘

Flow Steps

3.  Source Systems


Source System     Type                    Description
----------------  ----------------------  ---------------------------------------------------------------------------
ELN API           REST API                Electronic Lab Notebook; exposes project data per collaborator access level
Oracle Views      Relational DB Views     Pre-aggregated relational data extracted via JDBC connections
Datalab Platform  Internal Data Platform  Analytical datasets and processed outputs from the Datalab environment

3.1  ELN API

Authentication:

Note: Retrieve the list of container IDs from the Oracle views and use it as a parameter variable for the ELN API calls.
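A minimal sketch of that step, assuming the Oracle view returns one container ID in the first column of each row and that the ELN API is called once per container. The base URL `https://eln.example.com/api`, the `/containers/<id>` path, and both helper names are hypothetical placeholders, not the documented IDBS API.

```python
from urllib.parse import quote

# Hypothetical base URL; replace with the real ELN API endpoint.
ELN_BASE_URL = "https://eln.example.com/api"


def container_ids_from_rows(rows):
    """Collect distinct, non-empty container IDs from Oracle view rows.

    Assumes the container ID is the first column of each row.
    First-seen order is preserved so that runs are reproducible.
    """
    seen = set()
    ids = []
    for row in rows:
        value = row[0]
        if value is None:
            continue  # skip NULL IDs returned by the view
        cid = str(value).strip()
        if cid and cid not in seen:
            seen.add(cid)
            ids.append(cid)
    return ids


def container_urls(ids, base_url=ELN_BASE_URL):
    """Build one (hypothetical) request URL per container ID, URL-encoding each ID."""
    return [f"{base_url}/containers/{quote(cid, safe='')}" for cid in ids]


if __name__ == "__main__":
    rows = [("C-100",), ("C-200",), (None,), ("C-100",)]
    ids = container_ids_from_rows(rows)
    print(",".join(ids))  # prints: C-100,C-200
    for url in container_urls(ids):
        print(url)
```

Deduplicating before the API calls avoids requesting the same container twice when the view returns repeated rows.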

3.2  Oracle DB 

Authentication:

          (TEST)

          (PROD)

View:

Note:

3.3  Datalab Platform

3.4  GCP

4.  ELN Projects & Collaborator Access Control


The ELN system hosts multiple independent research projects. The Talend ETL flow enforces a project-level access policy: each collaborator is assigned to one or more specific ELN projects, and the pipeline only extracts and loads the data belonging to that collaborator's assigned projects.
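The project-level policy described above can be sketched as a filter applied to extracted records before loading. This is an illustration only: the `Record` type, the `ASSIGNMENTS` dictionary, and the deny-by-default behaviour for unknown collaborators are assumptions, not a description of how the Talend jobs implement the check.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One extracted ELN record, tagged with the ELN project it belongs to."""
    project: str
    payload: dict = field(default_factory=dict)


# Hypothetical assignments: collaborator -> ELN projects they may access.
ASSIGNMENTS = {
    "[Collaborator A]": {"ELN-PRJ-001"},
    "[Collaborator C]": {"ELN-PRJ-002"},
}


def allowed_records(records, collaborator, assignments=ASSIGNMENTS):
    """Return only the records that belong to the collaborator's assigned projects.

    A collaborator with no entry in `assignments` gets an empty result
    (deny by default, an assumed policy).
    """
    projects = assignments.get(collaborator, set())
    return [r for r in records if r.project in projects]
```

Filtering on an explicit allow-list means that a new ELN project is excluded until someone assigns it, which is the safer failure mode for access control.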

4.2  Access Control Mapping

The table below maps each collaborator to their accessible ELN projects, Oracle schemas, and the resulting GCP target dataset:


Collaborator / Role  ELN Project(s)  Oracle Schemas  GCP Target Dataset
-------------------  --------------  --------------  ------------------
[Collaborator A]     ELN-PRJ-001     [Schema_X]      gcp_dataset_001
[Collaborator B]     ELN-PRJ-001     [Schema_Y]      gcp_dataset_001
[Collaborator C]     ELN-PRJ-002     [Schema_Z]      gcp_dataset_002
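One way to keep this mapping machine-checkable is to hold it as data and derive which sources feed each GCP dataset. The sketch below copies the rows from the mapping table (the bracketed names remain placeholders); the `ACCESS_MAP` structure and the helper functions are hypothetical and not part of the Talend jobs.

```python
from collections import defaultdict

# (collaborator, ELN projects, Oracle schemas, GCP target dataset)
ACCESS_MAP = [
    ("[Collaborator A]", ("ELN-PRJ-001",), ("[Schema_X]",), "gcp_dataset_001"),
    ("[Collaborator B]", ("ELN-PRJ-001",), ("[Schema_Y]",), "gcp_dataset_001"),
    ("[Collaborator C]", ("ELN-PRJ-002",), ("[Schema_Z]",), "gcp_dataset_002"),
]


def sources_by_dataset(access_map):
    """Group ELN projects and Oracle schemas by the GCP dataset they load into."""
    grouped = defaultdict(lambda: {"projects": set(), "schemas": set()})
    for _collaborator, projects, schemas, dataset in access_map:
        grouped[dataset]["projects"].update(projects)
        grouped[dataset]["schemas"].update(schemas)
    return dict(grouped)


def target_dataset(access_map, collaborator):
    """Return the GCP dataset for a collaborator; raise KeyError if unmapped."""
    for name, _projects, _schemas, dataset in access_map:
        if name == collaborator:
            return dataset
    raise KeyError(collaborator)
```

Grouping by dataset makes it visible that gcp_dataset_001 is fed by two Oracle schemas ([Schema_X] and [Schema_Y]) for the same ELN project.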


4.3  How Access Is Enforced in Talend


5.  Talend ETL Jobs


5.1  Job Inventory

Job Name              Source            Target        Schedule / Trigger
--------------------  ----------------  ------------  ---------------------------
JOB_ELN_TO_GCP        ELN API           GCP BigQuery  Daily at 02:00 AM
JOB_ORA_VIEWS_TO_GCP  Oracle Views      GCP BigQuery  Daily at 03:00 AM
JOB_DATALAB_TO_GCP    Datalab Platform  GCP BigQuery  On-demand / event-triggered
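For monitoring, it can help to compute when each scheduled job is next expected to start, so that a missed daily run can be flagged. A minimal sketch, assuming naive local times (the inventory does not state a time zone); `SCHEDULE` and `next_run` are hypothetical names.

```python
from datetime import datetime, time, timedelta

# Daily start times from the job inventory; None means on-demand / event-triggered.
SCHEDULE = {
    "JOB_ELN_TO_GCP": time(2, 0),
    "JOB_ORA_VIEWS_TO_GCP": time(3, 0),
    "JOB_DATALAB_TO_GCP": None,
}


def next_run(job, now):
    """Return the next expected start of a daily job, or None if it has no schedule.

    If `now` is at or past today's start time, the next run is tomorrow.
    Times are naive and assumed to be in the scheduler's local time zone.
    """
    start = SCHEDULE[job]
    if start is None:
        return None
    candidate = datetime.combine(now.date(), start)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate
```

If the daily jobs are triggered by cron, the equivalent expressions would be `0 2 * * *` and `0 3 * * *`.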


5.2  Job Structure & Key Components


5.3  Context Groups

All sensitive parameters (credentials, URLs, project IDs) are managed through Talend Context Groups. Do not hardcode credentials in job components.
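The rule can be illustrated outside Talend with a small resolver that keeps only non-sensitive values in each context group (for example TEST and PROD) and pulls credentials from environment variables at run time, refusing to run if a secret was hardcoded. All context keys, URLs, project IDs, and environment variable names below are hypothetical.

```python
import os

# Hypothetical context groups: non-sensitive values only.
CONTEXTS = {
    "TEST": {"eln_api_url": "https://eln-test.example.com/api", "gcp_project_id": "my-project-test"},
    "PROD": {"eln_api_url": "https://eln.example.com/api", "gcp_project_id": "my-project-prod"},
}

# Secrets are never stored in a context group; each maps to an environment variable.
SECRET_ENV_VARS = {"eln_api_token": "ELN_API_TOKEN", "oracle_password": "ORACLE_PASSWORD"}


def resolve_context(name, contexts=CONTEXTS, secret_env=SECRET_ENV_VARS, environ=None):
    """Merge a context group with secrets read from the environment.

    Raises KeyError for an unknown context, ValueError if a secret key is
    hardcoded in the context group, and RuntimeError if a secret is missing.
    """
    environ = os.environ if environ is None else environ
    values = dict(contexts[name])
    hardcoded = set(values) & set(secret_env)
    if hardcoded:
        raise ValueError(f"secrets hardcoded in context {name}: {sorted(hardcoded)}")
    missing = [var for var in secret_env.values() if not environ.get(var)]
    if missing:
        raise RuntimeError(f"missing environment variables: {missing}")
    for key, var in secret_env.items():
        values[key] = environ[var]
    return values
```

In Talend itself, runtime values are typically injected with the tContextLoad component or as context parameter overrides when the exported job is launched; confirm which mechanism this project uses before relying on either.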


6.  Target – GCP BigQuery


8.  Error Handling & Monitoring


8.1  Error Handling Strategy


8.2  Monitoring & Alerting


8.3  Common Errors & Resolutions


9.  Deployment & Scheduling

9.1  Deployment Steps

9.2  Scheduling