Project Name: Each ELN product follows a distinct IDBS access rights workflow

ETL: Talend

Status: Deployed in PROD

1.  Project Overview

This document serves as the official technical handover wiki for the ETL pipeline that ingests data from multiple source systems (the ELN API, Oracle Database Views, and the Datalab Platform) and loads the processed data into Google Cloud Platform (GCP) BigQuery tables, using Talend as the ETL orchestration tool.

Objectives

The ELN system hosts multiple independent research projects. The Talend ETL flow enforces a project-level access policy: each collaborator is assigned to one or more specific ELN projects, and the pipeline only extracts and loads the data belonging to that collaborator's assigned projects.
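The project-level access policy described above can be illustrated with a minimal Python sketch. It is not the Talend implementation: the record layout, the `project` field, and the function name `filter_by_assigned_projects` are hypothetical and chosen only to show the idea of keeping the rows that belong to a collaborator's assigned projects.

```python
from typing import Dict, List, Set


def filter_by_assigned_projects(
    records: List[dict],
    assignments: Dict[str, Set[str]],
    collaborator: str,
) -> List[dict]:
    """Keep only the records whose project is assigned to the collaborator.

    `records` and `assignments` are hypothetical in-memory stand-ins for the
    extracted ELN data and the collaborator-to-project mapping.
    """
    # A collaborator with no assignment gets no data at all.
    allowed = assignments.get(collaborator, set())
    return [record for record in records if record["project"] in allowed]


if __name__ == "__main__":
    sample_records = [
        {"project": "A", "experiment_id": 1},
        {"project": "B", "experiment_id": 2},
    ]
    sample_assignments = {"alice": {"A"}}
    print(filter_by_assigned_projects(sample_records, sample_assignments, "alice"))
```

Defaulting to an empty set for unknown collaborators means the sketch denies access unless an assignment exists, which matches the intent of a restrictive access policy.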

Scope

      • All active Talend jobs connecting to the ELN API, Oracle Views, and Datalab.
      • Access management rules governing ELN product assignment per collaborator.
      • GCP BigQuery datasets and tables that serve as the final landing zone.

2.  Architecture & Data Flow

...

3.1  ELN API

Authentication:

        (TEST)

        (PROD)

Note:

        • Get the list of container IDs from the Oracle view, then iterate over it as an input variable in the URL.
        • Container IDs should be available in this view: IDBS_EWB_SEC.HISTORY_ENTRIES.ENTITY_PATH
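The iteration described in the note can be sketched in Python as follows. This is only an illustration under stated assumptions: the real endpoint is not given in this page, so the base URL, the URL shape (container ID appended as the last path segment), and the function name `build_container_urls` are all hypothetical.

```python
from typing import Iterable, Iterator
from urllib.parse import quote


def build_container_urls(base_url: str, container_ids: Iterable[str]) -> Iterator[str]:
    """Yield one request URL per distinct, non-empty container ID.

    Assumption: the container ID is the last path segment of the URL.
    """
    seen = set()
    for container_id in container_ids:
        container_id = container_id.strip()
        # Skip blanks and IDs already processed, so each container is called once.
        if not container_id or container_id in seen:
            continue
        seen.add(container_id)
        # Percent-encode the ID so unexpected characters cannot break the URL.
        yield f"{base_url.rstrip('/')}/{quote(container_id, safe='')}"


if __name__ == "__main__":
    # Placeholder base URL, not the real ELN endpoint.
    ids_from_oracle_view = ["c1", "c2", "c1"]
    for url in build_container_urls("https://eln.example.invalid/api/containers", ids_from_oracle_view):
        print(url)
```

Deduplicating before building the URLs avoids calling the API twice for a container that appears on several rows of the view.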

Example:

3.2  Oracle DB 

Authentication:

...

        • IDBS_EWB_SEC.HISTORY_ENTRIES.TARGET_ENTITY_ID AS Experiment_IDID  
        • IDBS_EWB_SEC.HISTORY_ENTRIES.ENTITY_PATH AS FOLDERPATH
        • IDBS_EWB_SOLVAY_CUSTOM.DM_UserAccessPath_Full AS PERMISSIONS
        • IDBS_EWB_SOLVAY_CUSTOM.DM_EntityRolePermissions_Full AS ROLES

Note:

        • The Oracle database connection is enabled for Talend Remote Engines. However, when connecting to the database via a VDI, the connection may be disabled.

3.3  Datalab 

        • Source: GCP BigQuery table
        • Project: datagrow / gcp-sqo-datalab-(*)
        • Dataset: bq_ds_datagrow_dev_ads_static
        • Table: application_use
        • Filter: user_status='active' AND ROLE IN ('admin', 'lab_manager')
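The Datalab source filter listed above can be expressed as a single BigQuery query. The sketch below only builds the SQL text; the function name `build_datalab_query` is hypothetical, and the GCP project ID is deliberately a parameter because the full ID is not spelled out on this page.

```python
def build_datalab_query(
    project_id: str,
    dataset: str = "bq_ds_datagrow_dev_ads_static",
    table: str = "application_use",
) -> str:
    """Return the SQL that selects active admins and lab managers.

    `project_id` must be supplied by the caller; it is not known here.
    """
    return (
        "SELECT *\n"
        f"FROM `{project_id}.{dataset}.{table}`\n"
        "WHERE user_status = 'active'\n"
        "  AND ROLE IN ('admin', 'lab_manager')"
    )


if __name__ == "__main__":
    # "my-gcp-project" is a placeholder, not the real project ID.
    print(build_datalab_query("my-gcp-project"))
```

In practice, selecting only the columns the downstream job needs instead of `SELECT *` would reduce the amount of data scanned.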

3.4  Target table

BigQuery: DM.ELN_IDBS_AccessRights

...

 (Master)


(Get Experiments)

(Cross experiments with collaborators and roles)

...

      • ELN → GCP job: Daily at 02:00 AM (server local time)
      • Oracle → GCP job: Daily at 02:00 AM
      • Datalab → GCP job: Daily at 02:00 AM


4.  Contacts & responsibilities

...

  •  Data Engineering - Flow Maintenance

5.  Annex document

See the annex document for more details on ELN access rights. (link)