LB SpP Compounding workflow integration

The integration of the SpP Compounding workflow data was intended to use the Unified Data Ingestion (generic pipeline) solution. However, due to changes in future priorities, available resources and budgets, it was not fully implemented as originally intended. This document outlines the current state of the SpP Compounding workflow data integration as of the end of Q4 2025.

Code Management

Repository with all application code: https://gitlab.syensqo.com/syensqo-connected-research/lab-booster/datalab-ingestion/generic-data-pipeline

Repository with all SQL statements: https://gitlab.syensqo.com/syensqo-connected-research/lab-booster/data-architecture/data-engineering

Repository sub-folder with SQL for intermediate tables: https://gitlab.syensqo.com/syensqo-connected-research/lab-booster/data-architecture/data-engineering/-/tree/master/BigQuery/data_ocean_domain_rni/DS_Compounding?ref_type=heads

Repository sub-folder with SQL statements for views used by SyDatalab: https://gitlab.syensqo.com/syensqo-connected-research/lab-booster/data-architecture/data-engineering/-/tree/master/BigQuery/data_ocean_domain_rni/DS_Datalab?ref_type=heads

Data Sources

The source data for SpP Compounding is provided as a set of Microsoft Excel spreadsheets because of the data volume. One spreadsheet lists all SpP ingredients; the remaining spreadsheets contain the data for the experiments and for the tests performed as part of those experiments.

The files are shared on an ad-hoc basis via a shared Google Drive folder.

Mapping document

The mapping document specifies the SpP Compounding data elements that are required for visualization in the SyDatalab web application and their corresponding database column names in the final views.

Data Model

The data model diagram shows all the tables where the transformed source data for Compounding is stored in the legacy datalake (ODS). The custom views were created using the PK/FK relationships illustrated in the data model.

https://lucid.app/lucidchart/bef8276f-bec8-4647-8a49-c14cdceba283/edit?invitationId=inv_21dff114-7efe-41b5-8abe-1c7f733da340&page=0_0#

Data Processing

There are currently three (3) distinct steps in the data processing pipeline. All three steps are executed manually.

Source to CSV export tables
Bucket to ODS
ODS to DM

Source to CSV export tables

The source Excel files are processed and the data converted to CSV “export table” files that closely follow the Data Ocean common data model.
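
As a rough illustration of this step, the sketch below renames source columns to export-table columns and writes the rows as CSV text. It assumes the spreadsheet rows have already been read into dictionaries (for example with a spreadsheet library, not shown here). The column names, the mapping, and the function to_export_csv are hypothetical and are not taken from the actual pipeline code.

```python
import csv
import io

# Hypothetical mapping from source spreadsheet headers to export-table columns.
COLUMN_MAP = {
    "Experiment ID": "experiment_id",
    "Ingredient": "ingredient_name",
    "Amount (wt%)": "amount_wt_pct",
}


def to_export_csv(rows, column_map=COLUMN_MAP):
    """Return CSV text with renamed columns; unmapped source columns are dropped."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(column_map.values()), lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        # Missing source values become empty strings so every row has all columns.
        writer.writerow(
            {target: row.get(source, "") for source, target in column_map.items()}
        )
    return buffer.getvalue()


# Hypothetical sample rows, as they might look after reading one worksheet.
sample_rows = [
    {"Experiment ID": "EXP-001", "Ingredient": "Resin A", "Amount (wt%)": 70, "Notes": "ignored"},
    {"Experiment ID": "EXP-001", "Ingredient": "Filler B"},
]
csv_text = to_export_csv(sample_rows)
print(csv_text)
```

Keeping the column mapping in one dictionary makes it easy to compare against the mapping document when the source layout changes.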

Bucket to ODS

A Talend job loads the CSV “export table” files that are the output of the previous step, and stores them in BigQuery tables in the ODS dataset.

UAT - TMC Plan:

The Talend plan is currently not scheduled and runs only on demand.

It is available exclusively in DEV-UAT environments and includes two tasks: Bucket to STG and ODS, followed by ODS to DM and Facts.

DATA_OCEAN_DOMAINE_RNI .F001_Rnl_G_pipeline_STG_ODS_orchFlow

DATA_OCEAN_DOMAINE_RNI .F002_Build_G_Pipeline_DimsFacts

ODS to DM

The data in the SpP Compounding tables in the ODS dataset is joined according to the data mapping file and stored in the star schema of the Data Ocean common data model in the DM dataset. The data mapping file also includes the business rules for any required transformations, such as pivots and the concatenation of values to create keys. The mapping file can be found at:

https://docs.google.com/spreadsheets/d/1kOvKH8WGWWVddKCMgw0CTZjsvZrkLmTLtZaZr7G-OnU/edit?gid=969526564#gid=969526564
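
To make the two kinds of transformation named above more concrete, here is a minimal sketch of a key concatenation and a pivot from long to wide format. The row layout, the column names, the "|" separator, and the helper functions are all assumptions made for illustration; the real rules are those defined in the mapping file and implemented in the Talend job and SQL.

```python
from collections import defaultdict

# Hypothetical ODS-style rows in "long" format: one row per measured property.
ods_rows = [
    {"experiment_id": "EXP-001", "sample_no": 1, "property": "tensile_strength", "value": 52.0},
    {"experiment_id": "EXP-001", "sample_no": 1, "property": "elongation", "value": 3.1},
    {"experiment_id": "EXP-001", "sample_no": 2, "property": "tensile_strength", "value": 49.5},
]


def make_key(*parts, sep="|"):
    """Concatenate the parts into a single string key."""
    return sep.join(str(part) for part in parts)


def pivot(rows):
    """Turn long rows into one wide record per (experiment_id, sample_no) key."""
    wide = defaultdict(dict)
    for row in rows:
        key = make_key(row["experiment_id"], row["sample_no"])
        wide[key]["sample_key"] = key
        # Each distinct property becomes its own column in the wide record.
        wide[key][row["property"]] = row["value"]
    return list(wide.values())


wide_rows = pivot(ods_rows)
for record in wide_rows:
    print(record)
```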

SyDatalab SpP Compounding Views

The SyDatalab web application typically expects 2 views for each workflow, a static view and a results view. For SpP Compounding, these views are:

vw_compounding_experiment_static - includes general data about the experiments and ingredients.

vw_compounding_experiment_results - includes all results data from the tests performed as part of the experiments.

The exact views created in the Data Ocean environments/projects are:

Compounding static data in Data Ocean Dev environment: gcp-sqo-data-dm-ri-d.DS_Datalab.vw_compounding_experiment_static

Compounding results data in Data Ocean Dev environment: gcp-sqo-data-dm-ri-d.DS_Datalab.vw_compounding_experiment_results

[ CURRENTLY PULLING DATA FROM INTERMEDIATE TABLES gcp-sqo-data-dm-ri-t.DS_Compounding.compounding_experiment_static & gcp-sqo-data-dm-ri-t.DS_Compounding.compounding_experiment_results ]

Compounding static data in Data Ocean Test environment: gcp-sqo-data-dm-ri-t.DS_Datalab.vw_compounding_experiment_static

Compounding results data in Data Ocean Test environment: gcp-sqo-data-dm-ri-t.DS_Datalab.vw_compounding_experiment_results

[ CURRENTLY REDIRECTING TO TEST ENVIRONMENT gcp-sqo-data-dm-ri-t.DS_Datalab.vw_compounding_experiment_static & gcp-sqo-data-dm-ri-t.DS_Datalab.vw_compounding_experiment_results ]

Compounding static data in Data Ocean Prod environment: gcp-sqo-data-dm-ri-p.DS_Datalab.vw_compounding_experiment_static

Compounding results data in Data Ocean Prod environment: gcp-sqo-data-dm-ri-p.DS_Datalab.vw_compounding_experiment_results

Due to restrictions and permission limitations in the Data Ocean Prod environment, a "redirect" approach was used. The SyDatalab web application queries the views defined in the Data Ocean Prod environment, and those views in turn query the Compounding production data stored in the intermediate tables in the Data Ocean Test environment. This redirection is shown in the architecture diagrams below.
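
The general shape of such a redirect view can be sketched as follows. This is only an illustration: the helper redirect_view_sql is hypothetical, and the use of SELECT * is an assumption. The authoritative view definitions are the SQL files in the DS_Datalab repository sub-folder listed under Code Management.

```python
PROD_PROJECT = "gcp-sqo-data-dm-ri-p"
TEST_PROJECT = "gcp-sqo-data-dm-ri-t"


def redirect_view_sql(view_name, source_project, target_project, dataset="DS_Datalab"):
    """Build a BigQuery statement that defines a view in source_project
    which simply selects from the same-named view in target_project."""
    return (
        f"CREATE OR REPLACE VIEW `{source_project}.{dataset}.{view_name}` AS\n"
        f"SELECT * FROM `{target_project}.{dataset}.{view_name}`"
    )


# Generate one statement per SyDatalab view (static and results).
statements = [
    redirect_view_sql(name, PROD_PROJECT, TEST_PROJECT)
    for name in ("vw_compounding_experiment_static", "vw_compounding_experiment_results")
]
for statement in statements:
    print(statement, end="\n\n")
```

With this pattern the web application keeps pointing at the Prod project, so removing the redirect later only requires redefining the two Prod views.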

Intermediate Tables

The intermediate tables are physical tables that combine the parts of the Compounding data before they are exposed in the final views. They simplify the SQL statements for the views, reduce cloud consumption costs, and improve runtime performance. In this particular instance, they were also created because the data processing pipeline in the Test environment did not produce the expected data. As there was no time to properly troubleshoot the issue, the validated (production) data from the Dev environment was copied to these intermediate tables to make the data available to users of the production SyDatalab web application.

Compounding results data in Data Ocean Dev environment: gcp-sqo-data-dm-ri-d.DS_Compounding.compounding_experiment_results

Compounding static data in Data Ocean Test environment: gcp-sqo-data-dm-ri-t.DS_Compounding.compounding_experiment_static

Compounding results data in Data Ocean Test environment: gcp-sqo-data-dm-ri-t.DS_Compounding.compounding_experiment_results

In addition to the above, two (2) more intermediate tables were created with the initial sample dataset provided by SpP Compounding that was previously available via the production SyDatalab web application. These tables were created both to facilitate the promotion of the full production dataset for SpP Compounding and to serve as a rollback mechanism if needed. These tables are:

Compounding static data (backup) in Data Ocean Test environment: gcp-sqo-data-dm-ri-t.DS_Compounding.compounding_experiment_static_bk

Compounding results data (backup) in Data Ocean Test environment: gcp-sqo-data-dm-ri-t.DS_Compounding.compounding_experiment_results_bk
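
The rollback procedure itself is not documented here. One possible approach, shown purely as a sketch, is to overwrite each intermediate table with the contents of its _bk counterpart. The helper rollback_sql is hypothetical, and the statement should be reviewed against the actual table definitions before being run.

```python
def rollback_sql(table, project="gcp-sqo-data-dm-ri-t", dataset="DS_Compounding", suffix="_bk"):
    """Build a BigQuery statement that replaces a table with its backup copy."""
    target = f"{project}.{dataset}.{table}"
    return (
        f"CREATE OR REPLACE TABLE `{target}` AS\n"
        f"SELECT * FROM `{target}{suffix}`"
    )


# One statement per intermediate table in the Test environment.
rollback_statements = [
    rollback_sql(name)
    for name in ("compounding_experiment_static", "compounding_experiment_results")
]
for statement in rollback_statements:
    print(statement, end="\n\n")
```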

Architecture diagrams

The architecture diagrams that illustrate the data processing pipeline for Compounding can be found here: https://lucid.app/lucidchart/58d161a5-3d41-4488-bd97-0836d6aed27f/edit?page=Jrf2xadOjKxW&invitationId=inv_0294cccd-0f35-4c85-b6e3-64a789dd7771#

The image below is an export from the online Lucid diagrams as of 18 December 2025, inserted here for convenience. It illustrates the current (as of 18 December 2025) deployment of the Compounding production application with the redirected views.
