Solvent Pipeline & SQL View

Useful links: 

Layer 1:Ingestion & Processing

We run a daily scheduled pipeline that pulls data from the ELN and loads it through the usual Staging , ODS, DM flow. If anything comes in with a weird format or breaks validation rules, the pipeline sends error notification emails to the stakeholders. 

You can find the full setup in TMC Project: WS_DATA_OCEAN_DOMAIN_RNI.

Layer 2: Solvent SQL View

On top of the ODS tables, we built a second layer SQL view that consolidates all solvent-related information. Sydatalab is plugged to this view.

1. Purpose

The goal of the view is to stitch together several ODS tables and output a clean consolidated dataset describing solvent experiments.

It combines:

  • Experiment metadata (project, product, operator, date, setup fields)

  • Ingredient & solute details (substance, batch, CAS, SMILES, concentration)

  • Formulation information (solute weights, solvent volumes)

  • Solvent characteristics (inventory lookups, blend values like dD/dP/dH, TEB, etc.)

  • Test conditions & results (score, temperature, duration, observations)

  • HSPIP-style outputs (D, P, H, Tot, Fit, Inside/Outside, etc.)



The final output is essentially: one row per experiment × solute × test group, enriched with everything above.

2. Source Tables

All tables live in:

Project: gcp-sqo-data-dm-ri-p

Dataset: ODS

  • ODS_SOLVENT_0000_F001_F_D_datalab_experiment

  • ODS_SOLVENT_0000_F001_F_D_datalab_Ingredients

  • ODS_SOLVENT_0000_F001_F_D_Formulation

  • ODS_SOLVENT_0000_F001_F_D_datalab_Solvent_inventory

  • ODS_SOLVENT_0000_F001_F_D_datalab_Result

  • ODS_SOLVENT_0000_F001_F_D_Test_groups

  • ODS_SOLVENT_0000_F001_F_D_datalab_Tests

  • ODS_SOLVENT_0000_F001_F_D_datalab_Sample

  • ODS_SOLVENT_0000_F001_F_D_datalab_Formulation_batch

3. Output Structure (High-Level)

A few examples of how columns map to the logic:

  • Experiment fields: EXPERIMENT_ID, PROJECT_, DATE_, OPERATOR, setup flags → from SB_0010

  • Solute/ingredient fields: name, batch, CAS, SMILES, concentration → SB_0020

  • Formulation: solute weight + unit, solvent volume + unit → SB_0030

  • Solvent properties: CAS, SMILES, formula, H-values, TEB, MPT, GHS, etc. → SB_0040

  • Condition/results: duration, temperature, score, observation → SB_0050

  • HSPIP metrics: D, P, H, Tot, R, Fit, Inside/Outside… → SB_0053

The CTE names above match those used in the script.

4. Breakdown of Each CTE

SB_0010_SOLVENT_EXPERIMENTS

Pulls the latest metadata per experiment using a ROW_NUMBER() over meta executions.

All the usual experiment attributes are pivoted into columns.

Output: one row per experiment.

SB_0020_SOLVENT_INGREDIENTS

Extracts ingredient-level details: substance, batch, CAS, SMILES, concentration.

Filters out empty ingredients.

Output: one row per ingredient per experiment (mainly solutes).

SB_0030_SOLVENT_FORMULATIONS

Builds formulation fields by pivoting amounts, units, and volumes.

Identifies solute vs. solvent based on Ingredient_ID.

Output: one row per experiment with solute weight + solvent volume info.

SB_0040_SOLVENT_CHARACTERISTICS

Matches ingredient names to solvent inventory entries.

Includes logic to handle synonym lists (split + UNNEST).

Blend solvents get their dD/dP/dH values extracted separately.

Also ensures we only take the latest solvent inventory metadata.

Output: enriched solvent properties.

SB_0050_SOLVENT_CONDITION_RESULT_V2

Combines test-group results and test conditions:

  • Pulls the latest meta execution for result tables

  • Pivots result values (Score, Observation)

  • Links Samples → Batches → Ingredients

  • Combines with test-conditions (Temperature, Duration)

Output: one row per experiment × test group with score/conditions.

SB_0053_SOLVENT_RESULTS_RS_HSPIP

Extracts HSPIP-type test groups only.

Two pivots: one for conditions (temperature, inside/outside, fit type), one for results (D, ΔD, P, etc.).


  • No labels