Useful links:
We run a daily scheduled pipeline that pulls data from the ELN and loads it through the usual Staging , ODS, DM flow. If anything comes in with a weird format or breaks validation rules, the pipeline sends error notification emails to the stakeholders.
You can find the full setup in TMC Project: WS_DATA_OCEAN_DOMAIN_RNI.
On top of the ODS tables, we built a second layer SQL view that consolidates all solvent-related information. Sydatalab is plugged to this view.
The goal of the view is to stitch together several ODS tables and output a clean consolidated dataset describing solvent experiments.
It combines:
Experiment metadata (project, product, operator, date, setup fields)
Ingredient & solute details (substance, batch, CAS, SMILES, concentration)
Formulation information (solute weights, solvent volumes)
Solvent characteristics (inventory lookups, blend values like dD/dP/dH, TEB, etc.)
Test conditions & results (score, temperature, duration, observations)
HSPIP-style outputs (D, P, H, Tot, Fit, Inside/Outside, etc.)
The final output is essentially: one row per experiment × solute × test group, enriched with everything above.
All tables live in:
Project: gcp-sqo-data-dm-ri-p
Dataset: ODS
ODS_SOLVENT_0000_F001_F_D_datalab_experiment
ODS_SOLVENT_0000_F001_F_D_datalab_Ingredients
ODS_SOLVENT_0000_F001_F_D_Formulation
ODS_SOLVENT_0000_F001_F_D_datalab_Solvent_inventory
ODS_SOLVENT_0000_F001_F_D_datalab_Result
ODS_SOLVENT_0000_F001_F_D_Test_groups
ODS_SOLVENT_0000_F001_F_D_datalab_Tests
ODS_SOLVENT_0000_F001_F_D_datalab_Sample
ODS_SOLVENT_0000_F001_F_D_datalab_Formulation_batch
3. Output Structure (High-Level)
A few examples of how columns map to the logic:
The CTE names above match those used in the script.
Pulls the latest metadata per experiment using a ROW_NUMBER() over meta executions.
All the usual experiment attributes are pivoted into columns.
Output: one row per experiment.
Extracts ingredient-level details: substance, batch, CAS, SMILES, concentration.
Filters out empty ingredients.
Output: one row per ingredient per experiment (mainly solutes).
Builds formulation fields by pivoting amounts, units, and volumes.
Identifies solute vs. solvent based on Ingredient_ID.
Output: one row per experiment with solute weight + solvent volume info.
Matches ingredient names to solvent inventory entries.
Includes logic to handle synonym lists (split + UNNEST).
Blend solvents get their dD/dP/dH values extracted separately.
Also ensures we only take the latest solvent inventory metadata.
Output: enriched solvent properties.
Combines test-group results and test conditions:
Output: one row per experiment × test group with score/conditions.
Extracts HSPIP-type test groups only.
Two pivots: one for conditions (temperature, inside/outside, fit type), one for results (D, ΔD, P, etc.).