This section presents the Data Architecture implemented for data flow in Lab Booster.

ELN Schemas


File Name | Data Model File
Agro
Battery
Coatings
Seed Care
Actizone
HPC Flocculation




ELN Spreadsheets design standards

Design is an important part of the user experience.

User feedback on the first version of the ELN templates was more about the design than the content: users did not enjoy the "black and yellow" spreadsheets.

For Coatings' Paint Formulation SS V2, we therefore defined design standards using Solvay's color palette as a base.

These standards will evolve with future needs.

Documentation for the spreadsheets deployed in PROD:

Agro

Battery | Conductivity
Battery | Mechanosynthesis

Coatings | EP

Seed Care | Formulation
Seed Care | Results & Requests





Lab Booster Data model


Overview


A Data Model represents the way data is structured in a dataset or a database, such as Lab Booster’s data ocean.

The data model defines how the data lake or data ocean is connected to:

- The data input i.e. ELN, LIMS systems, connected instruments etc.

- The data output i.e. the WebApp DataLab in which users can access data

Context

As of mid-2023, each market in Lab Booster has its own data model i.e. its own way to structure data.

As a result, connections to the data lake must be rebuilt for each new project.

Objective

Our aim is to have a common data model for all markets, to bring:

  • Accelerated delivery of new projects
  • Better performance
  • Less maintenance
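
To make the idea of a common data model more concrete, here is a minimal sketch in which records from two markets are renamed into one shared shape, so that downstream code only has to handle one structure. The field names, the `FIELD_MAPPINGS` table and the `to_common_model` helper are hypothetical and chosen for illustration only; they are not the actual Lab Booster schemas.

```python
from typing import Dict

# Hypothetical mappings from market-specific field names to common field names.
# Neither side reflects the real Lab Booster schemas.
FIELD_MAPPINGS: Dict[str, Dict[str, str]] = {
    "battery": {"exp_id": "experiment_id", "operator_name": "operator", "run_date": "date"},
    "seed_care": {"ExperimentID": "experiment_id", "User": "operator", "Date": "date"},
}


def to_common_model(market: str, record: Dict[str, str]) -> Dict[str, str]:
    """Rename a market-specific record into the common field names."""
    mapping = FIELD_MAPPINGS[market]
    missing = sorted(name for name in mapping if name not in record)
    if missing:
        raise KeyError(f"{market} record is missing fields: {missing}")
    common = {common_name: record[market_name] for market_name, common_name in mapping.items()}
    common["market"] = market  # keep track of where the record came from
    return common


if __name__ == "__main__":
    battery = {"exp_id": "B-1", "operator_name": "A. Operator", "run_date": "2023-06-01"}
    seed_care = {"ExperimentID": "S-7", "User": "B. Operator", "Date": "2023-06-02"}
    print(to_common_model("battery", battery))
    print(to_common_model("seed_care", seed_care))
```

With this kind of mapping, onboarding a new market would only require a new entry in the mapping table rather than a new set of connections.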


This page is divided into two sections:

  1. Entity-Relationship Diagram (ERD), which served as a basis to design the data model
  2. Data model


Entity-Relationship Diagram (ERD)

Data Models are generally based on a diagram or schema, called an Entity-Relationship Diagram, that defines:

  • Entities, i.e. definable objects or concepts within a system
  • Relationships, i.e. how entities are related to one another

Building the ERD is a preliminary step to designing the actual data model to ensure that all required entities and relationships are accurately defined and represented.
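
As a rough illustration of this preliminary step, the sketch below records a few entities and relationships taken from the entity dictionary on this page and checks that every relationship refers to a defined entity. The `ENTITIES` and `RELATIONSHIPS` names, the relationship verbs and the `undefined_entities` helper are illustrative assumptions, not part of the Lab Booster implementation.

```python
# Minimal sketch. Entity names come from the entity dictionary on this page;
# the relationship verbs and the helper name are illustrative assumptions.
ENTITIES = {"Experiment", "Activity", "Process", "Process Step", "Sample", "Test"}

# Each relationship is (source entity, verb, target entity).
RELATIONSHIPS = [
    ("Experiment", "includes", "Activity"),
    ("Activity", "groups", "Process"),
    ("Process", "groups", "Process Step"),
    ("Experiment", "includes", "Sample"),
    ("Experiment", "includes", "Test"),
]


def undefined_entities(entities, relationships):
    """Return entity names used in a relationship but missing from `entities`."""
    used = {name for source, _, target in relationships for name in (source, target)}
    return sorted(used - set(entities))


if __name__ == "__main__":
    # An empty list means every relationship points at a defined entity.
    print(undefined_entities(ENTITIES, RELATIONSHIPS))
```

A check of this kind is one simple way to catch a relationship that mentions an entity nobody has defined yet.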

This section is split into two parts:

  1. Entity-Relationship Diagram design
  2. ERD mapping with R&I workflows

Entity-Relationship Diagram design 

Entity dictionary 


Entity | Definition | Example(s)
Experiment

A recording of a workflow performed in the lab by an operator at a given date to achieve an objective

An Experiment includes

  • Activities
  • Samples 
  • Tests
  • Request
  • Planning

Experiments created and recorded in ELN IDBS, LIMS Labware, LIMS Agilab...

Solvay User

A recording of the user that created the Experiment, including Solvay ID and email

User Permissions

A setting determining what application options the user has access to
Request

A recording of information provided by user requesting an Experiment

Request includes

  • Request date
  • Sample information
  • Information on user making requests

Requests for BioMatTech - Biodegradability testing include

  • Request name
  • Requestor name
  • Request date
  • Priority
  • Status
  • Test method required
  • Sample name
  • Sample ID
  • Sample status
  • etc.
Planning

A recording of when the Experiment is supposed to be performed

A Planning includes

  • Tests or Activities expected date
  • Results availability date

Planning in Novecare - Méréville Request & Results includes

  • Expected application (of slurries & powders on seed) date
  • Operator performing the application
Activity

A group of Processes performed in the lab in a specific order

In Novecare - Méréville Request & Results, two Activities are found, Application and Testing
Process

A group of Process Steps performed in the lab in a specific order

In BatMat - Mecanosynthesis, the Mecanosynthesis Process is defined by several successive Process Steps

  1. Jar Preparation
  2. Milling
  3. Drying
  4. Calcination
  5. Finishing
Process Step

A recording of tasks performed in the lab, defined by its name and date

A Process Step includes

  • Conditions in which it is carried out
  • Input and output Step Samples
  • Tests performed during Process Step
  • Process End Product

Process Step follows a Standard Operating Procedure (SOP)

In Aroma - Fermentation the Growth Process Step is defined by the date on which it is performed and includes

  • Conditions - Scale, Temperature, pH...
  • Input Step Samples - Starter media and Substrate
  • Output Step Samples - Sample #, Date and Time
  • Tests - Optical Density and Glucose analysis
  • Process End Product - Growth media
Process End Product

The chemical output of a Process, defined by its name and date

Process End Product characteristics include composition, aspect, mass and/or volume...

A Process End Product can be registered as a new Ingredient for other Formulations (Batches) or Process Steps

In Aroma - Fermentation, the Process End Product of the Process Step "Bioconversion" is vanillin

In Novecare - Méréville Formulation Recipe, the Process End Product of the Formulation Process Step is a formulation

In BatMat - Mecanosynthesis Jar Slurries, Amorphous Precursors and Raw Calcined Products are Process End Products

Ingredient

A chemical product, defined by its name and unique ID and recorded in an inventory

Ingredient characteristics include date, batch number, supplier, physical state (liquid/solid), density, color...

An Ingredient can be:

  • A Formulation Batch
  • Sample
  • Process End-Product

In Aroma - Fermentation, the substrate Ferulic acid is an Ingredient

In Novecare - Méréville Request & Results, Slurries and Powders are Ingredients 

In BatMat - Mecanosynthesis Jar Precursors, Slurries, Amorphous Precursors and Raw Calcined Products are Ingredients

Formulation

A combination of chemical products defined by its name, its Ingredients and the Ingredients' target proportions

Formulation characteristics include total number of chemical products, target concentration, target volume, calculated density... 

In Novecare - Méréville Request & Results, a Recipe is a Formulation and is defined by name, ID and label.

Characteristics include Number of products, Products, Recipe unit, Recipe Price, Calculated Recipe Density...

Formulation Batch

A combination of chemical products defined by its name, unique ID and date, its Ingredients and the Ingredients' actual proportions

Formulation Batch characteristics include total number of chemical products, actual concentration, total volume, density, container (vessel, jar, bottle)... 

Formulation Batch is a Formulation that has been created in the lab

In Novecare - Méréville Request & Results, a Batch of Recipe is a Formulation Batch and is defined by name, ID and label

Characteristics include Recipe selection, Actual Weight (of Products)

Sample

A part of a substance or component that is taken from the whole substance or component, defined by its name, unique ID and date

A Sample can come from

  • An Ingredient
  • Formulation Batch
  • A Process End-Product 
  • Request

Sample can be used for 

  • Test
  • Process Step

See Step Sample for Samples taken during a Process Step

Samples come from 

  • An Ingredient : Inoculum in BioMatTech - Biodegradability
  • Formulation Batch : Batch of Recipe in Novecare - Méréville Formulation 
  • A Process End-Product : Finished Product in BatMat - Mecanosynthesis

Samples are used for

  • Test: Batch of Recipe to characterize at t0 in Novecare - Méréville Formulation
  • Process Step: Growth mass used in Bioconversion Process Step in Aroma - Fermentation
Step Sample

A part of a substance or component that is taken from the whole substance or component in relation to a Process Step, defined by its name and date 

A Step Sample can be

  • An input for the Process Step
  • An output of the Process Step
In Aroma - Fermentation, Step Samples are taken throughout the three Process Steps to monitor the chemical reactions
Sample Test Plan

Planning defined for a set of Samples, defined by its name and the timing

The Sample Test Plan characteristics include total number of Samples, Tests to perform ...

Sample Test Plan can apply in the context of

  • Process Step
  • Request 
  • Planning

In Novecare - Méréville Formulation the Sample Test Plan defines when Samples should be taken during an ageing Process Step

It is defined by

  • Protocol name
  • Initial storage date
  • Number of Samples
Test Group

A group of Tests performed on the same Sample

Characterization tests (OD manual, OD dencytee and Glucose) performed during the Growth Process Step in Aroma - Fermentation form a Test Group
Test

A measure of Sample behavior when a procedure is carried out 

Tests performed in BatMat - Mecanosynthesis include Particle size test, SEM test, Lumisizer test, H NMR test, P31 NMR test, Li7 NMR test, Discrete value test
Measure

A property that can be measured

A Measure can serve as a Condition, a Result, or both

pH is a Condition in Aroma - Fermentation and a Result in BioMatTech - Biodegradability

Conditions

A variable or setting defined by the operator for

  • A Test and affecting its Result
  • Process Step

In BioMatTech - Biodegradability, Conditions for the Dry matter Test include Empty aluminium cup weight

In Aroma - Fermentation, Conditions of the Growth Process Step include Scale, Temperature, pH... 

Results

The outcome of a Test performed on a Sample in specified Conditions

Results can take the form of

  • A numerical value
  • A set of numerical values (e.g. a curve)
  • A non-numerical value (e.g. observations)

A pH value is a Result of a biodegradability Test in BioMatTech - Biodegradability

A conductivity curve is a Result of a conductivity Test in BatMat - Conductivity

Observations are a Result of a Look after Attrition Test in Novecare - Méréville Request & Results

Results Series

A set of Results, obtained at different time intervals, for a Test performed in the same Conditions on the same Sample

Aggregated Result

Result obtained by aggregating Results from several Tests

In Aroma - Fermentation, the maximum amount of vanillin produced during the Bioconversion Process Step is an Aggregated Result as it aggregates several vanillin concentration measure Results

In Novecare - Méréville Request & Results, averages calculated from two different Test Results are Aggregated Results
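
The definitions of Result, Results Series and Aggregated Result above can be sketched as follows. This is a simplified illustration under stated assumptions: only numerical Results are modelled, Conditions are left out, and the class, field and function names as well as the example values are hypothetical rather than taken from the Lab Booster data model.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass(frozen=True)
class Result:
    """Outcome of one Test on one Sample (numerical form only in this sketch)."""
    test_name: str
    sample_id: str
    value: float
    hours_elapsed: float  # assumption: time point expressed in hours


def results_series(results: List[Result], test_name: str, sample_id: str) -> List[Result]:
    """Results Series: Results of one Test on one Sample, ordered by time."""
    matching = [r for r in results if r.test_name == test_name and r.sample_id == sample_id]
    return sorted(matching, key=lambda r: r.hours_elapsed)


def aggregated_max(results: List[Result]) -> float:
    """Aggregated Result: maximum value across several Results."""
    return max(r.value for r in results)


def aggregated_mean(results: List[Result]) -> float:
    """Aggregated Result: average value across several Results."""
    return mean([r.value for r in results])


# Hypothetical example values, not real measurements.
EXAMPLE_RESULTS = [
    Result("Vanillin concentration", "S-001", 1.1, 8.0),
    Result("Vanillin concentration", "S-001", 0.2, 0.0),
    Result("Vanillin concentration", "S-001", 1.5, 4.0),
    Result("Glucose", "S-001", 3.0, 0.0),
]

if __name__ == "__main__":
    series = results_series(EXAMPLE_RESULTS, "Vanillin concentration", "S-001")
    print([r.hours_elapsed for r in series])
    print(aggregated_max(series))
```

Keeping aggregation in separate functions mirrors the dictionary: a Results Series is only a time-ordered selection, while an Aggregated Result is a new value derived from several Results.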

Entity-Relationship Diagram 



ERD mapping with R&I workflows (WIP)

Three types of R&I workflows were identified

  • Formulation workflows
  • Synthesis workflows
  • Analysis workflows

This classification was done to ensure that the ERD accommodates all types of R&I workflows.

The mapping done for different workflows is summarized in the table below.


GBU/F- R&I | Workflow name | Workflow type | Mapping status | Link to mapping | Documentation - Data capture
Novecare GBU | Seed Care Formulation | Formulation | Done | Seed Care mapping | ELN template
Novecare GBU | Seed Care Request & Results | Formulation | Done | Seed Care mapping | ELN template
Battery Platform | Mecanosynthesis | Synthesis | Done | Mecanosynthesis mapping | ELN template
Aroma Performance GBU | Fermentation | Synthesis | Done | Fermentation mapping | ELN spreadsheet mockup
BioMatTech Platform | Biodegradability | Analysis | Done | Biodegradability mapping | LIMS spreadsheet mockup
Specialty Polymers GBU | Aging, Mechanical, Thermal | Analysis | Ongoing | |
Specialty Polymers GBU | | Synthesis | To do | |
Novecare GBU | Agro | Formulation | To do | |
Novecare GBU | EP Coatings | Synthesis | To do | |
Novecare GBU | Paint Coatings | Formulation | To do | |
Corporate R&I | Solvent platform - Solubilization | | To do | |
Corporate R&I | | Analysis | To do | |
Green Hydrogen Platform | Conductivity | Analysis | To do | |




BigQuery


New Data Model of ALB Data Mart (Exposition layer): https://app.genmymodel.com/api/projects/_k07o4IBOEe29ie0vpi-P5A/diagrams/_k07o4oBOEe29ie0vpi-P5A/svg

Data Mapping to Data Mart:


The following BigQuery datasets are all staging datasets, as per the data convention explained previously.

For more details on ETL (extraction, transformation, loading), please refer to: App Lab Booster (ALB) - Data

Batteries

Materials

Coatings

Data mapping


File Name | Data Mapping File
Agro
Coatings

Version 2:

Version 1:

Battery
Materials
Seed Care