Specific technical documentation for the Application Lab tests

Summary

Data sources

Raw data files for all labs are extracted from the servers listed in the following spreadsheet: Asset Tag Spreadsheet

The table below presents all the Application tests that are currently included in the Datalake, the data sources and the raw data file format for each test:

| Test | Data Sources | File format |
| --- | --- | --- |
| ODR (Oscillating Disc Rheometer) | Raw data files / ELN data | .exp / .xml |
| DSAS (Dynamic Strain Amplitude Sweep) | Raw data files / ELN data | .csv / .xml |
| DSAS_U (Dynamic Strain Amplitude Sweep Uncured), programs P05, P28 and P38 | Raw data files / ELN data | .csv / .xml |
| Tensile Properties | Raw data files / ELN data | .csv / .xml |
| Temperature Sweep | Raw data files / ELN data | .asc, .csv / .xml |
| DIN Abrasion Resistance | ELN data | .xml |
| Filler Dispersion | ELN data | .xml |
| Hardness Shore A | ELN data | .xml |
| Lambourn | ELN data | .xml |
| Mixing Process | ELN data | .xml |
| Mooney Viscosity | ELN data | .xml |
| Abrasion Loss and Density | ELN data | .xml |
| Rebound Resilience | ELN data | .xml |
| Tear Resistance | ELN data | .xml |
| DSA (Dynamic Strain Amplitude) | ELN data | .xml |

Talend jobs

Data is processed using Talend and Python. The Talend jobs for the Application Lab can be found in the Talend project RnI_Silica_Application.

The job F000_Orch_Flow_Application is the orchestrator: it calls all the other jobs successively, in a determined order.

Python scripts

A copy of the Python scripts for downloading, parsing and computing new fields and columns for the Application Lab raw data and ELN data can be found here:

However, it is recommended to use the latest version of the scripts, available in the production path: D:\DATA\PROD\RnI\Silica\Permanent. For the moment, Git is not in place for this project.

ELN Data jobs

The Po2 project only needs to change the folders defined in the file above in order to retrieve the right ELN data.


  • F002_ELN_Application_Raw_xml_Upload_to_CloudStorage
    • uploads the raw .xml files into a Google Storage bucket
  • F003_ELN_Application_Unzip_Files
    • unzips all the .zip archives and extracts the .xml files
  • F004 to F021
These jobs extract all the tables and fields from the .xml files for each test. The following Python scripts are integrated into the parsing jobs; they extract the useful information and put it into an exploitable form before loading it into BigQuery:
      • parse_StrudyDefinition_ELN.py (F004)
      • parse_DIN_AbrasionResistance_ELN.py (F005)
      • parse_DSAS_ELN.py (F006)
      • parse_FillerDispersion_ELN.py (F007)
      • parse_HardnessShoreA_ELN.py (F008)
      • parse_Lambourn_ELN.py (F009)
      • parse_Mixing_ELN.py (F010)
      • parse_MooneyViscosity_ELN.py (F011)
      • parse_ODR_ELN.py (F012)
      • parse_TemperatureSweep_ELN.py (F013)
      • parse_TensileProperties_ELN.py (F014)
      • parse_AbrasionLossDensity_ELN.py (F015)
      • parse_ReboundResilience_ELN.py (F016)
      • parse_TearResistance_ELN.py (F017)
      • parse_DSA_ELN.py (F018)
      • parse_DSAS_U_P05_ELN.py (F019)
      • parse_DSAS_U_P28_ELN.py (F020)
      • parse_DSAS_U_P38_ELN.py (F021)
  • F020_ELN_Data_Application_Upload_to_BQ
    • uploads the following tables into the BigQuery dataset ELN_data_application_mig:
      • Abrasion_Loss_and_Density_Synthetic_results
      • All_tests_Experimental_condition_range_value
      • All_tests_Experimental_condition_single_value
      • All_tests_Experimental_condition_text
      • All_tests_Experimental_details
      • DIN_abrasion_resistance_Synthetic_results
      • DSA_Synthetic_results
      • Filler_dispersion_Synthetic_results
      • General_Information_Application
      • Hardness_Shore_A_Synthetic_results
      • Lambourn_Synthetic_results
      • Mixing_Raw_data
      • Mixing_Synthetic_results
      • Mooney_viscosity_Synthetic_results
      • Rebound_Resilience_Synthetic_results
      • Study_definition_Mix_and_formulation
      • Study_definition_Operating_methods
      • Study_definition_Phase_list
      • Study_definition_Study_details
      • StudyTestMix_Mapping
      • Tear_Resistance_Synthetic_results


  • F021_IDBS_Application_Extract_Data
The users of the Tableau application of the datalake inherit their data-access rights from the ELN (i.e., in Tableau they can only access the data corresponding to the spreadsheets that they are allowed to access in ELN). The job F021 therefore extracts from the Oracle DB the permissions of each ELN Silica user.
  • F022_IDBS_Application_Prepare_and_Upload_to_BQ 
For each user, lists the unique IDs of the spreadsheets that they are allowed to access and uploads this information to BigQuery, into the table idbs_access_rights_application of the dataset access_rights_mig.

Raw Data jobs

  • R001_Download_Application - Downloads raw data files from the lab servers.

For the tests for which raw data files are available (i.e., DSAS, DSAS_U, ODR, Tensile Properties, Temperature Sweep), the table raw_data_mapping, previously extracted from the ELN .xml files, contains, for each test_id, the link to the raw data file that needs to be extracted. For each test, the raw data files are extracted via the Python scripts incorporated into this job (R001):

  • download_CP_ODR.py
  • download_DP_DSA_Sweep.py
  • download_GP_DSAS_U.py
  • download_PP_TP.py
  • download_temp_sweep.py


To ensure maximum flexibility for the user, raw data files are extracted and processed every day. Therefore, the history of raw data files is not stored from one day to the next; instead, every raw data file is re-extracted each day.

One advantage of this approach is that the user can modify or replace the raw data files and see these changes reflected in the datalake. Nevertheless, this implies that the link to the raw data file written in ELN must always be up to date and must allow the extraction of the file. This condition is fulfilled if (1) the lab server and (2) the folder structure leading to the raw data file do not change.

It is possible that lab servers change due to the replacement of lab machines. As mentioned above, this has consequences on the extraction of the raw data files, because the link to the raw data file written in ELN becomes obsolete. The Python scripts that extract the raw data files are able to adapt to and handle changes of lab machines without user intervention (there is no need to modify the link to the file written in ELN). Nevertheless, this solution is viable only if the following conditions are fulfilled:

  1. The name of the root folder containing the raw data needs to stay the same (i.e., Data lake, or any other name that was initially established for each test).
  2. From this level of the folder tree (Data lake\...), the path to the raw data file needs to stay unchanged.
  3. Every change of lab machine needs to be reported to our team as soon as possible, so that the new name of the server can be updated in the context variables of the Talend project:


  • R002_Raw_Data_Application_Upload_to_CloudStorage 
    • uploads the raw data files into a Google Storage bucket
  • R003_Prepare_Application
Extracts useful data from each raw data file and generates computed fields and summary tables for each test. The details of the computations and the Python scripts used for each test are given in the following sections.
  • R004_Raw_Data_Application_Upload_to_BQ
uploads the following tables, generated in the previous step, into the BigQuery dataset raw_data_application:
      • ApplicationCuringProperties_ODR
      • ApplicationDynamicalProperties_DSAS
      • ApplicationDynamicalProperties_TemperatureSweep
      • ApplicationGreenProperties_DSAS_U
      • ApplicationPhysicalProperties_PP_TP
      • ComputedApplicationCuringProperties_ODR
      • ComputedApplicationDynamicalProperties_DSAS
      • ComputedApplicationDynamicalProperties_TemperatureSweep
      • ComputedApplicationGreenProperties_DSAS_U
      • ComputedApplicationPhysicalProperties_PP_TP
      • ComputedApplicationPhysicalProperties_PP_TP_Aggregated

Application Lab Tests

Curing Properties – Oscillating Disc Rheometer (ODR)

The Python script download_CP_ODR.py extracts the raw data files listed in the ELN table raw_data_mapping from the lab server.

For each raw data file, the Python script parse_CP-ODR.py creates a .csv file containing the following columns extracted from the raw data file:

  • test_id
  • time_sec
  • time_min
  • s_prim
  • temperature
  • speed

For each file generated in the previous step, the Python script compute_CP-ODR.py creates a .csv file containing the following values computed from raw data:

  • min_torque

Minimum of “s_prim”

  • max_torque

Maximum of “s_prim”

  • final_torque

Last value of “s_prim”

  • delta_torque

max_torque – min_torque

  • TS2

Value of “time_min” corresponding to min_torque + 2

  • T90

Value of “time_min” corresponding to delta_torque * 90% + min_torque

  • T98

Value of “time_min” corresponding to delta_torque * 98% + min_torque

  • T98_50

T98 * 1.5

  • reversion_index

Slope of “s_prim” computed starting from time T90:

reversion_index = slope if slope < 0

reversion_index = 0 otherwise

  • DT_mT

delta_torque / min_torque

  • curing_speed

Value of “speed” corresponding to T50
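The ODR computations above can be sketched as follows. This is a minimal illustration, not the production compute_CP-ODR.py: the helper name, the "first time the torque reaches the target" rule for TS2/T90/T98 and the least-squares slope for the reversion index are all assumptions.

```python
# Hypothetical sketch of the ODR summary values described above.
def compute_odr_values(time_min, s_prim):
    """Summary values for one test's torque curve (s_prim vs. time_min)."""
    min_torque = min(s_prim)
    max_torque = max(s_prim)
    final_torque = s_prim[-1]
    delta_torque = max_torque - min_torque

    def time_at_torque(target):
        # First time at which the torque curve reaches `target` (assumption).
        for t, s in zip(time_min, s_prim):
            if s >= target:
                return t
        return None

    ts2 = time_at_torque(min_torque + 2)
    t90 = time_at_torque(min_torque + 0.90 * delta_torque)
    t98 = time_at_torque(min_torque + 0.98 * delta_torque)

    # Slope of s_prim from T90 onwards; the reversion index keeps only a
    # negative slope, otherwise 0.
    pts = [(t, s) for t, s in zip(time_min, s_prim)
           if t90 is not None and t >= t90]
    reversion_index = None
    if len(pts) > 1:
        mean_t = sum(t for t, _ in pts) / len(pts)
        mean_s = sum(s for _, s in pts) / len(pts)
        slope = (sum((t - mean_t) * (s - mean_s) for t, s in pts)
                 / sum((t - mean_t) ** 2 for t, _ in pts))
        reversion_index = slope if slope < 0 else 0

    return {
        "min_torque": min_torque,
        "max_torque": max_torque,
        "final_torque": final_torque,
        "delta_torque": delta_torque,
        "TS2": ts2,
        "T90": t90,
        "T98": t98,
        "T98_50": t98 * 1.5 if t98 is not None else None,
        "reversion_index": reversion_index,
        "DT_mT": delta_torque / min_torque,
    }
```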

Dynamical Properties – Dynamic Strain Amplitude Sweep (DSAS)

The Python script download_DP_DSA_Sweep.py extracts the raw data files listed in the ELN table raw_data_mapping from the lab server.

For each raw data file, the Python script parse_DP_DSA_Sweep.py creates a .csv file with the following columns extracted from the raw data files:

  • test_id
  • dsa
  • g_star
  • g_prime
  • g_second
  • tan_delta

For each test_id, the Python script compute_DP_DSA_Sweep.py creates a .csv file with the following values computed from raw data:

  • g_prime_0_first

First value of “g_prime”

  • g_prime_0_second

Last value of “g_prime”

  • g_prime_50

Value of “g_prime” corresponding to “dsa” = 0.5

  • g_prime_0_second_50

g_prime_50 - g_prime_0_second

  • g_second_max

Maximum of “g_second” for the descent

  • strain_max_g

Value of “dsa” corresponding to g_second_max

  • tan_d_max

Maximum of “tan_delta” for the descent

  • tan_d_10

Value of “tan_delta” at 10% of “dsa” for the descent

  • tan_d_01

Value of “tan_delta” at 0.1% of “dsa” for the descent

  • g_star_12

Value of “g_star” at 12% of “dsa” for the descent
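Several of the DSAS values above are read "at X% of dsa for the descent". A minimal sketch of how this can be done is shown below; the split of the sweep at the dsa maximum and the linear interpolation are assumptions, not the production compute_DP_DSA_Sweep.py.

```python
# Hypothetical helpers for reading a sweep value on the descending branch.
def descent(dsa, values):
    """Keep only the descending branch of the sweep (after the dsa maximum)."""
    i = dsa.index(max(dsa))
    return dsa[i:], values[i:]

def value_at(dsa, values, target):
    """Linearly interpolate `values` at dsa = target."""
    pairs = sorted(zip(dsa, values))
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
        if x0 <= target <= x1:
            return y0 + (y1 - y0) * (target - x0) / (x1 - x0)
    return None
```

For example, tan_d_10 would be obtained as `value_at(*descent(dsa, tan_delta), 10)`.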

Dynamical Properties – Temperature Sweep

The Python script download_Temp_Sweep.py extracts the raw data files listed in the ELN table raw_data_mapping from the lab server.

For each test_id, the Python script parse_DP_Temp_Sweep.py creates a .csv file with the following columns extracted from the raw data files:

  • test_id
  • temperature
  • e_star
  • e_prime
  • e_second
  • tan_delta


For each test_id, the compute step of the job then creates a .csv file with the following values computed from raw data:

  • max_tan_delta

Maximum of “tan_delta”

  • temp_at_max_tan_delta

Value of “temperature” corresponding to max_tan_delta

  • max_esec

Maximum of “e_second”

  • temp_at_max_esec

Value of “temperature” corresponding to max_esec

  • estar_min_70 

Linear regression to infer the value of “e_star” corresponding to “temperature” = -70

  • estar_min_45

Linear regression to infer the value of “e_star” corresponding to “temperature” = -45

  • estar_min_25

Linear regression to infer the value of “e_star” corresponding to “temperature” = -25

  • tan_delta_min_25

Linear regression to infer the value of “tan_delta” corresponding to “temperature” = -25

  • tan_delta_0

Linear regression to infer the value of “tan_delta” corresponding to “temperature” = 0

  • eprime_0

Linear regression to infer the value of “e_prime” corresponding to “temperature” = 0

  • esec_0

Linear regression to infer the value of “e_second” corresponding to “temperature” = 0

  • tan_delta_40

Linear regression to infer the value of “tan_delta” corresponding to “temperature” = 40

  • tan_delta_60

Linear regression to infer the value of “tan_delta” corresponding to “temperature” = 60

  • tan_delta_80

Linear regression to infer the value of “tan_delta” corresponding to “temperature” = 80

  • estar_60

Linear regression to infer the value of “e_star” corresponding to “temperature” = 60

  • eprime_40

Linear regression to infer the value of “e_prime” corresponding to “temperature” = 40

  • tan_delta_80_100 

tan_delta_100 / tan_delta_80

  • estar_min_100_min_80

estar_min_100 / estar_min_80
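Most of the Temperature Sweep values above are inferred by linear regression at a fixed temperature. The sketch below illustrates one way to do it; the choice of fitting a least-squares line over points within a window around the target temperature is an assumption, and the production script may fit a different neighbourhood.

```python
# Hypothetical "linear regression at a target temperature" helper.
def infer_at_temperature(temperature, values, target, window=10.0):
    """Fit y = a*t + b on the points within `window` degrees of `target`
    and evaluate the fitted line at `target`."""
    pts = [(t, y) for t, y in zip(temperature, values)
           if abs(t - target) <= window]
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    a = (sum((t - mean_t) * (y - mean_y) for t, y in pts)
         / sum((t - mean_t) ** 2 for t, _ in pts))
    b = mean_y - a * mean_t
    return a * target + b
```

With this helper, estar_min_70 would be `infer_at_temperature(temperature, e_star, -70)`, and similarly for the other target temperatures.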

Green Properties – Dynamic Strain Amplitude Sweep Uncured (DSAS_U)

The Python script download_GP_DSAS_U.py extracts the raw data files listed in the ELN table raw_data_mapping from the lab server.

Raw data files for DSAS_U are extracted only for the P38 program. The Python scripts were designed to also handle the P05 and P28 programs, but these parts of the code should be deleted once we are sure that they are no longer needed.

For each raw data file, the Python script parse_GP_DSAS_U.py creates a .csv file with the following columns extracted from raw data files:

  • test_id
  • time
  • set_strain_perc
  • g_prim
  • g_sec
  • tan_delta


For each test_id, the Python script compute_GP_DSAS_U (P05, P28, P38) creates a .csv file with the following values computed from raw data:

  • g_prim_091

Value of “g_prim” corresponding to the value of “set_strain_perc” that is closest to 0.91%

  • g_prim_14

Value of “g_prim” corresponding to the value of “set_strain_perc” that is closest to 14%

  • g_prim_50

Value of “g_prim” corresponding to the value of “set_strain_perc” that is closest to 50%

  • delta_g_prim

g_prim_091 - g_prim_50

  • tand_091

Value of “tan_delta” corresponding to the value of “set_strain_perc” that is closest to 0.91%

  • tand_50

Value of “tan_delta” corresponding to the value of “set_strain_perc” that is closest to 50%

  • g_sec_091

Value of “g_sec” corresponding to the value of “set_strain_perc” that is closest to 0.91%

  • g_sec_14

Value of “g_sec” corresponding to the value of “set_strain_perc” that is closest to 14%

  • g_sec_50

Value of “g_sec” corresponding to the value of “set_strain_perc” that is closest to 50%

  • delta_g_sec

g_sec_091 - g_sec_50
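All the DSAS_U values above follow the same "value at the set_strain_perc closest to X%" rule, which can be sketched as below; the helper name is an assumption, not the production compute_GP_DSAS_U code.

```python
# Hypothetical closest-strain lookup used for g_prim_091, tand_50, etc.
def value_at_closest_strain(set_strain_perc, values, target):
    """Return the value whose set_strain_perc is closest to `target`."""
    i = min(range(len(set_strain_perc)),
            key=lambda k: abs(set_strain_perc[k] - target))
    return values[i]
```

For example, delta_g_prim would be `value_at_closest_strain(strain, g_prim, 0.91) - value_at_closest_strain(strain, g_prim, 50)`.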

Physical Properties – Tensile Properties

From the ELN data, the Python script download_TensileProperties.py extracts the links to the raw data files that need to be extracted from the server and downloads these files.

Raw data files are in .csv format and are organized by study ID. The raw data files corresponding to one study ID should be placed in a folder named with the sample ID only.

Each folder corresponding to one study ID contains one or several sub-folders. Each sub-folder contains the raw data files corresponding to one single test_id, and its name should be composed of the test_id followed by the string “.is_tens_RawData” (e.g., the sub-folder “19cc355-01.is_tens_RawData”).

Along with the sub-folders, the folder corresponding to the study_id contains one .csv file for each sub-folder (e.g., for the sub-folder “19cc355-01.is_tens_RawData”, we will have the .csv file “19cc355.is_tens.csv”). These .csv files contain the list of specimens for which data will be extracted.

For each test_id and for each specimen, the Python script parse_TensileProperties.py creates a .csv file with the following columns extracted or computed from the raw data files:


  • test_id
  • specimen
  • abcissa - list of values (0, 10, 20, 30…)
  • constraint_traction
  • secant_modulus 


(deformation_traction * constraint_traction) / 100 

For each test_id, the average specimen is also computed.

For the average specimen of each test_id, the following columns are computed:

  • sm_avg

Average of secant_modulus of all specimens / “abcissa” * 100

  • J

Average of secant_modulus of all specimens * (“abcissa” / 100 + 1)

  • K

“J” / “abcissa” * 100
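The average-specimen columns above can be sketched as follows, assuming the formulas are read left to right (e.g. sm_avg = (mean of secant_modulus / abcissa) * 100); this operator ordering and the helper name are assumptions, not the production implementation.

```python
# Hypothetical computation of the average-specimen columns sm_avg, J and K.
def average_specimen_columns(abcissa, secant_modulus_per_specimen):
    """abcissa: shared x values; secant_modulus_per_specimen: one list of
    secant_modulus values per specimen, aligned with abcissa."""
    rows = []
    for i, x in enumerate(abcissa):
        mean_sm = (sum(s[i] for s in secant_modulus_per_specimen)
                   / len(secant_modulus_per_specimen))
        j = mean_sm * (x / 100 + 1)
        rows.append({
            "abcissa": x,
            "sm_avg": mean_sm / x * 100 if x else None,  # undefined at x = 0
            "J": j,
            "K": j / x * 100 if x else None,
        })
    return rows
```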


The Python script compute_TensileProperties.py creates a .csv file with the following computed values for each test_id and for each specimen:

  • m10

Value of “constraint_traction” corresponding to “abcissa” = 10

  • m100

Value of “constraint_traction” corresponding to “abcissa” = 100

  • m200

Value of “constraint_traction” corresponding to “abcissa” = 200

  • m300

Value of “constraint_traction” corresponding to “abcissa” = 300

  • di_m10_m100

m10 / m100

  • ir_m200_m100

m200 / m100

  • ir_m300_m100

m300 / m100

  • ts

Maximum of “constraint_traction”

  • eb

Value of abcissa corresponding to ts

  • ebts

ts * eb
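The per-specimen tensile values above can be sketched as follows; the helper name and the exact-match lookup on abcissa are assumptions, not the production compute_TensileProperties.py.

```python
# Hypothetical per-specimen tensile summary (m-values, ratios, ts, eb, ebts).
def tensile_summary(abcissa, constraint_traction):
    """Summary values for one specimen's stress-strain curve."""
    at = dict(zip(abcissa, constraint_traction))
    m10, m100 = at.get(10), at.get(100)
    m200, m300 = at.get(200), at.get(300)
    ts = max(constraint_traction)                 # tensile strength
    eb = abcissa[constraint_traction.index(ts)]   # elongation at break
    return {
        "m10": m10, "m100": m100, "m200": m200, "m300": m300,
        "di_m10_m100": m10 / m100,
        "ir_m200_m100": m200 / m100,
        "ir_m300_m100": m300 / m100,
        "ts": ts, "eb": eb, "ebts": ts * eb,
    }
```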
