Technical documentation for the data produced by drying tests on the Drying equipment
...
Sum-up
| Equipment / Scale | Tesla | Gunsan | Colognes 2500L |
|---|---|---|---|
| Data Sources | ELN, Raw Data on Google Drive | ELN, Raw Data on file share | ELN, Raw Data on Google Drive |
| Raw Data File Type | CSV | xlsx | xls |
| Scale Name on ELN | FR-170L-TESLA | KR-170L | FR-2500L |
| Data Collection | Talend: R011_Download_Synthesis_gDrive_Drying | Talend: J010_Download_Synthesis_LabServers; Python: download_drying_gunsan.py (to be finished) | Talend: R011J011_Download_Synthesis_gDrive_Drying |
| Parse | Python: parse_drying_tesla.py | Python: parse_drying_gunsan.py | Python: parse_drying_2500L.py |
| Compute | Python: compute_drying_tesla.py | Python: compute_drying_gunsan.py | Python: compute_drying_2500L.py |
| BigQuery | Target tables: | | |
| Mapping spreadsheet | | | |
Data Sources
- ELN
- Raw Data on Google Drive
Data Collection
The Talend jobs J010_Download_Synthesis_LabServers and J011_Download_Synthesis_gDrive_Drying extract the raw data files listed in the ELN table drying_raw_data_link for which the field “drying_equipment_name” matches the scale name, e.g. “FR-170L-TESLA”. For information on how these jobs work, see the following page:
Talend - Jobs - Synthesis - Download - Drying (needs to be created)
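The selection rule described above can be illustrated with a minimal Python sketch. It is not the Talend job itself: only the field name drying_equipment_name and the scale names come from this page, while the `file_link` key, the `select_links` helper, and the sample rows are hypothetical.

```python
# Hypothetical rows standing in for the ELN table drying_raw_data_link.
# Only the field name drying_equipment_name comes from this page; the
# "file_link" key and the sample values are made up for illustration.
links = [
    {"drying_equipment_name": "FR-170L-TESLA", "file_link": "tesla_run_01.csv"},
    {"drying_equipment_name": "KR-170L", "file_link": "gunsan_run_01.xlsx"},
    {"drying_equipment_name": "FR-170L-TESLA", "file_link": "tesla_run_02.csv"},
]


def select_links(rows, scale_name):
    """Return the file links whose equipment name equals the given scale name."""
    return [
        row["file_link"]
        for row in rows
        if row["drying_equipment_name"] == scale_name
    ]


tesla_files = select_links(links, "FR-170L-TESLA")
print(tesla_files)
```

Each scale would run the same selection with its own scale name (FR-170L-TESLA, KR-170L, or FR-2500L).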
Schema using Google Drive
Schema with file share
| Info |
|---|
| Please refer to the DFS TD - Synthesis - Norms and Conventions page for the output filename convention in the Data Collection section. |
Data Preparation
Parse
The parsing Python scripts extract the needed columns from the raw data files.
Include Page: DFS - TD - Data Preparation - Parsing - Schema 01
Columns List
For each sample, the script extracts the required fields from the raw data files and outputs a .csv file. For the mapping details, please refer to the sheet "Parse Mapping" in the Drying Mapping spreadsheet (the link to the spreadsheet is in the Sum-up section).
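As a rough illustration of the parsing step for a CSV source, the sketch below keeps only the mapped columns and renames them. It does not reproduce the actual parse scripts: the column names in `PARSE_MAPPING`, the `parse_raw_csv` helper, and the sample data are assumptions; the real mapping is the one in the "Parse Mapping" sheet.

```python
import csv
import io

# Hypothetical mapping from raw column names to parsed column names.
# The real mapping lives in the "Parse Mapping" sheet; these names are invented.
PARSE_MAPPING = {
    "Time (min)": "time_min",
    "Product Temp (C)": "product_temperature_c",
}


def parse_raw_csv(raw_text, mapping):
    """Keep only the mapped columns of a raw CSV and rename them."""
    reader = csv.DictReader(io.StringIO(raw_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=list(mapping.values()), lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # Unmapped raw columns (here "Operator") are dropped.
        writer.writerow({new: row[old] for old, new in mapping.items()})
    return out.getvalue()


raw = "Time (min),Product Temp (C),Operator\n0,21.5,A\n10,35.0,A\n"
parsed = parse_raw_csv(raw, PARSE_MAPPING)
print(parsed)
```

The xlsx and xls sources would need a spreadsheet reader instead of the `csv` module, but the select-and-rename logic would stay the same.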
Compute
The compute Python script takes the previously created parsed .csv files as input. It computes the new columns and values from the raw data and writes new output files.
If the output files already exist, the script will NOT replace them.
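The "do not replace existing outputs" behaviour can be sketched as follows. This is an assumption-laden illustration, not the actual compute script: the `compute_file` helper, the file names, and the computed column `time_h` are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def compute_file(parsed_path, output_path):
    """Write a computed file unless it already exists; return True if written."""
    output_path = Path(output_path)
    if output_path.exists():
        # Existing outputs are left untouched.
        return False
    with open(parsed_path, newline="") as src:
        reader = csv.DictReader(src)
        rows = list(reader)
        fieldnames = list(reader.fieldnames) + ["time_h"]
    for row in rows:
        # Hypothetical computed column: elapsed time in hours.
        row["time_h"] = str(float(row["time_min"]) / 60)
    with open(output_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return True


workdir = Path(tempfile.mkdtemp())
parsed_file = workdir / "sample_parsed.csv"
parsed_file.write_text("time_min,product_temperature_c\n0,21.5\n30,35.0\n")
first_run = compute_file(parsed_file, workdir / "sample_details.csv")
second_run = compute_file(parsed_file, workdir / "sample_details.csv")
print(first_run, second_run)
```

With this pattern, recomputing a sample means deleting its output file first; whether the real scripts expect the same workflow is not stated on this page.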
Include Page: DFS - TD - Data Preparation - Computing - Schema 01
...
For each sample, it creates two files that are used to create new tables in BigQuery:
DryingDetails
The first table is composed of the columns previously extracted from the raw data files and the new columns calculated during the execution.
Dataset : raw_data_synthesis_mig
For the column details, please refer to the sheet "Details Mappings" in the Drying Mapping spreadsheet (the link to the spreadsheet is in the Sum-up section).
...
Dataset : raw_data_synthesis_mig
For the column details, please refer to the sheet "Summary Mapping" in the Drying Mapping spreadsheet (the link to the spreadsheet is in the Sum-up section).
Presentation
The details and summary files are loaded into BigQuery as tables that unify all scales. A Talend job is responsible for pushing all of this data to a dataset called raw_data_synthesis_mig.
Include Page: DFS - TD - Data Presentation - Upload to BQ - Schema 01
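The idea of unifying all scales in the same table can be pictured with the short sketch below. The upload itself is done by Talend, so this only illustrates the concept: the `unify_scales` helper, the `scale_name` column, and the sample rows are assumptions rather than the actual table schema.

```python
def unify_scales(rows_by_scale):
    """Merge per-scale rows into one list, tagging each row with its scale."""
    unified = []
    for scale_name, rows in rows_by_scale.items():
        for row in rows:
            # Hypothetical "scale_name" column identifying the source scale.
            unified.append({"scale_name": scale_name, **row})
    return unified


# Made-up rows, one per scale, sharing the same columns.
details_by_scale = {
    "FR-170L-TESLA": [{"time_min": 0, "product_temperature_c": 21.5}],
    "KR-170L": [{"time_min": 0, "product_temperature_c": 22.0}],
    "FR-2500L": [{"time_min": 0, "product_temperature_c": 20.8}],
}
unified_details = unify_scales(details_by_scale)
print(len(unified_details))
```

Unification of this kind relies on the compute scripts of every scale producing the same column set, which is what the mapping spreadsheet sheets are meant to define.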
Visualization
Refer to Tableau documentation