...

Info

The standard ETL areas and processes are described in LB ALB Data Dev Architecture - General.

Summary


Data flow diagram

*Labware is not used as a source for Batteries Conductivity.


Info

All the schemas are available on Google Drive for further editing:

Google Drive Live Link: https://drive.google.com/file/d/1E5QOeBlfR7uzWXH7rx_UoQFZmXE_4jsq/view?usp=sharing

Data Ingestion

Data Sources

ELN

Info
Simplified Flow

ELN API → <Extraction> → Local JSON File

Spreadsheets

Spreadsheet | Description | Data Status*
ELN Conductivity Measurement | This is a standard ELN spreadsheet for all sites | HOT
ELN Battery Experiment Properties | This is a standard ELN spreadsheet for all sites | HOT

...

The full list of spreadsheets/tables extracted from the ELN JSON files is given in the Data Mapping of the next section.
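To illustrate the extraction step, the nested ELN JSON file has to be turned into one flat row set per spreadsheet before it can be mapped to Staging. The sketch below assumes a hypothetical file layout (`experiment_id` plus a `spreadsheets` list, each with a `name` and `rows`); the real structure is described in the Battery - ELN Data Model and may differ. The sample values are made up.

```python
import json
from typing import Any


def flatten_eln_spreadsheets(payload: str) -> dict[str, list[dict[str, Any]]]:
    """Group the rows of an ELN JSON export by spreadsheet name.

    Assumed (hypothetical) layout:
    {"experiment_id": "...", "spreadsheets": [{"name": "...", "rows": [{...}, ...]}]}
    """
    doc = json.loads(payload)
    experiment_id = doc["experiment_id"]
    tables: dict[str, list[dict[str, Any]]] = {}
    for sheet in doc.get("spreadsheets", []):
        rows = tables.setdefault(sheet["name"], [])
        for row in sheet.get("rows", []):
            # Carry the experiment key on every row so Staging tables can be joined later.
            rows.append({"experiment_id": experiment_id, **row})
    return tables


# Illustrative input with made-up values.
example = json.dumps({
    "experiment_id": "EXP-001",
    "spreadsheets": [
        {"name": "ELN Conductivity Measurement",
         "rows": [{"temperature_c": 25, "conductivity_ms_cm": 10.2}]},
    ],
})
tables = flatten_eln_spreadsheets(example)
print(tables["ELN Conductivity Measurement"])
```

Keeping one list per spreadsheet mirrors the one-spreadsheet-to-one-Staging-table mapping assumed here.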

Related documents

Document Name | Link
Battery - ELN Data Model | Google Drive Live Link: https://drive.google.com/file/d/1sD6OqKnBzSR_SrGvzlhEl7s5F5vUGQqD/view?usp=sharing
Battery - ELN Template Documentation | Google Drive Live Link: https://docs.google.com/document/d/1myJ7zU4cTW1LG6eAj8z5rcKpMqhPOejl/edit#heading=h.gjdgxs

Instruments

Info
Simplified Flow

Lab server → Google Shared Drive → <Copy> → Local XML/TXT file

Files

Most of the instrument files are added manually to a Google Drive shared folder.
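The <Copy> step can be sketched as follows, assuming the shared folder is synced to a local path (for example with Google Drive for desktop) rather than read through the Drive API. The function name and the skip-if-already-copied rule are illustrative assumptions, not the actual Talend job logic.

```python
import shutil
from pathlib import Path

# File types produced by the instruments, per the simplified flow above.
INSTRUMENT_SUFFIXES = {".xml", ".txt"}


def copy_instrument_files(shared_dir: Path, local_dir: Path) -> list[Path]:
    """Copy XML/TXT instrument files from a locally synced shared folder.

    Files are flattened into local_dir by name; a file whose name already
    exists in local_dir is skipped, so the copy can be re-run safely.
    """
    local_dir.mkdir(parents=True, exist_ok=True)
    copied: list[Path] = []
    for src in sorted(shared_dir.rglob("*")):
        if not src.is_file() or src.suffix.lower() not in INSTRUMENT_SUFFIXES:
            continue
        dst = local_dir / src.name
        if dst.exists():
            continue
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
        copied.append(dst)
    return copied
```

Because files are matched by name only, two instrument files with the same name in different subfolders would collide; a real job would need a naming rule for that case.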

...

*Data Status: HOT: data is currently being updated at the source and should be loaded regularly. COLD: data no longer changes at the source and should be loaded only once.
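The HOT/COLD rule translates into a simple load decision. In this minimal sketch the daily refresh interval for HOT sources and the `last_loaded` bookkeeping are assumptions. The page does not define how often "regularly" is.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class DataStatus(Enum):
    HOT = "HOT"    # still updated at the source: reload regularly
    COLD = "COLD"  # no more changes at the source: load only once


def should_load(status: DataStatus, last_loaded: Optional[datetime], now: datetime,
                refresh_interval: timedelta = timedelta(days=1)) -> bool:
    """Decide whether a source should be (re)loaded on this run.

    refresh_interval is an assumed default for HOT sources.
    """
    if last_loaded is None:
        return True  # every source is loaded at least once
    if status is DataStatus.COLD:
        return False  # already loaded once and nothing changes at the source
    return now - last_loaded >= refresh_interval
```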

Data Preparation or Parsing

Info
Simplified Flow

Local XML/TXT file → <Load> → Cloud Storage/Staging BigQuery

Data Mapping Source => Staging (Talend)

The spreadsheet below presents all data transformations between the raw (extracted) files and the BigQuery Staging tables.

Google Drive Live Link: https://docs.google.com/spreadsheets/d/1xjxpQ6nfeRDM060MzAeB_ZzhZBvp1rHRcYk7tYK0rGk/edit?usp=sharing
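As an illustration of the parsing step, the sketch below turns instrument XML into newline-delimited JSON, a format BigQuery load jobs accept. The `<measurement>` element name and the choice to keep every value as a string in Staging, with typing deferred to the ODS views, are assumptions. The actual field mapping is the one in the spreadsheet linked above, and the load itself is done by Talend.

```python
import json
import xml.etree.ElementTree as ET


def xml_to_ndjson(xml_text: str, source_file: str) -> str:
    """Turn each <measurement> element (hypothetical name) into one JSON line.

    Child element tags become column names; values stay as strings, assuming
    type conversion happens later in the ODS views.
    """
    root = ET.fromstring(xml_text)
    lines = []
    for measurement in root.iter("measurement"):
        row = {child.tag: (child.text or "").strip() for child in measurement}
        row["source_file"] = source_file  # lineage back to the raw file
        lines.append(json.dumps(row))
    return "\n".join(lines) + ("\n" if lines else "")
```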

Data Integration or Computing

The data integration phase for batteries follows the standard approach described in LB ALB Data Dev Architecture - General.

Info
Simplified Flow

Staging BigQuery → <Transform> → ODS BigQuery

Data Mapping Staging => ODS (BigQuery SQL views)

The spreadsheet below presents all data transformations from Staging tables to ODS tables. This step structures the data in the target table format and checks that the column types conform to the schema.

Google Drive Live Link: https://docs.google.com/spreadsheets/d/1xcwLXJKykko5w7WXLmqjFyygG-yC6t6-t7Z6MkbIfvE/edit?usp=sharing
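A Staging-to-ODS view of this kind can be generated from a column-to-type schema. The sketch below uses BigQuery's `SAFE_CAST`, which returns NULL instead of raising an error when a value cannot be converted. The dataset names `Staging` and `ODS`, and the table and column names in the usage line, are hypothetical placeholders. The real views are defined in the mapping spreadsheet.

```python
def build_ods_view_sql(project: str, staging_table: str, ods_view: str,
                       schema: dict[str, str]) -> str:
    """Build a BigQuery view that casts Staging columns to their ODS types.

    SAFE_CAST yields NULL for a value that does not conform to the target
    type, so one bad value does not make the whole view fail.
    Dataset names "Staging" and "ODS" are assumptions.
    """
    columns = ",\n".join(
        f"  SAFE_CAST({col} AS {bq_type}) AS {col}" for col, bq_type in schema.items()
    )
    return (
        f"CREATE OR REPLACE VIEW `{project}.ODS.{ods_view}` AS\n"
        f"SELECT\n{columns}\n"
        f"FROM `{project}.Staging.{staging_table}`"
    )


sql = build_ods_view_sql(
    "my-project", "stg_conductivity", "conductivity_measurement",
    {"temperature_c": "FLOAT64", "measured_at": "TIMESTAMP"},
)
print(sql)
```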

Data Model

The following data model shows the tables in the ODS dataset and the relationships between them:

Google Drive Live Link: https://drive.google.com/file/d/1sD6OqKnBzSR_SrGvzlhEl7s5F5vUGQqD/view?usp=sharing


Data Presentation (DW/DM)

Info
Simplified Flow

ODS BigQuery → <Transform> → DW BigQuery → <Expose> → DM_Battery_Electrochemistry Conductivity BigQuery

Data Mapping ODS => DW/DM (BigQuery SQL views)

The spreadsheet below presents all data transformations between the ODS tables and DM_Conductivity. This step creates the views used for Data Visualization.

Google Drive Live Link: https://docs.google.com/spreadsheets/d/1XL49eWOdv9tZMcrnU5r7SQGcidEQ4YLj-wBMpzT6Xxo/edit?usp=sharing

Data Model

The following data model shows the tables in the DW/DM dataset and the relationships between them:

Google Drive Live Link: https://drive.google.com/file/d/1sD6OqKnBzSR_SrGvzlhEl7s5F5vUGQqD1X5QpTXYvIU5o6PM5UcXIolT51M319mZP/view?usp=sharing

...



The DM dataset (DM_Conductivity) contains views built on top of the tables above. As defined in the naming convention, these views do not need the "conduct" abbreviation. No Talend jobs are needed for this dataset either.
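A minimal sketch of generating these views follows. It assumes that the "conduct" abbreviation appears as a `conduct_` prefix on the DW table names and that each DM view is a plain `SELECT *` over its DW table. Both the prefix form and the helper names are hypothetical, and the actual convention may differ.

```python
def dm_view_name(dw_table: str, prefix: str = "conduct_") -> str:
    """Drop the (assumed) 'conduct_' prefix, since the DM_Conductivity dataset name already carries it."""
    return dw_table[len(prefix):] if dw_table.startswith(prefix) else dw_table


def build_dm_view_sql(project: str, dw_dataset: str, dw_table: str) -> str:
    """Expose one DW table as a view in DM_Conductivity (a plain SQL view, no Talend job)."""
    return (
        f"CREATE OR REPLACE VIEW `{project}.DM_Conductivity.{dm_view_name(dw_table)}` AS\n"
        f"SELECT * FROM `{project}.{dw_dataset}.{dw_table}`"
    )
```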

Orchestrating Jobs

All the jobs run in sequence under the following job and project name on Talend Cloud:

...
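The jobs themselves are Talend Cloud jobs. To illustrate what "run in sequence" implies, here is a minimal Python sketch, not the Talend API, in which each step starts only after the previous one has succeeded and a failure stops the chain. `JobFailed` and `run_in_sequence` are hypothetical names.

```python
from typing import Callable


class JobFailed(RuntimeError):
    """Raised when a job fails; records which jobs had already completed."""

    def __init__(self, name: str, completed: list[str]):
        super().__init__(f"job {name!r} failed after {completed}")
        self.name = name
        self.completed = completed


def run_in_sequence(jobs: list[tuple[str, Callable[[], None]]]) -> list[str]:
    """Run jobs one after another; stop at the first failure.

    Returns the names of all jobs when every job succeeds.
    """
    completed: list[str] = []
    for name, job in jobs:
        try:
            job()
        except Exception as exc:
            # The remaining jobs are not started once one job fails.
            raise JobFailed(name, completed) from exc
        completed.append(name)
    return completed
```

Stopping at the first failure keeps downstream steps, such as Staging to ODS, from running on incomplete upstream data.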

The following jobs should not be orchestrated; they run only once, during deployment:

Project | Job/Flow | Associated TMC Plan
RnI_ACN_Battery | F010_RnI_ACN_Battery_Create_BQ_Views | PL_RNI_ACN_BATTERY_CONDUCTIVITY_CREATE_VIEWS

Talend

ELN

Instruments

BigQuery

Tables (Staging)

Views (ODS)

Views (DM_Conductivity)


Data Visualization

Conductivity (ongoing; GCP and Tableau documentation):

Google Drive Live Link: https://docs.google.com/spreadsheets/d/1294LMp-xHVA590kVm7rbA-8W-Crvnho2mJ8BfU_zqOQ/edit#gid=1728102916

...