Info

The standard ETL areas and processes are described in LB ALB Data Dev Architecture - General.

Summary


Data flow diagram

*No Labware for Batteries Conductivity


Info

All the schemas are available on Google Drive for further editing:

Google Drive Live Link: https://drive.google.com/file/d/1E5QOeBlfR7uzWXH7rx_UoQFZmXE_4jsq/view?usp=sharing

Data Ingestion

Data Sources

ELN

Info
Simplified Flow

ELN API → <Extraction> →  Local JSON File

Spreadsheets

Spreadsheet | Description | Data Status*
ELN Conductivity Measurement | This is a standard ELN spreadsheet for all sites | HOT
ELN Battery Experiment Properties | This is a standard ELN spreadsheet for all sites | HOT

...

The list of spreadsheets/tables extracted from the ELN JSON files can be found in the Data Mapping in the next section.

Related documents

Document Name | Link
Battery - ELN Data Model | https://drive.google.com/file/d/1sD6OqKnBzSR_SrGvzlhEl7s5F5vUGQqD/view?usp=sharing
Battery - ELN Template Documentation | https://docs.google.com/document/d/1myJ7zU4cTW1LG6eAj8z5rcKpMqhPOejl/edit#heading=h.gjdgxs

Instruments

Info
Simplified Flow

Lab server → Google Share Drive → <Copy> →  Local XML/TXT file

Files

Most instrument files are added manually to a Google Drive shared folder.

...

*Data Status:
HOT: Data is currently being updated at the source. It should be loaded regularly.
COLD: No data changes at the source. It should be loaded just once.
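
The HOT/COLD rule above can be read as a small loading decision. The sketch below is illustrative only: the function name `should_load` and the `already_loaded` flag are assumptions made for this example and are not part of the existing Talend jobs.

```python
def should_load(status: str, already_loaded: bool) -> bool:
    """Decide whether a source needs to be loaded in the current run.

    HOT  -> the source is still changing, so load it on every run.
    COLD -> the source no longer changes, so load it only if it has
            never been loaded (hypothetical flag tracked by the caller).
    """
    normalized = status.strip().upper()
    if normalized == "HOT":
        return True
    if normalized == "COLD":
        return not already_loaded
    raise ValueError(f"Unknown data status: {status!r}")
```

With this reading, a COLD source is skipped on every run after its first successful load, while a HOT source is always picked up.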

Data

...

No data mapping is needed, as the data is copied exactly as it is at the source.

...

Preparation or Parsing

Info
Simplified Flow

Local XML/TXT file → <Load> → Cloud Storage/Staging BigQuery

Data Mapping Source => Staging (Talend)

The spreadsheet below presents all data transformations between the raw (extracted) files and a BigQuery delta table. Some files are unstructured or semi-structured. This step structures the files into the target table format and checks that the column types (schema) conform to what is expected.

Google Drive Live Link: https://docs.google.com/spreadsheets/d/1xjxpQ6nfeRDM060MzAeB_ZzhZBvp1rHRcYk7tYK0rGk/edit?usp=sharing
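
To make the schema check concrete, here is a minimal Python sketch of validating one parsed row against an expected column/type mapping. It is not the Talend implementation: the column names in `EXPECTED_SCHEMA` and the helper `schema_errors` are hypothetical and chosen only for illustration; the real columns are defined in the mapping spreadsheet.

```python
from typing import Any

# Hypothetical expected schema: column name -> Python type.
EXPECTED_SCHEMA: dict[str, type] = {
    "sample_id": str,
    "temperature_c": float,
    "conductivity_s_per_cm": float,
}


def schema_errors(row: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Return a list of human-readable schema violations for one row."""
    errors = []
    # Every expected column must be present and hold the expected type.
    for column, expected in schema.items():
        if column not in row:
            errors.append(f"missing column: {column}")
        elif not isinstance(row[column], expected):
            actual = type(row[column]).__name__
            errors.append(f"{column}: expected {expected.__name__}, got {actual}")
    # Columns that are not part of the target schema are reported too.
    for column in row:
        if column not in schema:
            errors.append(f"unexpected column: {column}")
    return errors
```

An empty list means the row matches the target format; any other result names the columns that would need fixing before the load.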

Data Integration or Computing

The data integration phase for batteries follows the standard approach described in LB ALB Data Dev Architecture - General.

Info
Simplified Flow

Staging BigQuery → <Transform> → ODS BigQuery 

Data Mapping Staging => ODS (BigQuery SQL views)

The spreadsheet below presents all data transformations between the tables on Staging and ODS. This step adds calculations and intelligence to the data. Not all tables need to pass through this step.

Google Drive Live Link: https://docs.google.com/spreadsheets/d/1xcwLXJKykko5w7WXLmqjFyygG-yC6t6-t7Z6MkbIfvE/edit?usp=sharing

...

Data

...

Model

The following data model presents the tables in the ODS dataset and the relations between them:

Google Drive Live Link: https://drive.google.com/file/d/1sD6OqKnBzSR_SrGvzlhEl7s5F5vUGQqD/view?usp=sharing


Data Presentation (DW/DM)

Info
Simplified Flow

ODS BigQuery → <Transform> → DW BigQuery → <Expose> → DM_Conductivity BigQuery

Data Mapping ODS => DW/DM (BigQuery SQL views)

The spreadsheet below presents all data transformations between the tables on ODS and DM_Conductivity. This step creates the views used for data visualization.

Google Drive Live Link: https://docs.google.com/spreadsheets/d/1XL49eWOdv9tZMcrnU5r7SQGcidEQ4YLj-wBMpzT6Xxo/edit?usp=sharing

Data Model

The following data model presents the tables in the DW/DM datasets and the relations between them:

Google Drive Live Link: https://drive.google.com/file/d/1X5QpTXYvIU5o6PM5UcXIolT51M319mZP/view?usp=sharing

...



For the DM dataset (DM_Conductivity), the views are built on top of the preceding tables. As defined in the naming convention, the abbreviation "conduct" is not needed. No Talend jobs are needed either.
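
As an illustration of a view sitting on top of an existing table, the sketch below builds a BigQuery `CREATE OR REPLACE VIEW` statement as a string. Only the dataset name `DM_Conductivity` comes from this page; the source dataset name `DW`, the table name `measurement`, and the helper `build_view_ddl` are hypothetical, and the actual views are defined in the mapping spreadsheet.

```python
def build_view_ddl(source_dataset: str, target_dataset: str, table: str) -> str:
    """Return DDL for a pass-through view exposing one table in another dataset."""
    return (
        f"CREATE OR REPLACE VIEW `{target_dataset}.{table}` AS\n"
        f"SELECT * FROM `{source_dataset}.{table}`"
    )


# Hypothetical usage: expose a DW table through the DM dataset.
print(build_view_ddl("DW", "DM_Conductivity", "measurement"))
```

Because the view keeps the table name unchanged and only the dataset differs, no extra abbreviation is needed in the view name, which matches the convention described above.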

Orchestrating Jobs

All jobs run in sequence under the following project and job names on TAC/Talend Cloud:

Project | Job/Flow | Associated TMC Plan
RnI_ACN_Battery | F010_RnI_ACN_Battery_ELN_IDBS_Orch_Flow | PL_RNI_ACN_BATTERY_CONDUCTIVITY_ELN_DAILY
RnI_ACN_Battery | F020_RnI_ACN_Battery_ELN_Integration_Orch_Flow | PL_RNI_ACN_BATTERY_CONDUCTIVITY_ELN_DAILY
RnI_ACN_Battery | F011_RnI_ACN_Battery_Instruments_Orch_Flow | PL_RNI_ACN_BATTERY_CONDUCTIVITY_INSTRUMENT_DAILY
RnI_ACN_Battery | F021_RnI_ACN_Battery_Instr_Integration_Orch_Flow | PL_RNI_ACN_BATTERY_CONDUCTIVITY_INSTRUMENT_DAILY

For scheduling details check the Operational documentation.
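
The grouping of flows into plans can be sketched in a few lines of Python. The flow and plan names are copied from the table above; the helper `flows_by_plan` is hypothetical, and ordering the flows by their numeric `F0xx` prefix is an assumption about how "in sequence" is meant, not something the TMC configuration is known to enforce.

```python
from collections import defaultdict

# (flow, plan) pairs copied from the orchestration table.
FLOWS = [
    ("F010_RnI_ACN_Battery_ELN_IDBS_Orch_Flow",
     "PL_RNI_ACN_BATTERY_CONDUCTIVITY_ELN_DAILY"),
    ("F020_RnI_ACN_Battery_ELN_Integration_Orch_Flow",
     "PL_RNI_ACN_BATTERY_CONDUCTIVITY_ELN_DAILY"),
    ("F011_RnI_ACN_Battery_Instruments_Orch_Flow",
     "PL_RNI_ACN_BATTERY_CONDUCTIVITY_INSTRUMENT_DAILY"),
    ("F021_RnI_ACN_Battery_Instr_Integration_Orch_Flow",
     "PL_RNI_ACN_BATTERY_CONDUCTIVITY_INSTRUMENT_DAILY"),
]


def flows_by_plan(flows: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group flows by plan, ordered by the number after the leading 'F'."""
    plans: dict[str, list[str]] = defaultdict(list)
    for flow, plan in flows:
        plans[plan].append(flow)
    # Assumption: the three digits after 'F' encode the execution order.
    return {
        plan: sorted(names, key=lambda name: int(name[1:4]))
        for plan, names in plans.items()
    }
```

Under that assumption, each daily plan runs its extraction flow (F010 or F011) before the matching integration flow (F020 or F021).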


The following jobs should not be orchestrated; they run only once, during deployment:

Project | Job/Flow | Associated TMC Plan
RnI_ACN_Battery | F010_RnI_ACN_Battery_Create_BQ_Views | PL_RNI_ACN_BATTERY_CONDUCTIVITY_CREATE_VIEWS

Talend

ELN

Instruments

BigQuery

Tables (Staging)

Views (ODS)


Views (DM_Conductivity)


Data Visualization

Conductivity (ongoing, GCP and Tableau documentation):

Google Drive Live Link: https://docs.google.com/spreadsheets/d/1294LMp-xHVA590kVm7rbA-8W-Crvnho2mJ8BfU_zqOQ/edit#gid=1728102916

...