Description
The source data is a Google Sheet (gsheet) maintained by the user; it is loaded into BigQuery (BQ) in the project prj-data-dm-industrial-[env].
Talend
Source to ODS
J006_oee_gsheet_to_GCS
F006_Gsheet_OEE_TO_BQ
- Read data from:
  - gsheet ID: l_VAR_GOOGLE_SHEET_oee_id
  - tab name: l_VAR_GOOGLE_SHEET_oee_tab
  - start row: l_VAR_GOOGLE_SHEET_oee_row_start = 2
  - end row: l_VAR_GOOGLE_SHEET_oee_row_end = 0 (0 means read all rows)
- Save the data to the bucket cs-ew1-prj-data-dm-industrial-dev-staging/Gsheet_OEE
- Load the data into prj-data-dm-industrial-dev:
  - STG_FIL_0000_0000_F001_F_H_oee
  - ODS_FIL_0000_F001_F_H_oee
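The row-range parameters above follow a simple convention: the start row is 1-based (2 skips the header row) and an end row of 0 means "read to the last row". A minimal sketch of that interpretation, with made-up sheet contents:

```python
def select_rows(rows, row_start=2, row_end=0):
    """Apply the job's row-range semantics to a list of sheet rows.

    row_start is 1-based (2 skips the header row); row_end = 0 means
    read through the last row of the sheet.
    """
    end = len(rows) if row_end == 0 else row_end
    return rows[row_start - 1:end]

sheet = [
    ["line", "oee_pct"],   # header row
    ["L1", "0.82"],
    ["L2", "0.79"],
]
print(select_rows(sheet))        # all data rows, header skipped
print(select_rows(sheet, 2, 2))  # only row 2
```

This is only an illustration of the parameter semantics; the actual read is performed by the Talend job F006_Gsheet_OEE_TO_BQ.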
ODS to DM
F016_FACT_oee_main (J016_FACT_oee_main)
- Assign the extraction time and derive the loading window from the last load, using max(meta_business_date) of ODS_FIL_0000_F001_F_H_oee
- Select from ODS only the main fields required by the specification
- Delete rows in DM.FACT_oee_main whose keys match the rows selected from ODS
- Load the data from step 2 into DM.FACT_oee_main
- Write the log
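The steps above amount to a watermark-based delete-then-insert. A minimal in-memory sketch, assuming a hypothetical key column oee_key (the real key and watermark handling are defined inside the Talend job):

```python
def incremental_merge(fact, ods, key="oee_key", wm_col="meta_business_date"):
    """Watermark-based delete-then-insert (hedged sketch, not the real job).

    1. Watermark = max business date already loaded.
    2. Select ODS rows at or after the watermark.
    3. Delete fact rows whose key matches a selected row.
    4. Append the selected rows.
    """
    watermark = max((r[wm_col] for r in fact), default="")
    selected = [r for r in ods if r[wm_col] >= watermark]
    keys = {r[key] for r in selected}
    kept = [r for r in fact if r[key] not in keys]  # delete matching keys
    return kept + selected                          # reload selected rows

fact = [{"oee_key": "L1|2024-01-01", "meta_business_date": "2024-01-01", "oee": 0.80}]
ods = [
    {"oee_key": "L1|2024-01-01", "meta_business_date": "2024-01-01", "oee": 0.82},
    {"oee_key": "L1|2024-01-02", "meta_business_date": "2024-01-02", "oee": 0.79},
]
fact = incremental_merge(fact, ods)
```

The delete-then-insert makes a rerun of the same window idempotent: matching keys are replaced rather than duplicated.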
F016_FACT_oee_losses (J016_FACT_oee_losses)
- Assign the extraction time and derive the loading window from the last load, using max(meta_business_date) of ODS_FIL_0000_F001_F_H_oee
- Select from ODS only the percentage loss fields and unpivot them per the requirements
- Select from ODS only the volume loss fields and unpivot them
- Delete rows in DM.FACT_oee_losses whose keys match the rows selected from ODS
- Load the data from step 2 joined with step 3 into DM.FACT_oee_losses
- Write the log
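The unpivot-and-join in steps 2–4 can be sketched as follows. All column names here (line, business_date, loss_breakdown_*, loss_quality_*) are hypothetical placeholders; the real loss fields come from the requirements:

```python
def unpivot(row, id_cols, value_cols, var_name, value_name):
    """Turn one wide ODS row into long rows, one per loss column."""
    base = {c: row[c] for c in id_cols}
    return [dict(base, **{var_name: c, value_name: row[c]}) for c in value_cols]

# Hypothetical ODS row; the real loss columns are defined by the requirements.
wide = {"line": "L1", "business_date": "2024-01-02",
        "loss_breakdown_pct": 0.05, "loss_quality_pct": 0.02,
        "loss_breakdown_vol": 120, "loss_quality_vol": 48}

pct = unpivot(wide, ["line", "business_date"],
              ["loss_breakdown_pct", "loss_quality_pct"], "loss_type", "loss_pct")
vol = unpivot(wide, ["line", "business_date"],
              ["loss_breakdown_vol", "loss_quality_vol"], "loss_type", "loss_vol")

# Join step 2 (percentages) with step 3 (volumes) on the loss-type stem.
losses = [
    {"line": p["line"], "business_date": p["business_date"],
     "loss_type": p["loss_type"].removesuffix("_pct"),
     "loss_pct": p["loss_pct"], "loss_vol": v["loss_vol"]}
    for p, v in zip(pct, vol)
]
```

Each wide row thus yields one long row per loss type, carrying both the percentage and the volume measure.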
Access rights
- Read access to the source gsheet is required
Source
Format
- columnar format (Gsheet)
Destination
Location
- GCP = prj-data-dm-industrial-[env]
- STG table
- STG_FIL_0000_0000_F001_F_H_oee
- ODS table
- ODS_FIL_0000_F001_F_H_oee
- DM Table names
- FACT_oee_main
- FACT_oee_losses
- DS_industrial_dash
- V_FACT_oee_main
- V_FACT_oee_losses
- GCP = prj-data-industrial-dash-[env]
- DataOcean
- V_FACT_oee_main
- V_FACT_oee_losses
- DPL View names
- V_FACT_oee_main
- V_FACT_oee_losses
Format
- columnar format
Sizing
1,405 rows
Assessment
How to validate that the generated output is valid:
- The loaded data must match the source gsheet
Loading
1.1 Incremental Load
N/a
1.2 Full load
1.3 Reloading data
To reload, simply trigger the job GSheet-Refresher
1.4 Plan to schedule
Every 30 minutes
1.5 Timing
The expected average loading time is 10 minutes.
Criticality
Low?
Logging
select
  job.job_name,
  job.meta_start_date,
  job.meta_execution_id,
  logs.meta_run_id,
  logs.meta_source_system,
  logs.meta_step,
  logs.meta_status,
  logs.meta_num_lines,
  logs.meta_error_lines
from STG.log_tables logs
join STG.run_jobs job
  on logs.meta_run_id = job.meta_run_id
where logs.meta_run_id in (
    select meta_run_id
    from STG.run_jobs
    order by meta_start_date desc
    limit 1000
  )
  and lower(job.job_name) like '%oee%'
  and job.meta_start_date > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
order by job.meta_start_date desc






