Description
The source data is a Google Sheet (gsheet) maintained by the user; we load it into BigQuery (BQ) in the project prj-data-dm-industrial-[env].
Talend
Source to ODS (DATA_OCEAN_DOMAIN_INDUSTRIAL)
J006_oee_gsheet_to_GCS
F006_Gsheet_OEE_TO_BQ
- Read data from:
- gsheet: l_VAR_GOOGLE_SHEET_oee_id
- tab name: l_VAR_GOOGLE_SHEET_oee_tab
- start row: l_VAR_GOOGLE_SHEET_oee_row_start = 2
- end row: l_VAR_GOOGLE_SHEET_oee_row_end = 0 (0 means all rows)
- Save the data to the GCS bucket cs-ew1-prj-data-dm-industrial-dev-staging/Gsheet_OEE
- Load the data into prj-data-dm-industrial-dev (see the SQL sketch below):
- STG_FIL_0000_0000_F001_F_H_oee
- ODS_FIL_0000_F001_F_H_oee
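For orientation, the GCS-to-BQ portion can be approximated in plain BigQuery SQL. This is a sketch only: the Talend job does this through its own components, and the CSV format, the STG/ODS dataset names, and the skip_leading_rows value are assumptions.

-- Sketch only: approximates what F006_Gsheet_OEE_TO_BQ does via Talend.
-- File format, dataset names and skip_leading_rows are assumptions.
LOAD DATA OVERWRITE `prj-data-dm-industrial-dev.STG.STG_FIL_0000_0000_F001_F_H_oee`
FROM FILES (
  format = 'CSV',
  skip_leading_rows = 1,
  uris = ['gs://cs-ew1-prj-data-dm-industrial-dev-staging/Gsheet_OEE/*']
);

-- Append the staged rows to ODS, stamping the insert time that the
-- reload condition (meta_ods_insert_date) relies on later.
INSERT INTO `prj-data-dm-industrial-dev.ODS.ODS_FIL_0000_F001_F_H_oee`
SELECT stg.*, CURRENT_TIMESTAMP() AS meta_ods_insert_date
FROM `prj-data-dm-industrial-dev.STG.STG_FIL_0000_0000_F001_F_H_oee` AS stg;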
ODS to DM
F016_FACT_oee_main (J016_FACT_oee_main)
1. Assign the extraction time and set the loading window from the last load, i.e. the max meta_business_date in ODS_FIL_0000_F001_F_H_oee.
2. Select from ODS only the main fields per the requirement (the rest of the measures, except the columns starting with numbers 1-6, are kept in the losses table).
3. Delete data in DM.FACT_oee_main before loading from ODS, because a user can enter a date as 1/1/2024 and later change it to 01/01/2024, which would otherwise cause duplicate records.
4. Load the data from step 2 into DM.FACT_oee_main (see the sketch after this list).
5. Write the log.
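A minimal sketch of steps 3-4, assuming the key columns from the reload condition in section 1.3 and a hypothetical measure column oee_pct; the real column list comes from the gsheet.

-- Sketch: delete the reload window first, then insert it fresh, so a date
-- re-entered as 1/1/2024 vs 01/01/2024 cannot produce duplicate records.
DECLARE l_load_from DEFAULT (
  SELECT MAX(meta_business_date)
  FROM `prj-data-dm-industrial-dev.ODS.ODS_FIL_0000_F001_F_H_oee`
);

DELETE FROM `prj-data-dm-industrial-dev.DM.FACT_oee_main`
WHERE meta_business_date >= l_load_from;

-- oee_pct stands in for the real main measures.
INSERT INTO `prj-data-dm-industrial-dev.DM.FACT_oee_main`
  (date, gbu, site, unit, cluster, oee_pct, meta_business_date)
SELECT date, gbu, site, unit, cluster, oee_pct, meta_business_date
FROM `prj-data-dm-industrial-dev.ODS.ODS_FIL_0000_F001_F_H_oee`
WHERE meta_business_date >= l_load_from;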
F016_FACT_oee_losses (J016_FACT_oee_losses)
1. Assign the extraction time and set the loading window from the last load, i.e. the max meta_business_date in ODS_FIL_0000_F001_F_H_oee.
2. Select from ODS only the loss fields expressed as percentages and unpivot the data per the requirement.
3. Select from ODS only the loss fields expressed as volumes and unpivot the data.
4. Delete data in DM.FACT_oee_losses before loading from ODS, for the same 1/1/2024 vs 01/01/2024 duplicate-record case as above.
5. Load the data from step 2 joined with step 3 into DM.FACT_oee_losses (see the sketch after this list).
6. Write the log.
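The unpivot in steps 2-3 can be sketched with BigQuery's UNPIVOT operator; the loss column names below are placeholders, not the sheet's actual headers.

-- Sketch: turn the percentage loss columns into rows. The volume query is
-- analogous, and the two are joined on the keys plus loss_type (step 5).
SELECT date, gbu, site, unit, cluster, loss_type, loss_pct
FROM `prj-data-dm-industrial-dev.ODS.ODS_FIL_0000_F001_F_H_oee`
UNPIVOT (loss_pct FOR loss_type IN (planned_stop_pct, breakdown_pct, changeover_pct));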
From
Access rights
- Access to gsheet
Source
Format
- columnar format (Gsheet)
To
Destination
Location
- GCP = prj-data-dm-industrial-[env]
- STG table
- STG_FIL_0000_0000_F001_F_H_oee
- ODS table
- ODS_FIL_0000_F001_F_H_oee
- DM Table names
- FACT_oee_main
- FACT_oee_losses
- DS_industrial_dash
- V_FACT_oee_main
- V_FACT_oee_losses
- GCP = prj-data-industrial-dash-[env]
- DataOcean
- V_FACT_oee_main
- V_FACT_oee_losses
- DPL View names
- V_OP_PRODUCTIVITY_OEE_Main_Monthly
- V_OP_PRODUCTIVITY_OEE_Losses_Monthly
Format
- columnar format
Sizing
1,405 rows
Assessment
How to validate that the generated output is valid: the data loaded into BQ must match the gsheet (see the sketch below).
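A quick sanity check, as a sketch (dataset name assumed; the authoritative check is a value-level comparison against the gsheet):

-- Sketch: the row count should match the gsheet (about 1,405 rows per Sizing),
-- and distinct_keys should equal ods_rows if there are no duplicates.
SELECT
  COUNT(*) AS ods_rows,
  COUNT(DISTINCT CONCAT(CAST(date AS STRING), gbu, site, unit, cluster)) AS distinct_keys
FROM `prj-data-dm-industrial-dev.ODS.ODS_FIL_0000_F001_F_H_oee`;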
Loading
1.1 Incremental Load
N/A
1.2 Full load
PL_INDUS_OEE
Source to ODS
Variables to control:
l_VAR_GOOGLE_SHEET_oee_id = refer to the Source section above
l_VAR_GOOGLE_SHEET_oee_tab = OEE (WIP)
l_VAR_GOOGLE_SHEET_oee_row_start = 4
l_VAR_GOOGLE_SHEET_oee_row_end = 0 (0 means all rows)
ODS to DM: l_VAR_oee_main_dm_reload_condition = incremental (or blank) loads from the max meta_business_date in ODS
1.3 Reloading data
Source to ODS: just run the job again.
ODS to DM: controlled by the parameter l_VAR_oee_main_dm_reload_condition; this affects both the main and losses tables. To reload all ODS data keeping only the latest version of each record, set l_VAR_oee_main_dm_reload_condition to:
QUALIFY ROW_NUMBER() OVER (PARTITION BY date, gbu, site, unit, cluster ORDER BY meta_ods_insert_date DESC) = 1
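For context, a sketch of how that value composes with the extraction query (the exact substitution point inside the job is an assumption):

-- Sketch: with the full-reload value set, each date/gbu/site/unit/cluster key
-- keeps only its most recently inserted ODS row.
SELECT *
FROM `prj-data-dm-industrial-dev.ODS.ODS_FIL_0000_F001_F_H_oee`
-- l_VAR_oee_main_dm_reload_condition is appended here:
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY date, gbu, site, unit, cluster
  ORDER BY meta_ods_insert_date DESC
) = 1;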
1.4 Plan to schedule
Every 30 minutes
1.5 Timing
Expected average loading time: 10 minutes.
Criticality
Low?
Logging
SELECT
  job.job_name,
  job.meta_start_date,
  job.meta_execution_id,
  logs.meta_run_id,
  logs.meta_source_system,
  logs.meta_step,
  logs.meta_status,
  logs.meta_num_lines,
  logs.meta_error_lines
FROM STG.log_tables logs
JOIN STG.run_jobs job
  ON logs.meta_run_id = job.meta_run_id
WHERE logs.meta_run_id IN (
    SELECT meta_run_id FROM STG.run_jobs ORDER BY meta_start_date DESC LIMIT 1000
  )
  AND LOWER(job.job_name) LIKE '%oee%'
  AND job.meta_start_date > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
ORDER BY job.meta_start_date DESC