...
The source data is a gsheet provided by the user, which we load into BigQuery (BQ) via a direct link to the gsheet. The requirement is that BQ reflects changes in near real time. However, with the direct link it is not possible for Power BI (PBI) to read the table, for security reasons.
Therefore, the data must be loaded into a normal table before it is read by PBI.
Job details
- Table prj-data-industrial-dash-dev.DM.FACT_oee_wip is created as an external table linked directly to the gsheet.
- The scheduled query GSheet-Refresher, which runs every 30 minutes, generates the table DM.FACT_oee_wip-table.
- Views in the DPL dataset read the data from DM.FACT_oee_wip-table:
  - V_FACT_oee_wip is the same as the gsheet.
  - V_FACT_oee_losses is split from oee_wip and unpivots columns to rows.
  - V_FACT_oee_main is split from oee_wip to keep the main data, with the losses data removed.
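The refresher step above can be sketched as a BigQuery scheduled query; the exact statement is an assumption based on the table names in this section, not the deployed query:

```sql
-- Hypothetical body of the "GSheet-Refresher" scheduled query (every 30 min).
-- It materializes the gsheet-linked external table into a normal table so
-- that Power BI can read it without gsheet access.
CREATE OR REPLACE TABLE `prj-data-industrial-dash-dev.DM.FACT_oee_wip-table` AS
SELECT *
FROM `prj-data-industrial-dash-dev.DM.FACT_oee_wip`;  -- external table linked to the gsheet
```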
Access rights
Granted on project prj-data-dm-industrial-[env].
Talend
Source to ODS (DATA_OCEAN_DOMAIN_INDUSTRIAL)
J006_oee_gsheet_to_GCS
F006_Gsheet_OEE_TO_BQ
- Read data from:
  - gsheet ID: l_VAR_GOOGLE_SHEET_oee_id
  - tab name: l_VAR_GOOGLE_SHEET_oee_tab
  - start row: l_VAR_GOOGLE_SHEET_oee_row_start = 2
  - end row: l_VAR_GOOGLE_SHEET_oee_row_end = 0 (0 means all rows)
- Save data to bucket cs-ew1-prj-data-dm-industrial-dev-staging/Gsheet_OEE
- Load data to prj-data-dm-industrial-dev
- STG_FIL_0000_0000_F001_F_H_oee
- ODS_FIL_0000_F001_F_H_oee
ODS to DM
F016_FACT_oee_main (J016_FACT_oee_main)
1. Assign the extraction time and set the loading window from the last load: the max meta_business_date in ODS_FIL_0000_F001_F_H_oee.
2. Select from ODS only the main fields per the requirement (the remaining measures, i.e. the columns whose names start with numbers 1-6, are kept in the losses table).
3. Delete the matching data in DM.FACT_oee_main before loading from ODS, because a user can enter the date 1/1/2024 and later change it to 01/01/2024, which would cause duplicate records.
4. Load the data from point 2 into DM.FACT_oee_main.
5. Write the log.
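The delete-then-load pattern above can be sketched as follows; dataset paths, column names, and the delete predicate are assumptions for illustration, not the Talend-generated SQL:

```sql
-- Remove the loading window from the target first, so a re-entered date
-- (e.g. 1/1/2024 later corrected to 01/01/2024) cannot produce duplicates.
DELETE FROM `prj-data-dm-industrial-dev.DM.FACT_oee_main`
WHERE date >= @last_meta_business_date;  -- window from ODS max(meta_business_date)

-- Then insert the freshly selected main fields from ODS.
INSERT INTO `prj-data-dm-industrial-dev.DM.FACT_oee_main`
SELECT date, gbu, site, unit, cluster  -- main fields only, per requirement
FROM `prj-data-dm-industrial-dev.ODS.ODS_FIL_0000_F001_F_H_oee`
WHERE meta_business_date >= @last_meta_business_date;
```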
F016_FACT_oee_losses (J016_FACT_oee_losses)
1. Assign the extraction time and set the loading window from the last load: the max meta_business_date in ODS_FIL_0000_F001_F_H_oee.
2. Select from ODS only the losses fields expressed as percentages and unpivot the columns to rows, per the requirement.
3. Select from ODS only the losses fields expressed as volumes and unpivot the columns to rows.
4. Delete the matching data in DM.FACT_oee_losses before loading from ODS, because a user can enter the date 1/1/2024 and later change it to 01/01/2024, which would cause duplicate records.
5. Load the data from point 2 joined with point 3 into DM.FACT_oee_losses.
6. Write the log.
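The unpivot step can be sketched with BigQuery's UNPIVOT operator; the loss column names here are placeholders, not the real sheet headers:

```sql
-- Hypothetical unpivot of the percentage losses columns to rows.
SELECT date, site, loss_name, loss_pct
FROM `prj-data-dm-industrial-dev.ODS.ODS_FIL_0000_F001_F_H_oee`
UNPIVOT (loss_pct FOR loss_name IN (loss_1_pct, loss_2_pct, loss_3_pct));
```

The volume columns are unpivoted the same way, and the two result sets are joined on the business key plus the loss name before loading into DM.FACT_oee_losses.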
Access rights
- Access to gsheet: not required
Source
Format
- columnar format (gsheet)
Destination
Location
- GCP = prj-data-dm-industrial-[env]
- STG table
- STG_FIL_0000_0000_F001_F_H_oee
- ODS table
- ODS_FIL_0000_F001_F_H_oee
- DM Table names
- FACT_oee_main
- FACT_oee_losses
- DS_industrial_dash view names
  - V_FACT_oee_main
  - V_FACT_oee_losses
- GCP = prj-data-industrial-dash-[env]
  - DPL view names
    - V_FACT_oee_losses
    - V_FACT_oee_main
Format
- columnar format
Sizing
1,405 rows
Assessment
How to validate that the generated output is valid:
...
The data must be the same as in the gsheet.
Loading
1.1 Incremental Load
N/A
1.2 Full load
PL_INDUS_OEE
Source to ODS
Variable to control:
l_VAR_GOOGLE_SHEET_oee_id = refer to the Source section above
l_VAR_GOOGLE_SHEET_oee_tab = OEE (WIP)
l_VAR_GOOGLE_SHEET_oee_row_start = 4
l_VAR_GOOGLE_SHEET_oee_row_end = 0 (0 means all rows)
ODS to DM: l_VAR_oee_main_dm_reload_condition = incremental (or blank) loads from the max meta_business_date in ODS.
For the BQ table, just trigger the scheduled query GSheet-Refresher.
1.3 Reloading data
Source to ODS: just run the job again.
ODS to DM: controlled by the parameter l_VAR_oee_main_dm_reload_condition. This impacts both the main and losses tables. To reload all ODS data with the latest version, set:
l_VAR_oee_main_dm_reload_condition = QUALIFY ROW_NUMBER() OVER (PARTITION BY date, gbu, site, unit, cluster ORDER BY meta_ods_insert_date DESC) = 1
For the BQ table, just trigger the scheduled query GSheet-Refresher.
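The reload condition relies on BigQuery's QUALIFY clause. As a sketch (the dataset path is an assumption), applying it to the ODS table keeps only the most recently inserted row per business key:

```sql
-- Keeps one row per (date, gbu, site, unit, cluster): the latest ODS insert,
-- so older versions of a corrected row are dropped on full reload.
SELECT *
FROM `prj-data-dm-industrial-dev.ODS.ODS_FIL_0000_F001_F_H_oee`
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY date, gbu, site, unit, cluster
  ORDER BY meta_ods_insert_date DESC
) = 1;
```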
1.4 Plan to schedule
Every 30 minutes
1.5 Timing
The average time expected for loading: 10 minutes
Criticality
Low?
Logging
- Check in the GCP BQ project under "Scheduled queries"; the schedule name is "GSheet-Refresher".
SELECT job.job_name, job.meta_start_date, job.meta_execution_id,
       logs.meta_run_id, logs.meta_source_system, logs.meta_step,
       logs.meta_status, logs.meta_num_lines, logs.meta_error_lines
FROM STG.log_tables logs
JOIN STG.run_jobs job
  ON logs.meta_run_id = job.meta_run_id
WHERE logs.meta_run_id IN (SELECT meta_run_id FROM STG.run_jobs ORDER BY meta_start_date DESC LIMIT 1000)
  AND LOWER(job_name) LIKE '%oee%'
  AND meta_start_date > DATE_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
ORDER BY job.meta_start_date DESC








