
The source data is a Google Sheet maintained by the user, which we load into BigQuery (BQ) via a direct link to the sheet; the requirement is for BQ to reflect changes in real time. However, Power BI (PBI) cannot read the directly linked table because of security restrictions.

Therefore, the data is requested to be loaded into a normal table before PBI reads it.

Job detail

  1. Table prj-data-industrial-dash-dev.DM.FACT_oee_wip is created as an external table linked directly to the Google Sheet.
  2. The scheduled query GSheet-Refresher, which runs every 30 minutes, generates the table DM.FACT_oee_wip-table.
  3. Views in the DPL dataset read the data from DM.FACT_oee_wip-table:
    1. V_FACT_oee_wip is identical to the Google Sheet.
    2. V_FACT_oee_losses is split from oee_wip and unpivots columns to rows.
    3. V_FACT_oee_main is split from oee_wip to hold the main data, with the losses data removed.
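The refresh step above can be sketched as a BigQuery scheduled query. This is an illustrative sketch only, assuming the job simply materializes the Sheet-linked external table (the column list and any options are not specified in this page):

```sql
-- Sketch of the "GSheet-Refresher" scheduled query (every 30 minutes):
-- materialize the Sheet-linked external table into a normal table
-- so Power BI can read it without Google Sheet access.
CREATE OR REPLACE TABLE `prj-data-industrial-dash-dev.DM.FACT_oee_wip-table` AS
SELECT *
FROM `prj-data-industrial-dash-dev.DM.FACT_oee_wip`;  -- external table on the Sheet
```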

Access rights

Access rights are granted on project prj-data-dm-industrial-[env].

Talend

Source to ODS (DATA_OCEAN_DOMAIN_INDUSTRIAL)

J006_oee_gsheet_to_GCS


F006_Gsheet_OEE_TO_BQ


  1. Read data from:
    1. Google Sheet ID: l_VAR_GOOGLE_SHEET_oee_id
    2. tab name: l_VAR_GOOGLE_SHEET_oee_tab
    3. start row: l_VAR_GOOGLE_SHEET_oee_row_start = 2
    4. end row: l_VAR_GOOGLE_SHEET_oee_row_end = 0 (0 means all rows)
  2. Save the data to bucket cs-ew1-prj-data-dm-industrial-dev-staging/Gsheet_OEE
  3. Load the data into prj-data-dm-industrial-dev:
    1. STG_FIL_0000_0000_F001_F_H_oee
    2. ODS_FIL_0000_F001_F_H_oee
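The Talend job performs the GCS-to-BigQuery load; a rough SQL equivalent is sketched below. The file format (JSON) and the STG dataset name are assumptions, not taken from the job definition:

```sql
-- Hypothetical sketch of the GCS-to-BigQuery staging load.
LOAD DATA OVERWRITE `prj-data-dm-industrial-dev.STG.STG_FIL_0000_0000_F001_F_H_oee`
FROM FILES (
  format = 'JSON',
  uris = ['gs://cs-ew1-prj-data-dm-industrial-dev-staging/Gsheet_OEE/*']
);
```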

ODS to DM

F016_FACT_oee_main (J016_FACT_oee_main)


  1. Assign the extraction time and set the loading window starting from the last load, i.e. the max meta_business_date in ODS_FIL_0000_F001_F_H_oee
  2. Select from ODS only the main fields per the requirement (the remaining measures, i.e. the columns whose names start with the numbers 1 - 6, are kept in the losses table)
  3. Delete the data in DM.FACT_oee_main before loading from ODS, because a user may enter a date as 1/1/2024 and later change it to 01/01/2024, which would otherwise cause duplicate records
  4. Load the data from step 2 into DM.FACT_oee_main
  5. Write the log
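Steps 3 and 4 above follow a delete-then-insert pattern. A minimal sketch, assuming the delete uses the same loading window as the insert (the ODS dataset name and the exact predicate are assumptions):

```sql
-- Hypothetical sketch of the delete-then-insert load for FACT_oee_main.
-- Deleting the window first means a re-entered date (1/1/2024 vs
-- 01/01/2024) cannot leave duplicate rows behind.
DECLARE v_from DATE DEFAULT (
  SELECT MAX(meta_business_date)
  FROM `prj-data-dm-industrial-dev.ODS.ODS_FIL_0000_F001_F_H_oee`
);

DELETE FROM `prj-data-dm-industrial-dev.DM.FACT_oee_main`
WHERE date >= v_from;

INSERT INTO `prj-data-dm-industrial-dev.DM.FACT_oee_main`
SELECT *  -- the real job selects the main fields only
FROM `prj-data-dm-industrial-dev.ODS.ODS_FIL_0000_F001_F_H_oee`
WHERE meta_business_date >= v_from;
```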


F016_FACT_oee_losses (J016_FACT_oee_losses)


  1. Assign the extraction time and set the loading window starting from the last load, i.e. the max meta_business_date in ODS_FIL_0000_F001_F_H_oee
  2. Select from ODS only the losses fields expressed as percentages and unpivot the data per the requirement
  3. Select from ODS only the losses fields expressed as volumes and unpivot the data
  4. Delete the data in DM.FACT_oee_losses before loading from ODS, for the same duplicate-date reason as the main table
  5. Load the data from step 2 joined with step 3 into DM.FACT_oee_losses
  6. Write the log
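The columns-to-rows step can be sketched with BigQuery's UNPIVOT operator. The loss column names below (loss_1_pct, loss_2_pct, loss_3_pct) are hypothetical placeholders, not the real sheet columns:

```sql
-- Hypothetical unpivot of the percentage loss columns into rows.
SELECT date, gbu, site, unit, cluster, loss_name, loss_pct
FROM `prj-data-dm-industrial-dev.ODS.ODS_FIL_0000_F001_F_H_oee`
UNPIVOT (loss_pct FOR loss_name IN (loss_1_pct, loss_2_pct, loss_3_pct));
```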


Access rights

Access to the Google Sheet is not required.

Source

Format

  • JSON (Gsheet)

Destination

Location

  • GCP = prj-data-dm-industrial-[env]
  • STG table
    • STG_FIL_0000_0000_F001_F_H_oee
  • ODS table
    • ODS_FIL_0000_F001_F_H_oee
  • DM table names
    • FACT_oee_main
    • FACT_oee_losses
  • DS_industrial_dash view names
    • V_FACT_oee_main
    • V_FACT_oee_losses


  • GCP = prj-data-industrial-dash-[env]
  • DPL view names
    • V_FACT_oee_losses
    • V_FACT_oee_main

Format

  • columnar format

Sizing

1,405 rows

Assessment

How to validate that the generated output is valid: 

The data must be the same as the Google Sheet.

Loading

1.1 Incremental Load

N/A

1.2 Full load

PL_INDUS_OEE 

Source to ODS 

Variable to control:

l_VAR_GOOGLE_SHEET_oee_id = refer to the Source section above
l_VAR_GOOGLE_SHEET_oee_tab = OEE (WIP)
l_VAR_GOOGLE_SHEET_oee_row_start = 4
l_VAR_GOOGLE_SHEET_oee_row_end = 0 (0 means all rows)


ODS to DM: l_VAR_oee_main_dm_reload_condition = incremental (or blank) loads from the max meta_business_date in ODS.

1.3. Reloading data

Source to ODS : just run the job again

ODS to DM: controlled by the parameter l_VAR_oee_main_dm_reload_condition, which affects both the main and losses tables. To reload all ODS data keeping only the latest version of each record, set l_VAR_oee_main_dm_reload_condition =

QUALIFY ROW_NUMBER() OVER (PARTITION BY date, gbu, site, unit, cluster ORDER BY meta_ods_insert_date DESC) = 1
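Applied inside the ODS select, that condition keeps only the most recently inserted row per business key (the ODS dataset name below is an assumption):

```sql
-- Reload-all dedup: latest ODS row per (date, gbu, site, unit, cluster).
SELECT *
FROM `prj-data-dm-industrial-dev.ODS.ODS_FIL_0000_F001_F_H_oee`
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY date, gbu, site, unit, cluster
  ORDER BY meta_ods_insert_date DESC
) = 1;
```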

1.4 Plan to schedule

Every 30 minutes


1.5 Timing

The average time expected for loading: 10 minutes

Criticality

Low?

Logging

  1. Check the GCP BQ project under "Scheduled queries"; the name of the schedule is "GSheet-Refresher"

  2. Query the Talend job logs in BigQuery:

```sql
SELECT
  job.job_name,
  job.meta_start_date,
  job.meta_execution_id,
  logs.meta_run_id,
  logs.meta_source_system,
  logs.meta_step,
  logs.meta_status,
  logs.meta_num_lines,
  logs.meta_error_lines
FROM STG.log_tables logs
JOIN STG.run_jobs job
  ON logs.meta_run_id = job.meta_run_id
WHERE logs.meta_run_id IN (
    SELECT meta_run_id
    FROM STG.run_jobs
    ORDER BY meta_start_date DESC
    LIMIT 1000
  )
  AND LOWER(job_name) LIKE '%oee%'
  AND meta_start_date > DATE_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
ORDER BY job.meta_start_date DESC
```