
Description

The source data is a user-maintained Google Sheet (gsheet), which we load into BigQuery (BQ) in project prj-data-dm-industrial-[env].

Talend

Source to ODS (DATA_OCEAN_DOMAIN_INDUSTRIAL)

J006_oee_gsheet_to_GCS

F006_Gsheet_OEE_TO_BQ

  1. Read data from:
    1. gsheet ID: l_VAR_GOOGLE_SHEET_oee_id
    2. tab name: l_VAR_GOOGLE_SHEET_oee_tab
    3. start row: l_VAR_GOOGLE_SHEET_oee_row_start = 2
    4. end row: l_VAR_GOOGLE_SHEET_oee_row_end = 0 (0 means all rows)
  2. Save data to bucket cs-ew1-prj-data-dm-industrial-dev-staging/Gsheet_OEE
  3. Load data to prj-data-dm-industrial-dev
    1. STG_FIL_0000_0000_F001_F_H_oee
    2. ODS_FIL_0000_F001_F_H_oee
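The GCS-to-BQ step above can be sketched with BigQuery's LOAD DATA statement. This is only an illustration of the staging load, not the actual Talend component: the file format, the STG dataset name, and the wildcard object path are assumptions.

```sql
-- Hypothetical sketch of loading the staged gsheet export into the STG table.
-- Format, dataset name and object path are assumptions, not the real job config.
LOAD DATA OVERWRITE `prj-data-dm-industrial-dev.STG.STG_FIL_0000_0000_F001_F_H_oee`
FROM FILES (
  format = 'CSV',
  skip_leading_rows = 1,
  uris = ['gs://cs-ew1-prj-data-dm-industrial-dev-staging/Gsheet_OEE/Gsheet_OEE*.csv']
);
```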

ODS to DM

F016_FACT_oee_main (J016_FACT_oee_main)

  1. Assign the extraction time and set the loading window from the last load, using the max meta_business_date in ODS_FIL_0000_F001_F_H_oee
  2. Select from ODS only the main fields per the requirement (the remaining measures, except the columns whose names start with the numbers 1 - 6, are kept in the losses table)
  3. Delete data from DM.FACT_oee_main before loading from ODS, because a user may enter a date as 1/1/2024 and later change it to 01/01/2024, which would otherwise create duplicate records
  4. Load the data from step 2 into DM.FACT_oee_main
  5. Write the log
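The delete-before-load in steps 3 and 4 can be sketched as a delete/insert pair. The key columns (date, gbu, site, unit, cluster) come from the reload condition documented further down this page; the dataset names and the measure column list are assumptions.

```sql
-- Remove rows for the dates present in the incoming ODS extract, so a
-- re-entered date (1/1/2024 vs 01/01/2024) cannot create duplicates.
DELETE FROM DM.FACT_oee_main
WHERE date IN (
  SELECT DISTINCT date
  FROM ODS.ODS_FIL_0000_F001_F_H_oee
  WHERE meta_business_date > @last_load_date  -- watermark from step 1
);

-- Reload the main fields selected in step 2 (column list is illustrative).
INSERT INTO DM.FACT_oee_main (date, gbu, site, unit, cluster /*, main measures */)
SELECT date, gbu, site, unit, cluster /*, main measures */
FROM ODS.ODS_FIL_0000_F001_F_H_oee
WHERE meta_business_date > @last_load_date;
```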


F016_FACT_oee_losses (J016_FACT_oee_losses)

  1. Assign the extraction time and set the loading window from the last load, using the max meta_business_date in ODS_FIL_0000_F001_F_H_oee
  2. Select from ODS only the loss fields expressed as percentages and unpivot the data per the requirement
  3. Select from ODS only the loss fields expressed as volumes and unpivot the data
  4. Delete data from DM.FACT_oee_losses before loading from ODS, because a user may enter a date as 1/1/2024 and later change it to 01/01/2024, which would otherwise create duplicate records
  5. Load the data from step 2 joined with step 3 into DM.FACT_oee_losses
  6. Write the log
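The unpivot in steps 2 and 3 can be sketched with BigQuery's UNPIVOT operator. The loss column names below are purely illustrative (the real schema is not shown on this page); the ODS dataset name is also an assumption.

```sql
-- Hypothetical sketch of step 2: turn the percentage loss columns into rows,
-- one row per loss measure. Column names loss_1_pct … loss_3_pct are made up.
SELECT date, gbu, site, unit, cluster, loss_name, loss_pct
FROM ODS.ODS_FIL_0000_F001_F_H_oee
UNPIVOT (loss_pct FOR loss_name IN (loss_1_pct, loss_2_pct, loss_3_pct));
```

Step 3 would be the same shape over the volume columns, and step 5 joins the two results on the key columns plus loss_name.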

Source

Access rights

  • Access to the gsheet

Format

  • columnar format (Gsheet)

Destination

Location

  • GCP = prj-data-dm-industrial-[env] 
  • STG table
    • STG_FIL_0000_0000_F001_F_H_oee
  • ODS table
    • ODS_FIL_0000_F001_F_H_oee
  • DM Table names 
    • FACT_oee_main
    • FACT_oee_losses
  • DS_industrial_dash
    • V_FACT_oee_main
    • V_FACT_oee_losses


  • GCP = prj-data-industrial-dash-[env]
  • DataOcean
    • V_FACT_oee_main
    • V_FACT_oee_losses
  • DPL View names   
    • V_OP_PRODUCTIVITY_OEE_Main_Monthly
    • V_OP_PRODUCTIVITY_OEE_Losses_Monthly

Format

  • columnar format

Sizing

1,405 rows

Assessment

How to validate that the generated output is valid: the data in BQ must match the source gsheet.

Loading

1.1 Incremental Load

N/A

1.2 Full load

PL_INDUS_OEE 

Source to ODS 

Variables to control:

l_VAR_GOOGLE_SHEET_oee_id = refer to the Source section above
l_VAR_GOOGLE_SHEET_oee_tab = OEE (WIP)
l_VAR_GOOGLE_SHEET_oee_row_start = 4
l_VAR_GOOGLE_SHEET_oee_row_end = 0 (0 means all rows)


ODS to DM: when l_VAR_oee_main_dm_reload_condition = incremental or is left blank, the job loads from the max meta_business_date in ODS.
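In that mode the watermark is simply the latest business date already present in ODS; conceptually:

```sql
-- Watermark for the incremental/blank mode: the latest business date in ODS.
-- The ODS dataset name is an assumption.
SELECT MAX(meta_business_date) AS last_load_date
FROM ODS.ODS_FIL_0000_F001_F_H_oee;
```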

1.3 Reloading data

Source to ODS: just run the job again.

ODS to DM: controlled by the parameter l_VAR_oee_main_dm_reload_condition. This impacts both the main and losses tables. To reload all ODS data with the latest version of each record, set l_VAR_oee_main_dm_reload_condition to:

QUALIFY ROW_NUMBER() OVER (PARTITION BY date,gbu,site,unit,cluster ORDER BY meta_ods_insert_date DESC) = 1  
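In context, that QUALIFY condition deduplicates the full-history select from ODS; a hedged sketch (the ODS dataset name is an assumption):

```sql
-- Keep only the most recently inserted ODS row per (date, gbu, site, unit,
-- cluster) key, so the whole history can be reloaded without duplicates.
SELECT *
FROM ODS.ODS_FIL_0000_F001_F_H_oee
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY date, gbu, site, unit, cluster
  ORDER BY meta_ods_insert_date DESC
) = 1;
```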

1.4 Plan to schedule

Every 30 minutes

1.5 Timing

The average expected loading time is 10 minutes.

Criticality

Low?

Logging

SELECT
  job.job_name,
  job.meta_start_date,
  job.meta_execution_id,
  logs.meta_run_id,
  logs.meta_source_system,
  logs.meta_step,
  logs.meta_status,
  logs.meta_num_lines,
  logs.meta_error_lines
FROM STG.log_tables logs
JOIN STG.run_jobs job
  ON logs.meta_run_id = job.meta_run_id
WHERE logs.meta_run_id IN (
    SELECT meta_run_id
    FROM STG.run_jobs
    ORDER BY meta_start_date DESC
    LIMIT 1000
  )
  AND LOWER(job.job_name) LIKE '%oee%'
  AND job.meta_start_date > DATE_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
ORDER BY job.meta_start_date DESC