
Description

The source data is a user-maintained Google Sheet (gsheet), which we load into BigQuery (BQ) in project prj-data-dm-industrial-[env].

Talend

Source to ODS (DATA_OCEAN_DOMAIN_INDUSTRIAL)

J006_oee_gsheet_to_GCS

F006_Gsheet_OEE_TO_BQ

  1. Read data from:
    1. gsheet ID: l_VAR_GOOGLE_SHEET_oee_id
    2. tab name: l_VAR_GOOGLE_SHEET_oee_tab
    3. start row: l_VAR_GOOGLE_SHEET_oee_row_start = 2
    4. end row: l_VAR_GOOGLE_SHEET_oee_row_end = 0 (0 means all rows)
  2. Save data to bucket cs-ew1-prj-data-dm-industrial-dev-staging/Gsheet_OEE
  3. Load data to prj-data-dm-industrial-dev
    1. STG_FIL_0000_0000_F001_F_H_oee
    2. ODS_FIL_0000_F001_F_H_oee
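The GCS-to-BQ step above can be sketched with BigQuery's LOAD DATA statement. This is only an illustration of the staging load, not the actual Talend component: the file format, the STG dataset name, and the wildcard object path are assumptions.

```sql
-- Hypothetical sketch of loading the staged gsheet export into the STG table.
-- Format, dataset name and object path are assumptions, not the real job config.
LOAD DATA OVERWRITE `prj-data-dm-industrial-dev.STG.STG_FIL_0000_0000_F001_F_H_oee`
FROM FILES (
  format = 'CSV',
  skip_leading_rows = 1,
  uris = ['gs://cs-ew1-prj-data-dm-industrial-dev-staging/Gsheet_OEE/Gsheet_OEE*.csv']
);
```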

ODS to DM

F016_FACT_oee_main (J016_FACT_oee_main)

  1. Assign the extraction time and set the loading window from the last load, using the max meta_business_date in ODS_FIL_0000_F001_F_H_oee
  2. Select from ODS only the main fields per the requirement (the remaining measures, except the columns whose names start with the numbers 1 - 6, are kept in the losses table)
  3. Delete data from DM.FACT_oee_main before loading from ODS, because a user may enter a date as 1/1/2024 and later change it to 01/01/2024, which would otherwise create duplicate records
  4. Load the data from step 2 into DM.FACT_oee_main
  5. Write the log
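The delete-before-load in steps 3 and 4 can be sketched as a delete/insert pair. The key columns (date, gbu, site, unit, cluster) come from the reload condition documented further down this page; the dataset names and the measure column list are assumptions.

```sql
-- Remove rows for the dates present in the incoming ODS extract, so a
-- re-entered date (1/1/2024 vs 01/01/2024) cannot create duplicates.
DELETE FROM DM.FACT_oee_main
WHERE date IN (
  SELECT DISTINCT date
  FROM ODS.ODS_FIL_0000_F001_F_H_oee
  WHERE meta_business_date > @last_load_date  -- watermark from step 1
);

-- Reload the main fields selected in step 2 (column list is illustrative).
INSERT INTO DM.FACT_oee_main (date, gbu, site, unit, cluster /*, main measures */)
SELECT date, gbu, site, unit, cluster /*, main measures */
FROM ODS.ODS_FIL_0000_F001_F_H_oee
WHERE meta_business_date > @last_load_date;
```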


F016_FACT_oee_losses (J016_FACT_oee_losses)

  1. Assign the extraction time and set the loading window from the last load, using the max meta_business_date in ODS_FIL_0000_F001_F_H_oee
  2. Select from ODS only the loss fields expressed as percentages and unpivot the data per the requirement
  3. Select from ODS only the loss fields expressed as volumes and unpivot the data
  4. Delete data from DM.FACT_oee_losses before loading from ODS, because a user may enter a date as 1/1/2024 and later change it to 01/01/2024, which would otherwise create duplicate records
  5. Load the data from step 2 joined with step 3 into DM.FACT_oee_losses
  6. Write the log
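The unpivot in steps 2 and 3 can be sketched with BigQuery's UNPIVOT operator. The loss column names below are purely illustrative (the real schema is not shown on this page); the ODS dataset name is also an assumption.

```sql
-- Hypothetical sketch of step 2: turn the percentage loss columns into rows,
-- one row per loss measure. Column names loss_1_pct … loss_3_pct are made up.
SELECT date, gbu, site, unit, cluster, loss_name, loss_pct
FROM ODS.ODS_FIL_0000_F001_F_H_oee
UNPIVOT (loss_pct FOR loss_name IN (loss_1_pct, loss_2_pct, loss_3_pct));
```

Step 3 would be the same shape over the volume columns, and step 5 joins the two results on the key columns plus loss_name.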

Source

Access rights

  • Access to the gsheet

Format

  • columnar format (Gsheet)

Destination

Location

  • GCP = prj-data-dm-industrial-[env] 
  • STG table
    • STG_FIL_0000_0000_F001_F_H_oee
  • ODS table
    • ODS_FIL_0000_F001_F_H_oee
  • DM Table names 
    • FACT_oee_main
    • FACT_oee_losses
  • DS_industrial_dash
    • V_FACT_oee_main
    • V_FACT_oee_losses


  • GCP = prj-data-industrial-dash-[env]
  • DataOcean
    • V_FACT_oee_main
    • V_FACT_oee_losses
  • DPL View names   
    • V_OP_PRODUCTIVITY_OEE_Main_Monthly
    • V_OP_PRODUCTIVITY_OEE_Losses_Monthly

Format

  • columnar format

Sizing

1,405 rows

Assessment

How to validate that the generated output is valid: the data in BQ must match the source gsheet.

Loading

1.1 Incremental Load

N/A

1.2 Full load

PL_INDUS_OEE 

Source to ODS 

Variables to control:

l_VAR_GOOGLE_SHEET_oee_id = refer to the Source section above
l_VAR_GOOGLE_SHEET_oee_tab = OEE (WIP)
l_VAR_GOOGLE_SHEET_oee_row_start = 4
l_VAR_GOOGLE_SHEET_oee_row_end = 0 (0 means all rows)


ODS to DM: when l_VAR_oee_main_dm_reload_condition = incremental or is left blank, the job loads from the max meta_business_date in ODS.
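In that mode the watermark is simply the latest business date already present in ODS; conceptually:

```sql
-- Watermark for the incremental/blank mode: the latest business date in ODS.
-- The ODS dataset name is an assumption.
SELECT MAX(meta_business_date) AS last_load_date
FROM ODS.ODS_FIL_0000_F001_F_H_oee;
```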

1.3 Reloading data

Source to ODS: just run the job again.

ODS to DM: controlled by the parameter l_VAR_oee_main_dm_reload_condition. This impacts both the main and losses tables. To reload all ODS data with the latest version of each record, set l_VAR_oee_main_dm_reload_condition to:

QUALIFY ROW_NUMBER() OVER (PARTITION BY date,gbu,site,unit,cluster ORDER BY meta_ods_insert_date DESC) = 1  
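In context, that QUALIFY condition deduplicates the full-history select from ODS; a hedged sketch (the ODS dataset name is an assumption):

```sql
-- Keep only the most recently inserted ODS row per (date, gbu, site, unit,
-- cluster) key, so the whole history can be reloaded without duplicates.
SELECT *
FROM ODS.ODS_FIL_0000_F001_F_H_oee
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY date, gbu, site, unit, cluster
  ORDER BY meta_ods_insert_date DESC
) = 1;
```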

1.4 Plan to schedule

Every 30 minutes

1.5 Timing

The average expected loading time is 10 minutes.

Criticality

Low?

Logging

SELECT
  job.job_name,
  job.meta_start_date,
  job.meta_execution_id,
  logs.meta_run_id,
  logs.meta_source_system,
  logs.meta_step,
  logs.meta_status,
  logs.meta_num_lines,
  logs.meta_error_lines
FROM STG.log_tables logs
JOIN STG.run_jobs job
  ON logs.meta_run_id = job.meta_run_id
WHERE logs.meta_run_id IN (
    SELECT meta_run_id
    FROM STG.run_jobs
    ORDER BY meta_start_date DESC
    LIMIT 1000
  )
  AND LOWER(job.job_name) LIKE '%oee%'
  AND job.meta_start_date > DATE_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
ORDER BY job.meta_start_date DESC