Description

The data source is the Pure server (ftp.credit360.com), which Talend will load the file via FTP (more detail from PURE click on this link).  The Talend project EHS_PURE load these data first and keep to GCP project solvay-ind-conso-[env] on the dataset ehs_pure_[env]_mig

Source FTP server = "ftp.credit360.com"  / User = "solvay" by using Private key file and it keep in remote engine GCP at this folder \DATA\DEV\EHS\Pure\InOut\pure_sftp_ssh_key (control by context variable l_CNX_EHS_PURE_SFTP_private_key )

Talend Project = EHS_PURE

Talend jobs  = F004_Connect_to_SFTP  + F005_Data_Prep, which are not part of this project.

Load following file  to /DATA/DEV/EHS/Pure/Tmp

GCS folder = ehs_pure_dev_mig

Talend Plan = PL_EHS_PURE_HOURL_New run every hour in PROD


Then, Operation Dashboard project using these data to prj-data-industrial-dash-[dev] project by create views

GCP dataset = solvay-ind-conso-dev.DS_prj_data_industrial_dash

V_core_hd_monthly

V_os_data

V_ps_data


Note: OS =  Occupational safety incidents

          PS = Process Safety


After that Talend job in project IND_DASHBOARD generate the FACT tables for TRII and PSE by separate the perspective by site and gbu.


Tools: Talend

Detail job


  1. tJava  check the date input
  2. tBigQueryInput1 Calculate the data from os and core_hd_monthly to get rolling last 12 months based on site and gbu
  3. tMap Generate key and meta_* data
  4. tBigQuerySQLRow delete the FACT table since it will be full load from the source
  5. Load the data to the FACT table
  6. If the loading is error, email will be sent to inform DataOps team


It is the same for 


Flow job

Access rights

Required to access solvay-ind-conso-[env]

Source

Format

Destination

Product

Format

Sizing

Site around 5000 records

GBU around 500 records

Assessment

Data is same to the file from "ftp.credit360.com"

Script to check rolling 12 months

Select site_code,gbu_code,yearmonth,rii,working_hours,rolling_12_mth_rii,rolling_12_mth_working_hours
from `prj-data-industrial-dash-dev.DM.FACT_trii_site`
order by site_code,gbu_code,yearmonth

Each line of rolling 12 months will sum back 12 lines of grouping site, gbu and yearmonth. 

Similar to pse. 

Loading

1.1 Incremental Load

Not available

1.2 Full load

Plan PL_TRII_PSE.  There is no context variable to reload

1.3. Reloading data

Just do the full load again

1.4 Plan to schedule

WS_EHS_PURE: PL_EHS_PURE_HOURLY : At 08:00 CET, on day 1, 5, 10, 15, 20, 25, and 30

WS_IND_DASHBOARD: PL_TRII_PSE run 9:00 AM on date  1,5,10,15,20,25,30

1.5 Timing

The average time expected for  loading: around 5 mins

Criticality

High/Medium/Low

Logging

Data Ocean solvay-ind-conso-[env]

There is no log

Product on prj-data-industrial-dash-[env]

select job.job_name, job.meta_start_date, job.meta_execution_id, logs.meta_run_id, logs.meta_source_system, logs.meta_step, logs.meta_status, logs.meta_num_lines, logs.meta_error_lines from STG.log_tables logs join STG.run_jobs job on logs.meta_run_id = job.meta_run_id
where  logs.meta_run_id in (SELECT meta_run_id FROM STG.run_jobs order by meta_start_date desc limit 1000)
and meta_source_system in ('V_ps_data','V_os_data')
and meta_step = 'ODS to DM'
and meta_start_date >  DATE_SUB(CURRENT_TIMESTAMP(), INTERVAL 5 DAY)
order by job.meta_start_date desc