...
The data source is the Pure server (ftp.credit360.com), which Talend will load the file via FTP (more detail from PURE click on this link). The Talend project EHS_PURE load these data first and keep to GCP project solvay-ind-conso-[env] on the dataset ehs_pure_[env]_mig, which is existing and not part of Operational Dashboard project.
Source FTP server = "ftp.credit360.com" / User = "solvay" by using Private key file and it keep in remote engine GCP at this folder \DATA\DEV\EHS\Pure\InOut\pure_sftp_ssh_key (control by context variable l_CNX_EHS_PURE_SFTP_private_key )
...
GCP dataset = solvay-ind-conso-dev.DS_prj_data_industrial_dash
V_core_hd_monthly
V_os_data
V_ps_data
Note: OS = Occupational safety incidents
PS = Process Safety
After that Talend job in project IND_DASHBOARD generate the FACT tables for TRII and PSE by separate the perspective by site and gbu.
Tools: Talend
Detail job
- J080_FACT_trii_site
- tJava check the date input
- tBigQueryInput1 Calculate the data from os and core_hd_monthly to get rolling last 12 months based on site and gbu
- tMap Generate key and meta_* data
- tBigQuerySQLRow delete the FACT table since it will be full load from the source
- Load the data to the FACT table
- If the loading is error, email will be sent to inform DataOps team
It is the same for
- J081_FACT_trii_gbu, which step 2 has the script group by only gbu
- J082_FACT_pse_site and J083_FACT_pse_gbu, are the same as trii but using table ps instead of os for PSE (the different is only script on step2)
Flow job
- F080_FACT_trii_site
- Setup meta_run_id and filename of the output file
- Call the detail job and pass parameters such as filename, date
- Call the standard job to upload the files from GCS to ODS
- If everything is OK, update the log.
Access rights
Required to access solvay-ind-conso-[env]
Source
- ftp.credit360.com → GCP solvay-ind-conso-[env]
- Loading by plan PL_EHS_PURE_HOURLY : At 10 minutes past the hour, between 07:00 and 18:59, Monday through Friday (UTC +0) in WS_EHS_PURE production
- F004_Connect_to_SFTP
- F005_Data_Prep
Format
- columnar format (file .dsv)
Destination
Product
...
→ prj-data-industrial-dash-[env].DataOcean_solvay_conso.V_core_hd_monthly → DPL.V_core_hd_monthly
v_core_hd_quarterly → prj-data-industrial-dash-[env].DataOcean_solvay_conso.V_core_hd_quarterly → DPL.V_core_hd_quarterly
V_os_data →
...
prj-data-industrial-dash-[env]
...
- FACT_trii_site
- FACT_trii_gbu
- FACT_pse_site
- FACT_pse_gbu
.DataOcean_solvay_conso.V_os_data → DPL.V_os_datat
V_ts_data → prj
Format
- columnar format
Sizing
Site around 5000 records
GBU around 500 records
Assessment
Data is same to the file from "ftp.credit360.com"
Script to check rolling 12 months
Selectsite_code,gbu_code,yearmonth,rii,working_hours,rolling_12_mth_rii,rolling_12_mth_working_hours
from`prj-data-industrial-dash-dev.DM.FACT_trii_site`
orderbysite_code,gbu_code,yearmonth
Each line of rolling 12 months will sum back 12 lines of grouping site, gbu and yearmonth.
Similar to pse.
Loading
1.1 Incremental Load
Not available
1.2 Full load
Plan PL_TRII_PSE. There is no context variable to reload
- F080_FACT_trii_site
- F081_FACT_trii_gbu
- F082_FACT_pse_site
- F083_FACT_pse_gbu
There are variable to control the load
l_VAR_IND_DASH_trii_nb_month_to_DM = 48 means the full load last for 48 months from V_os_data to FACT_trii_xxx
l_VAR_IND_DASH_pse_nb_month_to_DM = 48 means the full load last for 48 months from V_ps_data to FACT_ps_xxx
1.3. Reloading data
Just do the full load again
1.4 Plan to schedule
WS_EHS_PURE: PL_EHS_PURE_HOURLY : At 08:00 CET, on day 1, 5, 10, 15, 20, 25, and 30
WS_IND_DASHBOARD: PL_TRII_PSE run 9:00 AM on date 1,5,10,15,20,25,30
1.5 Timing
The average time expected for loading: around 5 mins
Criticality
High/Medium/Low
Logging
Data Ocean solvay-ind-conso-[env]
There is no log
...
[env].DataOcean_solvay_conso.V_ts_data → DPL.V_ts_data
V_ps_data → prj-data-industrial-dash-[env]
...
selectjob.job_name, job.meta_start_date, job.meta_execution_id, logs.meta_run_id, logs.meta_source_system, logs.meta_step, logs.meta_status, logs.meta_num_lines, logs.meta_error_linesfromSTG.log_tableslogsjoinSTG.run_jobsjobonlogs.meta_run_id = job.meta_run_id
where logs.meta_run_idin(SELECTmeta_run_idFROMSTG.run_jobsorderbymeta_start_datedesclimit1000)
andmeta_source_systemin('DataOcean_solvay_conso.V_ps_data ','→ DPL.V_osps_data')
andmeta_step = 'ODS to DM'
andmeta_start_date> DATE_SUB(CURRENT_TIMESTAMP(), INTERVAL5DAY)
orderbyjob.meta_start_datedesc
NOTE:
- OP = Occupational Safety
- PS = Process Safety
- TS = Transport Safety



