| Table of Contents |
|---|
Description
The data source is the Pure server (ftp.credit360.com), which Talend will load the file via FTP (more detail from PURE click on this link). The Talend project EHS_PURE load these data first and keep to GCP project solvay-ind-conso-[env] on the dataset ehs_pure_[env]_mig, which is existing and not part of Operational Dashboard project.
Source FTP server = "ftp.credit360.com" / User = "solvay" by using Private key file and it keep in remote engine GCP at this folder \DATA\DEV\EHS\Pure\InOut\pure_sftp_ssh_key (control by context variable l_CNX_EHS_PURE_SFTP_private_key )
...
GCP dataset = solvay-ind-conso-dev.DS_prj_data_industrial_dash
V_core_hd_monthly
V_os_data
V_ps_data
Note: OS = Occupational safety incidents
PS = Process Safety
After that Talend job in project IND_DASHBOARD generate the FACT tables for TRII and PSE by separate the perspective by site and gbu.
Tools: Talend
Detail job
- J080_FACT_trii_site
- tJava check the date input
- tBigQueryInput1 Calculate the data from os and core_hd_monthly to get rolling last 12 months based on site and gbu
- tMap Generate key and meta_* data
- tBigQuerySQLRow delete the FACT table since it will be full load from the source
- Load the data to the FACT table
- If the loading is error, email will be sent to inform DataOps team
It is the same for
- J081_FACT_trii_gbu, which step 2 has the script group by only gbu
- J082_FACT_pse_site and J083_FACT_pse_gbu, are the same as trii but using table ps instead of os for PSE (the different is only script on step2)
Flow job
- F080_FACT_trii_site
- Setup meta_run_id and filename of the output file
- Call the detail job and pass parameters such as filename, date
- Call the standard job to upload the files from GCS to ODS
- If everything is OK, update the log.
Access rights
Required to access solvay-ind-conso-[env]
Source
- ftp.credit360.com → GCP solvay-ind-conso-[env]
Format
- Table
Destination
Location
...
→ prj-data-industrial-dash-[env].DataOcean_solvay_conso.V_core_hd_monthly → DPL.V_core_hd_monthly
v_core_hd_quarterly → prj-data-industrial-dash-[env].DataOcean_solvay_conso.V_core_hd_quarterly → DPL.V_core_hd_quarterly
V_os_data → prj-data-industrial-dash
...
-
...
[env]
...
.DataOcean_solvay_conso.V_os_data → DPL.V_os_datat
V_ts_data →
...
prj-data-industrial-dash-
...
Format
- columnar format
Sizing
Site around 5000 records
GBU around 500 records
Assessment
How to validate that the generated output is valid:
...
[env].DataOcean_solvay_conso.V_ts_data → DPL.V_ts_data
V_ps_data → prj-data-industrial-dash-[env].DataOcean_solvay_conso.V_ps_data → DPL.V_ps_data
NOTE:
- OP = Occupational Safety
- PS = Process Safety
- TS = Transport Safety
Loading
1.1 Incremental Load
Not available
1.2 Full load
Plan PL_TRII_PSE run run 9:00 AM on date 1,5,10,15,20,25,30. There is no context variable to reload
1.3. Reloading data
Just do the full load again
1.4 Plan to schedule
run 9:00 AM on date 1,5,10,15,20,25,30
1.5 Timing
The average time expected for loading: around 5 mins
Criticality
High/Medium/Low
Logging
...


