
Description
The data source is the Pure server (ftp.credit360.com), from which Talend loads the file via FTP (for more detail on PURE, click on this link). The Talend project EHS_PURE loads these data first and stores them in the GCP project solvay-ind-conso-[env], in the dataset ehs_pure_[env]_mig.
Source FTP server = "ftp.credit360.com" / User = "solvay", authenticated with a private key file kept on the GCP remote engine in the folder \DATA\DEV\EHS\Pure\InOut\pure_sftp_ssh_key (controlled by the context variable l_CNX_EHS_PURE_SFTP_private_key)
Talend Project = EHS_PURE
Talend jobs = F004_Connect_to_SFTP + F005_Data_Prep, which are not part of this project.
The jobs load the following file to /DATA/DEV/EHS/Pure/Tmp
GCS folder = ehs_pure_dev_mig
Talend Plan = PL_EHS_PURE_HOURL_New, which runs every hour in PROD
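For illustration, below is a minimal standalone sketch of the private-key SFTP retrieval performed by F004_Connect_to_SFTP, written with the JSch library. The host, user, key path, and Tmp folder come from this page; the remote file name and the JSch usage itself are assumptions, since the real job is built from Talend components.

```java
import com.jcraft.jsch.ChannelSftp;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;

public class PureSftpFetch {
    public static void main(String[] args) throws Exception {
        JSch jsch = new JSch();
        // Private key kept on the GCP remote engine
        // (path controlled by context variable l_CNX_EHS_PURE_SFTP_private_key)
        jsch.addIdentity("/DATA/DEV/EHS/Pure/InOut/pure_sftp_ssh_key");

        Session session = jsch.getSession("solvay", "ftp.credit360.com", 22);
        session.setConfig("StrictHostKeyChecking", "no"); // a real job should verify the host key
        session.connect();

        ChannelSftp sftp = (ChannelSftp) session.openChannel("sftp");
        sftp.connect();
        // "export.csv" is a hypothetical file name; the real name is set by the Talend job
        sftp.get("/export.csv", "/DATA/DEV/EHS/Pure/Tmp/export.csv");

        sftp.disconnect();
        session.disconnect();
    }
}
```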
Then the Operation Dashboard project exposes these data to the prj-data-industrial-dash-[dev] project by creating views.
GCP dataset = solvay-ind-conso-dev.DS_prj_data_industrial_dash
After that, a Talend job in the project IND_DASHBOARD generates the FACT tables for TRII and PSE, splitting the perspective by site and GBU.
OS = Occupational safety incidents
PS = Process Safety
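As a hedged illustration of the view creation mentioned above, the sketch below creates one view in the solvay-ind-conso-dev.DS_prj_data_industrial_dash dataset using the BigQuery Java client ([env] resolved to dev); the view name and the underlying source table are hypothetical.

```java
import com.google.cloud.bigquery.*;

public class CreateDashboardView {
    public static void main(String[] args) {
        BigQuery bigquery = BigQueryOptions.newBuilder()
                .setProjectId("solvay-ind-conso-dev") // [env] resolved to dev for the example
                .build()
                .getService();

        // Hypothetical source table in the migrated Pure dataset; real names may differ
        String query = "SELECT * FROM `solvay-ind-conso-dev.ehs_pure_dev_mig.incidents`";

        // Hypothetical view name in the dashboard dataset
        TableId viewId = TableId.of("DS_prj_data_industrial_dash", "V_EHS_PURE_INCIDENTS");
        bigquery.create(TableInfo.of(viewId, ViewDefinition.of(query)));
    }
}
```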
Tools: Talend
Detail job

- Connect to the source system API by reading the context from the flow job
- Set up a loop to get the data (see the sketch after this list)
- tSetGlobalVar: sets the maximum number of records to read per iteration ("limit") and initializes the variable "nb", used to decide when to exit the loop (starting at 0)
- tLoop: sets up the loop condition; the loop exits when the variable nb < 0
- tJava: sets the record offset so that each iteration fetches new records
- Gets the data from the source using the start row number from "nb" and the maximum row count from "limit"; the schema is read from the source (metadata)
- Generates the output file and saves it to DATA\DEV\DATA_OCEAN_DOMAIN_DT\Tmp
- Updates the offset: "nb" = "nb" + "limit"
- Sets "nb" = -1 when ((Integer)globalMap.get("tReplace_1_NB_LINE")) <= 0 in order to exit the loop
- Uploads all the files in the folder to GCS (cs-ew1-prj-data-dm-dt-[dev]-staging)
- Deletes all the files in the folder (point number 5)
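The loop mechanics described above can be summarized in plain Java as follows; fetchPage is a hypothetical stand-in for the Talend source-API component, while "nb" and "limit" match the global variables named in the list.

```java
public class PaginationLoopSketch {

    /** Hypothetical stand-in for the Talend input component; returns the number of rows fetched. */
    static int fetchPage(int startRow, int maxRows) {
        // ... call the source system API with offset=startRow and page size=maxRows,
        // write the page to the Tmp folder ...
        return 0; // placeholder
    }

    public static void main(String[] args) {
        int limit = 1000; // tSetGlobalVar: maximum number of records per iteration (assumed value)
        int nb = 0;       // tSetGlobalVar: offset, also used as the loop-exit flag

        while (nb >= 0) { // tLoop: exit when nb < 0
            int rowsRead = fetchPage(nb, limit); // read one page, save it as an output file

            if (rowsRead <= 0) {
                nb = -1;         // no more records: force loop exit
            } else {
                nb = nb + limit; // tJava: advance the offset for the next iteration
            }
        }
        // After the loop: upload the generated files to the GCS staging bucket,
        // then delete them from the Tmp folder.
    }
}
```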
Flow job

- Set up meta_run_id and the filename of the output file
- Get the last load timestamp from the table STG.incremental_load (controlled by the variable I_VAR_BQ_TABLE_INC_LOAD) and configure the incremental-load logic in tJava so that the date from incremental_load is applied to the create/change date field in SAP (see the sketch after this list)
- Call the detail job, passing parameters such as user/password and the query from point 2, to perform the incremental load and save the file to GCS
- Call the standard job to upload the files from GCS to ODS
- If the load is OK and the parameter l_VAR_heliux_[table_name]_reload = incremental, update the timestamp in the table incremental_load. Any other value means the run is a reload.
- If everything is OK, update the log.
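Below is a minimal sketch, in plain Java, of the incremental-load condition built in the tJava step (point 2). The field names CREATED_ON/CHANGED_ON and the source table are placeholders, not the actual SAP extraction query.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class IncrementalQuerySketch {
    public static void main(String[] args) {
        // Value read from STG.incremental_load (table name controlled by I_VAR_BQ_TABLE_INC_LOAD)
        LocalDateTime lastLoad = LocalDateTime.parse("2024-01-01T00:00:00");
        String reloadMode = "incremental"; // l_VAR_heliux_[table_name]_reload

        String since = lastLoad.format(DateTimeFormatter.ofPattern("yyyyMMdd"));

        // Hypothetical filter on the SAP create/change date fields, passed to the detail job;
        // any value other than "incremental" means a full reload, i.e. no date filter.
        String where = reloadMode.equals("incremental")
                ? "WHERE CREATED_ON >= '" + since + "' OR CHANGED_ON >= '" + since + "'"
                : "";

        String query = "SELECT * FROM CASES " + where; // placeholder source table
        System.out.println(query);
    }
}
```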
Access rights
Source
Format
Destination
Location
- Bucket = cs-ew1-prj-data-dm-dt-[dev]-staging/xxx
- DataOcean GCP = prj-data-dm-dt-[env]
- STG Table name = prj-data-dm-dt-[env].STG.STG_HLX_0000_0000_F001_I_H_Cases
- ODS Table name = prj-data-dm-dt-[env].ODS.ODS_HLX_0000_F001_I_H_Cases
- DPL View name = prj-data-dm-dt-[env].DPL.V_FACT_hlx_case
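For reference, a hedged sketch of the GCS-to-BigQuery load into the STG table listed above, using the BigQuery Java client with [env] resolved to dev; the file pattern, CSV format, and write disposition are assumptions, and the subsequent STG-to-ODS step is handled by the standard Talend job.

```java
import com.google.cloud.bigquery.*;

public class GcsToStgLoadSketch {
    public static void main(String[] args) throws InterruptedException {
        BigQuery bigquery = BigQueryOptions.newBuilder()
                .setProjectId("prj-data-dm-dt-dev") // [env] resolved to dev for the example
                .build()
                .getService();

        TableId stgTable = TableId.of("STG", "STG_HLX_0000_0000_F001_I_H_Cases");
        // File pattern under the staging bucket is an assumption
        String sourceUri = "gs://cs-ew1-prj-data-dm-dt-dev-staging/xxx/*.csv";

        LoadJobConfiguration config = LoadJobConfiguration.newBuilder(stgTable, sourceUri)
                .setFormatOptions(FormatOptions.csv())
                .setWriteDisposition(JobInfo.WriteDisposition.WRITE_TRUNCATE) // assumption: staging is replaced
                .build();

        Job job = bigquery.create(JobInfo.of(config)).waitFor();
        if (job.getStatus().getError() != null) {
            throw new RuntimeException(job.getStatus().getError().toString());
        }
    }
}
```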
Format
Sizing
Assessment
How to validate the generated output:
Loading
1.1 Incremental Load
1.2 Full load
1.3 Reloading data
1.4 Plan to schedule
1.5 Timing
The average time expected for loading:
Criticality
High/Medium/Low
Logging