...

The data source is the Pure server (ftp.credit360.com), from which Talend loads the file via FTP (for more detail on PURE, see the linked page). The Talend project EHS_PURE loads this data first and stores it in the GCP project solvay-ind-conso-[env], in the dataset ehs_pure_[env]_mig, which already exists and is not part of the Operational Dashboard project.

Source FTP server = "ftp.credit360.com" / User = "solvay". Authentication uses a private key file, which is kept on the GCP remote engine in the folder \DATA\DEV\EHS\Pure\InOut\pure_sftp_ssh_key (controlled by the context variable l_CNX_EHS_PURE_SFTP_private_key).

...

GCP dataset = solvay-ind-conso-dev.DS_prj_data_industrial_dash

V_core_hd_monthly

V_os_data

V_ps_data

Note:

  • OS = Occupational safety incidents
  • PS = Process Safety

After that, the Talend job in project IND_DASHBOARD generates the FACT tables for TRII and PSE, with one table per perspective: site and GBU.

Tools: Talend

Detail job

  • J080_FACT_trii_site


  1. tJava: checks the input date
  2. tBigQueryInput1: calculates the rolling last 12 months from os and core_hd_monthly, grouped by site and gbu
  3. tMap: generates the key and the meta_* data
  4. tBigQuerySQLRow: clears the FACT table, since it is fully reloaded from the source
  5. Loads the data into the FACT table
  6. If the load fails, an email is sent to inform the DataOps team

The same steps apply to:

  • J081_FACT_trii_gbu, where the script in step 2 groups by gbu only
  • J082_FACT_pse_site and J083_FACT_pse_gbu, which are the same as the trii jobs but use table ps instead of os for PSE (the only difference is the script in step 2)
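The rolling 12-month aggregation in step 2 can be illustrated outside Talend. The following is a minimal sketch in Python, not the actual BigQuery script used by tBigQueryInput1 (which is not shown on this page). The field names `month` and `count` and the function names are assumptions made for illustration; only the grouping by site and gbu (or by gbu only) comes from the job description above.

```python
from collections import defaultdict
from datetime import date


def month_index(d: date) -> int:
    """Map a date to a running month number so months can be compared by subtraction."""
    return d.year * 12 + (d.month - 1)


def rolling_12_months(rows, as_of, keys):
    """Sum the hypothetical `count` field per group over the 12 months ending at `as_of`.

    rows  : iterable of dicts with assumed fields site, gbu, month (date), count
    as_of : any date inside the last month of the window
    keys  : grouping columns, e.g. ("site", "gbu") or ("gbu",)
    """
    end = month_index(as_of)
    start = end - 11  # 12 months, both ends inclusive
    totals = defaultdict(int)
    for row in rows:
        if start <= month_index(row["month"]) <= end:
            group = tuple(row[k] for k in keys)
            totals[group] += row["count"]
    return dict(totals)


# Illustrative data only, not taken from the real views.
rows = [
    {"site": "S1", "gbu": "G1", "month": date(2023, 1, 1), "count": 2},  # outside the window
    {"site": "S1", "gbu": "G1", "month": date(2023, 2, 1), "count": 1},
    {"site": "S1", "gbu": "G1", "month": date(2024, 1, 1), "count": 3},
    {"site": "S2", "gbu": "G1", "month": date(2023, 12, 1), "count": 4},
]

by_site = rolling_12_months(rows, date(2024, 1, 15), ("site", "gbu"))
by_gbu = rolling_12_months(rows, date(2024, 1, 15), ("gbu",))
```

Passing `("site", "gbu")` mirrors the site jobs and `("gbu",)` mirrors the gbu jobs, which matches the note that only the grouping in step 2 differs between them.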

Flow job

  • F080_FACT_trii_site


    • Set up meta_run_id and the filename of the output file
    • Call the detail job and pass parameters such as the filename and the date
    • Call the standard job to upload the files from GCS to ODS
    • If everything is OK, update the log
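The order of the flow steps above can be sketched as follows. This is only an illustration of the control flow, assuming the detail job and the upload job are plain callables that raise on failure. The run id format, the filename pattern and all function names are hypothetical and do not describe the real Talend implementation.

```python
import uuid
from datetime import date


def run_flow(fact_name, run_date, detail_job, upload_job, log):
    """Run one flow: prepare parameters, call the detail job, upload, then log."""
    meta_run_id = uuid.uuid4().hex  # assumption: any unique id per run
    filename = f"{fact_name}_{run_date:%Y%m%d}.csv"  # hypothetical naming scheme
    detail_job(filename=filename, date=run_date, meta_run_id=meta_run_id)
    upload_job(filename)
    # Reached only if neither job raised, mirroring "if everything is OK".
    log.append({"meta_run_id": meta_run_id, "filename": filename, "status": "OK"})
    return meta_run_id


# Stand-in jobs that only record how they were called.
calls = []
log = []
run_id = run_flow(
    "FACT_trii_site",
    date(2024, 1, 5),
    detail_job=lambda **params: calls.append(("detail", params["filename"])),
    upload_job=lambda filename: calls.append(("upload", filename)),
    log=log,
)
```

Because the log entry is written last, a failure in either job leaves the log untouched for that run.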

Access rights

Access to solvay-ind-conso-[env] is required.

Source

Format

  • Table

Destination

Location

...

 → prj-data-industrial-dash-[env].DataOcean_solvay_conso.V_core_hd_monthly → DPL.V_core_hd_monthly

v_core_hd_quarterly →  prj-data-industrial-dash-[env].DataOcean_solvay_conso.V_core_hd_quarterly → DPL.V_core_hd_quarterly

...

  • FACT_trii_site
  • FACT_trii_gbu
  • FACT_pse_site
  • FACT_pse_gbu

Format

  • columnar format

Sizing

Site around 5000 records

GBU around 500 records

Assessment

How to validate that the generated output is valid: 

...

V_os_data →  prj-data-industrial-dash-[env].DataOcean_solvay_conso.V_os_data → DPL.V_os_data

V_ts_data →  prj-data-industrial-dash-[env].DataOcean_solvay_conso.V_ts_data → DPL.V_ts_data

V_ps_data →  prj-data-industrial-dash-[env].DataOcean_solvay_conso.V_ps_data → DPL.V_ps_data

NOTE:

  • OS = Occupational Safety
  • PS = Process Safety
  • TS = Transport Safety

Loading

1.1 Incremental Load

Not available

1.2 Full load

Plan PL_TRII_PSE runs at 9:00 AM on days 1, 5, 10, 15, 20, 25 and 30 of the month. There is no context variable for reloading.

1.3 Reloading data

Run the full load again.

1.4 Plan to schedule

Runs at 9:00 AM on days 1, 5, 10, 15, 20, 25 and 30 of the month
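To see when the next load is due under this schedule, the run days can be expanded into concrete dates. The sketch below assumes that a run day missing from a month (day 30 in February) is simply skipped, as a cron-style scheduler would do. That behaviour and the function name `next_run` are assumptions, and the time zone is not specified on this page.

```python
import calendar
from datetime import datetime, time

RUN_DAYS = (1, 5, 10, 15, 20, 25, 30)
RUN_TIME = time(9, 0)


def next_run(after: datetime) -> datetime:
    """Return the first scheduled run strictly after `after`."""
    year, month = after.year, after.month
    while True:
        days_in_month = calendar.monthrange(year, month)[1]
        for day in RUN_DAYS:
            if day > days_in_month:
                continue  # e.g. day 30 does not exist in February
            candidate = datetime(year, month, day, RUN_TIME.hour, RUN_TIME.minute)
            if candidate > after:
                return candidate
        # No run left in this month: move on to the next one.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


after_feb_25 = next_run(datetime(2024, 2, 25, 9, 0))
before_jan_30 = next_run(datetime(2024, 1, 30, 8, 0))
```

Under this assumption the gap between the 25 February run and the next one, on 1 March, is shorter than the usual five days, while months with 31 days have a two-day gap between day 30 and day 1.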

1.5 Timing

The average expected loading time is around 5 minutes.

Criticality

High/Medium/Low

Logging

...