Description

The data source comes from Data Ocean Sustainability, which is built from two BW queries from CERISE combined with PURE data (the PURE data should come from the Pure server, but for the moment it is taken from a Google Sheet).

Tools: Talend

From source to ODS 

Talend project DATA_OCEAN_DOMAIN_SUSTAINABILITY (Cerise)

  • F001_QV_BW_QRY_CPECO202_0001_to_ODS (detail job using reference job) 

Xtract job = TALEND_[ENV]_QV_BW_QRY_CPECO202_0001, which filters on authorization scope = ECO and takes an input period on fiscal year/period in the format YYYYMMM to YYYYMMM.


Talend project IND_DASHBOARD

  • F001_BQ_query_to_fact_gHg

1- Creates a temporary table by pivoting data from the Cerise and Pure Dimension tables.

2- Truncates the current information in the Fact table.

3- Inserts the pivoted data that was created in the temporary table and loads it into Fact.

4- Drops the temporary table that was created in the first step.
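The four steps above can be sketched in BigQuery SQL. Only the FACT_ghg, DIM_pure and V_BWH_qlikview_afterco2_cerise names come from this page; the working-table name, the join/pivot columns and the SELECT lists below are hypothetical (the real pivot logic lives in the Talend job F001_BQ_query_to_fact_gHg).

```sql
-- 1) Create a working table combining the Cerise view with the Pure dimension.
--    Column names (fiscal_period, site_code, co2_value) are assumptions;
--    the actual job pivots the data, which is not reproduced here.
CREATE TABLE `prj-data-industrial-dash-dev.DM.TMP_fact_ghg` AS
SELECT c.fiscal_period, p.site_code, c.co2_value
FROM `prj-data-industrial-dash-dev.DataOcean.V_BWH_qlikview_afterco2_cerise` c
JOIN `prj-data-industrial-dash-dev.DM.DIM_pure` p
  ON c.site_code = p.site_code;

-- 2) Clear the current content of the fact table.
TRUNCATE TABLE `prj-data-industrial-dash-dev.DM.FACT_ghg`;

-- 3) Load the prepared rows into the fact table.
INSERT INTO `prj-data-industrial-dash-dev.DM.FACT_ghg`
SELECT * FROM `prj-data-industrial-dash-dev.DM.TMP_fact_ghg`;

-- 4) Drop the working table created in step 1.
DROP TABLE `prj-data-industrial-dash-dev.DM.TMP_fact_ghg`;
```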

prj-data-industrial-dash-[environment] (PURE)

Table from Pure:

  • prj-data-industrial-dash-dev.DM.DIM_pure — the dimension table that receives the Google Sheet as input

Dataflow

Cerise: prj-data-dm-sust-dev.ODS.VB_BWH_qlikview_afterco2_cerise → prj-data-dm-sust-dev.DS_prj_data_industrial_dash.V_BWH_qlikview_afterco2_cerise → prj-data-industrial-dash-dev.DataOcean.V_BWH_qlikview_afterco2_cerise

Pure: prj-data-industrial-dash-dev.DM.DIM_pure → prj-data-industrial-dash-dev.DM.FACT_ghg → prj-data-industrial-dash-dev.DPL.V_FACT_ghg
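Each hop in the Cerise chain above exposes the previous dataset through a view. A minimal sketch of the last hop, assuming a plain pass-through SELECT * (the real definition in the project may restrict columns or rows):

```sql
-- Hypothetical pass-through view: exposes the sustainability project's view
-- inside the industrial-dashboard project's DataOcean dataset.
-- SELECT * is an assumption; check the actual view definition in BigQuery.
CREATE OR REPLACE VIEW
  `prj-data-industrial-dash-dev.DataOcean.V_BWH_qlikview_afterco2_cerise` AS
SELECT *
FROM `prj-data-dm-sust-dev.DS_prj_data_industrial_dash.V_BWH_qlikview_afterco2_cerise`;
```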

Access rights

Access to the BW query via Xtract is required.

Project GCP: 

  1. prj-data-industrial-dash-[env]
  2. prj-data-dm-sust-[env]

Source

BW Query: QV_BW_QRY_CPECO202_0001

Xtract Job: TALEND_PRD_QV_BW_QRY_CPECO202_0001

Format

columnar format

Destination

DataOcean

  • Bucket = cs-ew1-prj-data-dm-sust-[env]-staging/BWCERISE/
  • DataOcean GCP = prj-data-dm-sust-[env]
  • STG Table names =
    • STG_BWH_0000_0000_F001_F_M_qv_bw_qry_cpeco202_0001
  • ODS Table names =
    • ODS_BWH_0000_F001_F_M_qv_bw_qry_cpeco202_0001
    • V_ODS_BWH_0000_F001_F_M_qv_bw_qry_cpeco202_0001
    • VB_BWH_qlikview_afterco2_cerise

Product

  • GCP = prj-data-industrial-dash-[env]
  • DM
    • FACT_ghg
  • DataOcean
    • V_BWH_qlikview_afterco2_cerise
  • DPL
    • V_FACT_ghg

Format

columnar format

Sizing

  • STG_BWH_0000_0000_F001_F_M_qv_bw_qry_cpeco202_0001: ~145,000 records (all months)
  • prj-data-industrial-dash-dev.DM.DIM_pure: 174 records

Assessment

Data is the same as the BW query (Cerise).

Data is the same as the Google Sheet (Pure).

Loading

1.1 Incremental Load

Source to ODS = N/A

1.2 Full load

Source to ODS 

  • F001_SUS_CeriseAfter_STG_to_ODS is controlled by 
    • l_VAR_XTRACT_PARA_TALEND_PRD_QV_BW_QRY_CPECO202_0001 — when left blank, a full load is performed

1.3. Reloading data

Source to ODS 

Cerise: just run the flow task again. 

1.4 Plan to schedule

It is scheduled on WS_DATA_OCEAN_DOMAIN_SUSTAINABILITY as follows:

  • F001_QV_BW_QRY_CPECO202_0001_to_ODS (detail job using reference job): scheduled as a daily weekday task at 3:30 CET

It is scheduled on WS_IND_DASHBOARD as follows:

  • F001_BQ_query_to_fact_gHg: scheduled as a daily weekday task at 12:00 (noon) CET


1.5 Timing

Cerise = 5 minutes from source to ODS

Pure = 5 minutes from source to DM

Criticality

Low?

Logging

Check the loaded records / errors on prj-data-dm-sust-[env]:

SELECT
  job.job_name,
  job.meta_start_date,
  job.meta_execution_id,
  logs.meta_run_id,
  logs.meta_source_system,
  logs.meta_step,
  logs.meta_status,
  logs.meta_num_lines,
  logs.meta_error_lines
FROM STG.log_tables logs
JOIN STG.run_jobs job
  ON logs.meta_run_id = job.meta_run_id
WHERE logs.meta_run_id IN (
    SELECT meta_run_id FROM STG.run_jobs ORDER BY meta_start_date DESC LIMIT 1000
  )
  AND job_name = 'F001_QV_BW_QRY_CPECO202_0001_to_ODS'
  AND meta_start_date > DATE_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
ORDER BY job.meta_start_date DESC