Description

The data source getting from data ocean sustainability, which get from 2 BW queries from CERISE to combine with PURE data (it should get from Pure server but at the moment, get from gsheet)

Tools: Talend

From source to ODS 

Talend project DATA_OCEAN_DOMAIN_SUSTAINABILITY (Cerise)

Xtract job = TALEND_[ENV]_QV_BW_QRY_CPECO202_0001, which filter on authorization scope = ECO and input period on FIscal year period in format YYYYMMM to YYYYMMM


Talend project IND_DASHBOARD

1- Creates a temporary table by pivoting data from the Cerise and Pure Dimension tables.

2- Truncates the current information in the Fact table.

3- Inserts the pivoted data that was created in the temporary table and loads it into Fact.

4- Drop the temporary table that was created in the first step.

prj-data-industrial-dash-[environment] (PURE)

Table from Pure:

Dimension table which receive as input the Gsheet:

Dataflow

Cerise: prj-data-dm-sust-dev.ODS.VB_BWH_qlikview_afterco2_cerise → prj-data-dm-sust-dev.DS_prj_data_industrial_dash.V_BWH_qlikview_afterco2_cerise→ prj-data-industrial-dash-dev.DataOcean.V_BWH_qlikview_afterco2_cerise

Pure: prj-data-industrial-dash-dev.DM.DIM_pure → prj-data-industrial-dash-dev.DM.FACT_ghg→ prj-data-industrial-dash-dev.DPL.V_FACT_ghg

Access rights

It is required to access BW query via Xtract.  

Project GCP: 

  1. prj-data-industrial-dash-[env]
  2. prj-data-dm-sust-[env]

Source

BW QueryXtract Job

QV_BW_QRY_CPECO202_0001

TALEND_PRD_QV_BW_QRY_CPECO202_0001

Format

columnar format

Destination

DataOcean

Product

Format

columnar format

Sizing

Assessment

Data same as BW query (Cerise)

Data same as Gsheet (Pure)

Loading

1.1 Incremental Load

Source to ODS = N/A

1.2 Full load

Source to ODS 

1.3. Reloading data

Source to ODS 

Cerise : Just run the flow task again. 

1.4 Plan to schedule

It is scheduled below on WS_DATA_OCEAN_DOMAIN_SUSTAINABILITY

It is scheduled below on WS_IND_DASHBOARD


1.5 Timing

Cerise = 5  minutes from source to ODS

Pure = 5 minutes from source to DM

Criticality

Low?

Logging

Check the loading records / error on prj-data-dm-sust-[env]

select job.job_name, job.meta_start_date, job.meta_execution_id, logs.meta_run_id, logs.meta_source_system, logs.meta_step, logs.meta_status, logs.meta_num_lines, logs.meta_error_lines from STG.log_tables logs join STG.run_jobs job on logs.meta_run_id = job.meta_run_id
where  logs.meta_run_id in (SELECT meta_run_id FROM STG.run_jobs order by meta_start_date desc limit 1000)
and job_name = 'F001_QV_BW_QRY_CPECO202_0001_to_ODS'
and meta_start_date >  DATE_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
order by job.meta_start_date desc