The data source getting from data ocean sustainability, which get from 2 BW queries from CERISE to combine with PURE data (it should get from Pure server but at the moment, get from gsheet)

Xtract job = TALEND_[ENV]_QV_BW_QRY_CPECO202_0001, which filter on authorization scope = ECO and input period on FIscal year period in format YYYYMMM to YYYYMMM

1- Creates a temporary table by pivoting data from the Cerise and Pure Dimension tables.
2- Truncates the current information in the Fact table.
3- Inserts the pivoted data that was created in the temporary table and loads it into Fact.
4- Drop the temporary table that was created in the first step.
Table from Pure:
Dimension table which receive as input the Gsheet:
Dataflow
Cerise: prj-data-dm-sust-dev.ODS.VB_BWH_qlikview_afterco2_cerise → prj-data-dm-sust-dev.DS_prj_data_industrial_dash.V_BWH_qlikview_afterco2_cerise→ prj-data-industrial-dash-dev.DataOcean.V_BWH_qlikview_afterco2_cerise
Pure: prj-data-industrial-dash-dev.DM.DIM_pure → prj-data-industrial-dash-dev.DM.FACT_ghg→ prj-data-industrial-dash-dev.DPL.V_FACT_ghg
It is required to access BW query via Xtract.
Project GCP:
| BW Query | Xtract Job |
|---|---|
QV_BW_QRY_CPECO202_0001 | TALEND_PRD_QV_BW_QRY_CPECO202_0001 |
columnar format
columnar format
Data same as BW query (Cerise)
Data same as Gsheet (Pure)
Source to ODS = N/A
Source to ODS
Source to ODS
Cerise : Just run the flow task again.
It is scheduled below on WS_DATA_OCEAN_DOMAIN_SUSTAINABILITY
It is scheduled below on WS_IND_DASHBOARD
Cerise = 5 minutes from source to ODS
Pure = 5 minutes from source to DM
Low?
Check the loading records / error on prj-data-dm-sust-[env]
select job.job_name, job.meta_start_date, job.meta_execution_id, logs.meta_run_id, logs.meta_source_system, logs.meta_step, logs.meta_status, logs.meta_num_lines, logs.meta_error_lines from STG.log_tables logs join STG.run_jobs job on logs.meta_run_id = job.meta_run_id
where logs.meta_run_id in (SELECT meta_run_id FROM STG.run_jobs order by meta_start_date desc limit 1000)
and job_name = 'F001_QV_BW_QRY_CPECO202_0001_to_ODS'
and meta_start_date > DATE_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
order by job.meta_start_date desc