Description

The data source comes from Data Ocean Sustainability, which is built from two BW queries from CERISE combined with PURE data (the PURE data should come from the Pure server, but for the moment it is taken from a Google Sheet).

Tools: Talend

From source to ODS 

Talend project DATA_OCEAN_DOMAIN_SUSTAINABILITY (Cerise)

  • F001_QV_BW_QRY_CPECO202_0001_to_ODS (detail job using reference job) 

Xtract job = TALEND_[ENV]_QV_BW_QRY_CPECO202_0001, which filters on authorization scope = ECO and takes an input period on fiscal year/period in the format YYYYMMM to YYYYMMM.


Talend project IND_DASHBOARD

  • F001_BQ_query_to_fact_gHg

1- Creates a temporary table by pivoting data from the Cerise and Pure Dimension tables.

2- Truncates the current information in the Fact table.

3- Inserts the pivoted data that was created in the temporary table and loads it into Fact.

4- Drops the temporary table that was created in the first step.
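The four steps above can be sketched in BigQuery SQL. Only the FACT_ghg, DIM_pure and V_BWH_qlikview_afterco2_cerise names come from this page; the working-table name, the join/pivot columns and the SELECT lists below are hypothetical (the real pivot logic lives in the Talend job F001_BQ_query_to_fact_gHg).

```sql
-- 1) Create a working table combining the Cerise view with the Pure dimension.
--    Column names (fiscal_period, site_code, co2_value) are assumptions;
--    the actual job pivots the data, which is not reproduced here.
CREATE TABLE `prj-data-industrial-dash-dev.DM.TMP_fact_ghg` AS
SELECT c.fiscal_period, p.site_code, c.co2_value
FROM `prj-data-industrial-dash-dev.DataOcean.V_BWH_qlikview_afterco2_cerise` c
JOIN `prj-data-industrial-dash-dev.DM.DIM_pure` p
  ON c.site_code = p.site_code;

-- 2) Clear the current content of the fact table.
TRUNCATE TABLE `prj-data-industrial-dash-dev.DM.FACT_ghg`;

-- 3) Load the prepared rows into the fact table.
INSERT INTO `prj-data-industrial-dash-dev.DM.FACT_ghg`
SELECT * FROM `prj-data-industrial-dash-dev.DM.TMP_fact_ghg`;

-- 4) Drop the working table created in step 1.
DROP TABLE `prj-data-industrial-dash-dev.DM.TMP_fact_ghg`;
```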

prj-data-industrial-dash-[environment] (PURE)

Table from Pure:

  • prj-data-industrial-dash-dev.DM.DIM_pure — the dimension table that receives the Google Sheet as input

Dataflow

Cerise: prj-data-dm-sust-dev.ODS.VB_BWH_qlikview_afterco2_cerise → prj-data-dm-sust-dev.DS_prj_data_industrial_dash.V_BWH_qlikview_afterco2_cerise → prj-data-industrial-dash-dev.DataOcean.V_BWH_qlikview_afterco2_cerise

Pure: prj-data-industrial-dash-dev.DM.DIM_pure → prj-data-industrial-dash-dev.DM.FACT_ghg → prj-data-industrial-dash-dev.DPL.V_FACT_ghg
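Each hop in the Cerise chain above exposes the previous dataset through a view. A minimal sketch of the last hop, assuming a plain pass-through SELECT * (the real definition in the project may restrict columns or rows):

```sql
-- Hypothetical pass-through view: exposes the sustainability project's view
-- inside the industrial-dashboard project's DataOcean dataset.
-- SELECT * is an assumption; check the actual view definition in BigQuery.
CREATE OR REPLACE VIEW
  `prj-data-industrial-dash-dev.DataOcean.V_BWH_qlikview_afterco2_cerise` AS
SELECT *
FROM `prj-data-dm-sust-dev.DS_prj_data_industrial_dash.V_BWH_qlikview_afterco2_cerise`;
```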

Access rights

Access to the BW query via Xtract is required.

Project GCP: 

  1. prj-data-industrial-dash-[env]
  2. prj-data-dm-sust-[env]

Source

BW Query: QV_BW_QRY_CPECO202_0001

Xtract Job: TALEND_PRD_QV_BW_QRY_CPECO202_0001

Format

columnar format

Destination

DataOcean

  • Bucket = cs-ew1-prj-data-dm-sust-[env]-staging/BWCERISE/
  • DataOcean GCP = prj-data-dm-sust-[env]
  • STG Table names =
    • STG_BWH_0000_0000_F001_F_M_qv_bw_qry_cpeco202_0001
  • ODS Table names =
    • ODS_BWH_0000_F001_F_M_qv_bw_qry_cpeco202_0001
    • V_ODS_BWH_0000_F001_F_M_qv_bw_qry_cpeco202_0001
    • VB_BWH_qlikview_afterco2_cerise

Product

  • GCP = prj-data-industrial-dash-[env]
  • DM
    • FACT_ghg
  • DataOcean
    • V_BWH_qlikview_afterco2_cerise
  • DPL
    • V_FACT_ghg

Format

columnar format

Sizing

  • STG_BWH_0000_0000_F001_F_M_qv_bw_qry_cpeco202_0001: ~145,000 records (all months)
  • prj-data-industrial-dash-dev.DM.DIM_pure: 174 records

Assessment

Data is the same as the BW query (Cerise).

Data is the same as the Google Sheet (Pure).

Loading

1.1 Incremental Load

Source to ODS = N/A

1.2 Full load

Source to ODS 

  • F001_SUS_CeriseAfter_STG_to_ODS is controlled by 
    • l_VAR_XTRACT_PARA_TALEND_PRD_QV_BW_QRY_CPECO202_0001 — when left blank, a full load is performed

1.3. Reloading data

Source to ODS 

Cerise: just run the flow task again. 

1.4 Plan to schedule

It is scheduled on WS_DATA_OCEAN_DOMAIN_SUSTAINABILITY as follows:

  • F001_QV_BW_QRY_CPECO202_0001_to_ODS (detail job using reference job): scheduled as a daily weekday task at 3:30 CET

It is scheduled on WS_IND_DASHBOARD as follows:

  • F001_BQ_query_to_fact_gHg: scheduled as a daily weekday task at 12:00 (noon) CET


1.5 Timing

Cerise = 5 minutes from source to ODS

Pure = 5 minutes from source to DM

Criticality

Low?

Logging

Check the loaded records / errors on prj-data-dm-sust-[env]:

SELECT
  job.job_name,
  job.meta_start_date,
  job.meta_execution_id,
  logs.meta_run_id,
  logs.meta_source_system,
  logs.meta_step,
  logs.meta_status,
  logs.meta_num_lines,
  logs.meta_error_lines
FROM STG.log_tables logs
JOIN STG.run_jobs job
  ON logs.meta_run_id = job.meta_run_id
WHERE logs.meta_run_id IN (
    SELECT meta_run_id FROM STG.run_jobs ORDER BY meta_start_date DESC LIMIT 1000
  )
  AND job_name = 'F001_QV_BW_QRY_CPECO202_0001_to_ODS'
  AND meta_start_date > DATE_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
ORDER BY job.meta_start_date DESC