Description
The data comes from the Data Ocean sustainability domain: two BW queries from CERISE are combined with PURE data (the PURE data should come from the Pure server, but for now it is taken from a Gsheet).
Tools: Talend
From source to ODS
Talend project DATA_OCEAN_DOMAIN_SUSTAINABILITY (Cerise)
- F001_QV_BW_QRY_CPECO202_0001_to_ODS (detail job using reference job)
Xtract job = TALEND_[ENV]_QV_BW_QRY_CPECO202_0001, which filters on authorization scope = ECO and takes an input period on fiscal year/period in the format YYYYMMM to YYYYMMM
Talend project IND_DASHBOARD
- F001_BQ_query_to_fact_gHg
1- Creates a temporary table by pivoting data from the Cerise and Pure dimension tables.
2- Truncates the current contents of the Fact table.
3- Inserts the pivoted data from the temporary table into the Fact table.
4- Drops the temporary table created in the first step.
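The four steps above can be sketched as a single BigQuery script. This is a sketch only: the join key and measure columns (`site_code`, `fiscal_period`, `scope`, `ghg_value`) are assumptions, not taken from the actual job; table names follow the Dataflow section (dev environment).

```sql
-- 1) Create a temporary table by pivoting/aggregating Cerise data joined with the Pure dimension.
--    Column names below are hypothetical placeholders.
CREATE TEMP TABLE tmp_ghg AS
SELECT c.fiscal_period, p.site_code, c.scope, SUM(c.ghg_value) AS ghg_value
FROM `prj-data-industrial-dash-dev.DataOcean.V_BWH_qlikview_afterco2_cerise` c
JOIN `prj-data-industrial-dash-dev.DM.DIM_pure` p
  ON c.site_code = p.site_code
GROUP BY c.fiscal_period, p.site_code, c.scope;

-- 2) Truncate the current Fact table.
TRUNCATE TABLE `prj-data-industrial-dash-dev.DM.FACT_ghg`;

-- 3) Load the pivoted data into the Fact table.
INSERT INTO `prj-data-industrial-dash-dev.DM.FACT_ghg`
SELECT * FROM tmp_ghg;

-- 4) Drop the temporary table.
DROP TABLE tmp_ghg;
```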
prj-data-industrial-dash-[environment] (PURE)
Table from Pure:
- prj-data-industrial-dash-dev.DM.DIM_pure
Dimension table that receives the Gsheet as input.
Dataflow
Cerise: prj-data-dm-sust-dev.ODS.VB_BWH_qlikview_afterco2_cerise → prj-data-dm-sust-dev.DS_prj_data_industrial_dash.V_BWH_qlikview_afterco2_cerise → prj-data-industrial-dash-dev.DataOcean.V_BWH_qlikview_afterco2_cerise
Pure: prj-data-industrial-dash-dev.DM.DIM_pure → prj-data-industrial-dash-dev.DM.FACT_ghg → prj-data-industrial-dash-dev.DPL.V_FACT_ghg
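The last hop in the Pure flow exposes the DM fact table through the DPL layer. A minimal sketch of such a view (the actual definition may rename columns or apply filters):

```sql
-- Minimal sketch; the real DPL view may differ (renamed columns, filters, authorized view setup).
CREATE OR REPLACE VIEW `prj-data-industrial-dash-dev.DPL.V_FACT_ghg` AS
SELECT *
FROM `prj-data-industrial-dash-dev.DM.FACT_ghg`;
```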
Access rights
Access to the BW query via Xtract is required.
Project GCP:
- prj-data-industrial-dash-[env]
- prj-data-dm-sust-[env]
Source
| BW Query | Xtract Job |
|---|---|
| QV_BW_QRY_CPECO202_0001 | TALEND_PRD_QV_BW_QRY_CPECO202_0001 |
Format
columnar format
Destination
DataOcean
- Bucket = cs-ew1-prj-data-dm-sust-[env]-staging/BWCERISE/
- DataOcean GCP = prj-data-dm-sust-[env]
- STG Table names =
- STG_BWH_0000_0000_F001_F_M_qv_bw_qry_cpeco202_0001
- ODS Table names =
- ODS_BWH_0000_F001_F_M_qv_bw_qry_cpeco202_0001
- V_ODS_BWH_0000_F001_F_M_qv_bw_qry_cpeco202_0001
- VB_BWH_qlikview_afterco2_cerise
Product
- GCP = prj-data-industrial-dash-[env]
- DM
- FACT_ghg
- DataOcean
- V_BWH_qlikview_afterco2_cerise
- DPL
- V_FACT_ghg
Format
columnar format
Sizing
- STG_BWH_0000_0000_F001_F_M_qv_bw_qry_cpeco202_0001: ~145,000 records for all months
- prj-data-industrial-dash-dev.DM.DIM_pure: 174 records
Assessment
Data matches the BW query (Cerise).
Data matches the Gsheet (Pure).
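One quick way to support the Cerise assessment is a row-count reconciliation across the dataflow. A sketch for the dev environment (counts only; it does not compare field values):

```sql
-- Row-count reconciliation between the ODS view and the downstream DataOcean view.
SELECT
  (SELECT COUNT(*) FROM `prj-data-dm-sust-dev.ODS.VB_BWH_qlikview_afterco2_cerise`) AS ods_rows,
  (SELECT COUNT(*) FROM `prj-data-industrial-dash-dev.DataOcean.V_BWH_qlikview_afterco2_cerise`) AS dataocean_rows;
```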
Loading
1.1 Incremental Load
Source to ODS = N/A
1.2 Full load
Source to ODS
- F001_SUS_CeriseAfter_STG_to_ODS, controlled by
- l_VAR_XTRACT_PARA_TALEND_PRD_QV_BW_QRY_CPECO202_0001: when blank, a full load is performed
1.3. Reloading data
Source to ODS
Cerise: just run the flow task again.
1.4 Plan to schedule
Scheduled on WS_DATA_OCEAN_DOMAIN_SUSTAINABILITY:
- F001_QV_BW_QRY_CPECO202_0001_to_ODS (detail job using reference job), scheduled daily on weekdays at 3:30 CET
Scheduled on WS_IND_DASHBOARD:
- F001_BQ_query_to_fact_gHg, scheduled daily on weekdays at 12:00 CET
1.5 Timing
Cerise = 5 minutes from source to ODS
Pure = 5 minutes from source to DM
Criticality
Low?
Logging
Check the loaded records / errors on prj-data-dm-sust-[env]:
SELECT
  job.job_name,
  job.meta_start_date,
  job.meta_execution_id,
  logs.meta_run_id,
  logs.meta_source_system,
  logs.meta_step,
  logs.meta_status,
  logs.meta_num_lines,
  logs.meta_error_lines
FROM STG.log_tables logs
JOIN STG.run_jobs job ON logs.meta_run_id = job.meta_run_id
WHERE logs.meta_run_id IN (SELECT meta_run_id FROM STG.run_jobs ORDER BY meta_start_date DESC LIMIT 1000)
  AND job.job_name = 'F001_QV_BW_QRY_CPECO202_0001_to_ODS'
  AND job.meta_start_date > DATE_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
ORDER BY job.meta_start_date DESC
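To surface only failed steps from the last day, the same two tables can be filtered on the error columns. A sketch reusing the schema of the query above; the `'OK'` status value is an assumption and should be checked against the actual log table contents:

```sql
-- Failed or error-producing steps in the last 24 hours ('OK' status value is assumed).
SELECT job.job_name, job.meta_start_date, logs.meta_step, logs.meta_status, logs.meta_error_lines
FROM STG.log_tables logs
JOIN STG.run_jobs job ON logs.meta_run_id = job.meta_run_id
WHERE job.meta_start_date > DATE_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND (logs.meta_error_lines > 0 OR logs.meta_status != 'OK')
ORDER BY job.meta_start_date DESC
```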

