The data source is the prj-data-dm-industrial project, which reads the view prj-data-dm-industrial-test.EDC.V_EDC_monthly_inventory_performance_data_finance_dio to obtain the DIO information.
Talend project: DATA_OCEAN_DOMAIN_INDUSTRIAL
Job: F001_GSheet_site_to_BQ_DM_DIO (DIO)


Dataflow
prj-data-dm-industrial-test.EDC.V_EDC_monthly_inventory_performance_data_finance_dio
→ prj-data-dm-industrial-test.STG.STG_FIL_0000_0000_F001_F_M_dio
→ prj-data-dm-industrial-test.ODS.ODS_FIL_0000_F001_F_M_dio
→ prj-data-industrial-dash-test.ODS_DataOcean.V_ODS_dio
→ prj-data-industrial-dash-test.DM.FACT_dio
→ prj-data-industrial-dash-test.DPL.V_FACT_dio
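One hop in this flow, the STG-to-ODS load, might look like the sketch below. This is an assumption for illustration, not the actual Talend-generated SQL; the column names (beyond meta_run_id, which appears in the log tables) are hypothetical.

```sql
-- Hypothetical sketch of the STG -> ODS hop; column list is assumed.
INSERT INTO `prj-data-dm-industrial-test.ODS.ODS_FIL_0000_F001_F_M_dio`
SELECT
  stg.*,                                 -- business columns carried over unchanged
  CURRENT_TIMESTAMP() AS meta_load_date  -- assumed audit column
FROM `prj-data-dm-industrial-test.STG.STG_FIL_0000_0000_F001_F_M_dio` AS stg
-- keep only runs not yet present in ODS, so re-runs stay idempotent
WHERE stg.meta_run_id NOT IN (
  SELECT DISTINCT meta_run_id
  FROM `prj-data-dm-industrial-test.ODS.ODS_FIL_0000_F001_F_M_dio`
);
```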
Access to both the prj-data-industrial-dash-test and prj-data-dm-industrial-test projects is required.
BigQuery source:
Project = prj-data-dm-industrial-test
BQ Dataset = EDC
BQ View = V_EDC_monthly_inventory_performance_data_finance_dio
The job is scheduled by the plans below on WS_DATA_OCEAN_DOMAIN_INDUSTRIAL.
Three tables are created from a Google Sheet (GSheet).
A Scheduled Query is then executed in BigQuery; it pivots the information from those tables into a tabular view, which becomes the input source consumed by Talend.
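The pivot step could be approximated as follows. The source table (gsheet_dio_raw) and the column and metric names are illustrative assumptions; only the target view name comes from this document.

```sql
-- Hypothetical pivot: turn one row per (site, month, metric) into a tabular view.
-- Source table, columns, and metric names are assumptions for illustration.
CREATE OR REPLACE VIEW
  `prj-data-dm-industrial-test.EDC.V_EDC_monthly_inventory_performance_data_finance_dio` AS
SELECT *
FROM (
  SELECT site, report_month, metric_name, metric_value
  FROM `prj-data-dm-industrial-test.EDC.gsheet_dio_raw`
)
PIVOT (
  MAX(metric_value)
  FOR metric_name IN ('inventory_value', 'cogs', 'dio')  -- assumed metrics
);
```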
For more details, see the documentation below:
Scheduled Query Explanation
Overview
This SQL script automates the validation and materialization of multiple datasets in BigQuery. The script processes three datasets separately; each dataset follows the same validation and materialization steps, ensuring reliability and consistent error handling.
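Since the script itself is not shown here, the per-dataset pattern is sketched below under assumptions: validate that the source has rows, then materialize it. All table names in this sketch are hypothetical.

```sql
-- Hypothetical per-dataset step using BigQuery scripting:
-- fail fast if the source is empty, then materialize the validated data.
DECLARE row_count INT64;

SET row_count = (
  SELECT COUNT(*)
  FROM `prj-data-dm-industrial-test.EDC.gsheet_dio_raw`  -- assumed source
);

IF row_count = 0 THEN
  RAISE USING MESSAGE = 'Validation failed: source table is empty';
END IF;

CREATE OR REPLACE TABLE
  `prj-data-dm-industrial-test.EDC.dio_validated` AS     -- assumed target
SELECT *
FROM `prj-data-dm-industrial-test.EDC.gsheet_dio_raw`;
```

The same block would be repeated (or parameterized) once per dataset, which matches the document's note that each dataset follows the same process.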
Check the loading records
SELECT
  job.job_name,
  job.meta_start_date,
  job.meta_execution_id,
  logs.meta_run_id,
  logs.meta_source_system,
  logs.meta_step,
  logs.meta_status,
  logs.meta_num_lines,
  logs.meta_error_lines
FROM STG.log_tables AS logs
JOIN STG.run_jobs AS job
  ON logs.meta_run_id = job.meta_run_id
WHERE job.job_name LIKE '%DIO%'
ORDER BY job.meta_start_date DESC;