Page tree


Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

The data source getting from prj-data-dm-industrial, which get from prj-data-dm-industrial-test.EDC.V_EDC_monthly_inventory_performance_data_finance_dio

...

  • F001_GSheet_site_to_BQ_DM_DIO

(DIO)

Image RemovedImage Added

  1. Define variable and Generate meta_run_id
  2. Call reference job to load from EDC.V_EDC_monthly_inventory_performance_data_finance_dio` and save the output file to bucket cs-ew1-prj-data-dm-industrial-test-staging/dio
  3. Load from Bucket to STG and ODS
  4. Update log

From ODS to DM

Image Added

  1. Load context from the RDS to be used in talend
  2. Load Move data from ODS to DM
  3. Update log

From DM to Operational Dashboard

...

Access rights

It is required to access prj-data-industrial-dash-test project and prj-data-dm-industrial-test project.

...

1.1 Full load

1.2. Reloading data

Dio

1.3 Plan to schedule

It is scheduled by plans below on WS_DATA_OCEAN_DOMAIN_INDUSTRIAL

  • TASKPL_DM_IND_DAILY_Dio - Every 30 minuntes and start at 06:00am UTC until 06:00pm

...

  • Case 5 minutes from source to ODS (full)

1.5 Pre Requisites

3 tables are created from a Gsheet.

A Scheduled Query is then executed in BigQuery, which basically pivots various information from the tables and transforms them into a tabular view. From there, the input source is created to be consumed in Talend.

For more details, access the documentation below:

Scheduled Query Explanation


Overview

This SQL script automates the validation and materialization of multiple datasets in BigQuery . It follows a structured approach to:

  1. Validate the source data – Ensuring the dataset is accessible and contains valid data.
  2. Drop and recreate materialized tables – Removing outdated tables and creating new ones with additional metadata.
  3. Enhance traceability – Adding metadata fields such as row identifiers, timestamps, and Google Sheets references.

The script processes three datasets separately:

  • Wave Project Star (wave_project_star)
  • Wave Activity Star (wave_activity_star)
  • Monthly Inventory Performance - Data Finance (monthly_inventory_performance_data_finance)

Each dataset follows the same process, ensuring reliability and error handling.

Criticality

  • Low?

Logging

Check the loading records 

select job.job_name , job.meta_start_date , job.meta_execution_id , logs.meta_run_id , logs.meta_source_system , logs.meta_step , logs.meta_status , logs.meta_num_lines , logs.meta_error_lines from STG.log_tables logs join STG.run_jobs job on logs.meta_run_id = job.meta_run_id
where  logs.meta_run_idin(SELECTmeta_run_idFROMSTG.run_jobsorderbymeta_start_datedesclimit1000)
and job_namein('F001_GSheet_site_to_BQ_DM_DIO') like '%DIO%'
order by job.meta_start_date desc

...