High Level Project Architecture

Here is the link to the data architecture schema.


Architecture - High Level Design (HLD)

Link.


Architecture - Low Level Design (LLD)

Link.


Architecture Data Flow


DataPrep Flow

Diagram showing the different steps of the application flow, with the data involved at each step.

Steps descriptions

Describe the data and process involved at each step

SAP PF1

Description

This project loads the SAP QM, material document, and vendor master data tables into the Industrial Data Ocean for all PF1 data, based on the create and modification dates. A view on top then filters only the Soda Ash data.

Note: Soda Ash data exists only in PF1; therefore, this project loads only from PF1 (WP1 is not included).

Tools

Talend: 

1. Loading

The data is stored in:

Bucket = cs-ew1-prj-data-dm-industrial-[dev]-staging, with one folder per table

GCP project = prj-data-dm-industrial-[env]

1.1 Incremental Load

The main job is F100_SPF_IND_QM_Main, which runs the whole incremental load. It loads the tables MCH1, MCHA, QALS, AFVC, QAMR, QAMV, QAVE, MSEG, and PLPO in sequence in order to limit the number of background jobs in SAP.

Job | STG | ODS | Comment
F001_SPF_F001_I_H_AFVC_TO_BQ | STG_SPF_0000_0000_F001_I_H_afvc | ODS_SPF_0000_F001_I_H_afvc |
F001_SPF_F001_I_H_MCH1_TO_BQ | STG_SPF_0000_0000_F001_I_H_mch1 | ODS_SPF_0000_F001_I_H_mch1 |
F001_SPF_F001_I_H_MCHA_TO_BQ | STG_SPF_0000_0000_F001_I_H_mcha | ODS_SPF_0000_F001_I_H_mcha |
F001_SPF_F001_I_H_MSEG_TO_BQ | STG_SPF_0000_0000_F001_I_H_mseg | ODS_SPF_0000_F001_I_H_mseg |
F001_SPF_F001_I_H_PLPO_TO_BQ | STG_SPF_0000_0000_F001_I_H_plpo | ODS_SPF_0000_F001_I_H_plpo | May not be needed
F001_SPF_F001_I_H_QALS_TO_BQ | STG_SPF_0000_0000_F001_I_H_qals | ODS_SPF_0000_F001_I_H_qals |
F001_SPF_F001_I_H_QAMR_TO_BQ | STG_SPF_0000_0000_F001_I_H_qamr | ODS_SPF_0000_F001_I_H_qamr |
F001_SPF_F001_I_H_QAMV_TO_BQ | STG_SPF_0000_0000_F001_I_H_qamv | ODS_SPF_0000_F001_I_H_qamv |
F001_SPF_F001_I_H_QAVE_TO_BQ | STG_SPF_0000_0000_F001_I_H_qave | ODS_SPF_0000_F001_I_H_qave |

The AFVC job must run after QALS because it selects the list of AUFPL (routing numbers of operations in the order) from the last load of QALS (max meta_business_date). In case of a reload, it also relies on the QALS table.
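The AFVC dependency on QALS can be sketched as follows. This is a hypothetical helper, not the actual Talend code; the row shape (`AUFPL`, `meta_business_date` keys) and the use of a range filter are assumptions:

```python
# Sketch (assumed row shape): derive the AUFPL selection filter for the AFVC
# extraction from the QALS rows of the most recent meta_business_date.
def build_afvc_filter(qals_rows, last_business_date):
    """Return an SAP-side selection filter on AUFPL, or None if no rows match."""
    aufpl_list = sorted({r["AUFPL"] for r in qals_rows
                         if r["meta_business_date"] == last_business_date})
    if not aufpl_list:
        return None
    # A min/max range filter keeps the SAP selection cheap compared to a
    # potentially very long IN list.
    return f"AUFPL >= '{aufpl_list[0]}' and AUFPL <= '{aufpl_list[-1]}'"
```

The same range form appears in the reload examples further down (`AUFPL >= '…' and AUFPL <= '…'`).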

1.2 Full Load

Job F101_SPF_IND_QM_Main_Full manages the full load.

Job | STG | ODS | Comment
F001_SPF_F001_F_D_LFA1_TO_BQ | STG_SPF_0000_0000_F001_F_H_lfa1 | ODS_SPF_0000_F001_F_H_lfa1 |


2. Reloading data

All jobs have a context parameter l_VAR_eBatch_PF1_[TableName]_additional_filter that changes the selection when extracting from SAP. For an incremental load, this context MUST BE "incremental". If it is blank, the job extracts all data from 2023 onward.
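The three behaviors of the additional_filter context can be sketched like this. This is a minimal illustration of the logic described above, not the actual Talend implementation; the function name and the `create_date` placeholder field are hypothetical:

```python
# Sketch (hypothetical helper): resolve the l_VAR_eBatch_PF1_[TableName]_
# additional_filter context into an SAP selection clause.
def resolve_filter(additional_filter, incremental_clause):
    if additional_filter == "incremental":
        # Normal daily run: use the table's own incremental date clause.
        return incremental_clause
    if not additional_filter:
        # Blank: pull everything from 2023 onward ("create_date" is a
        # placeholder for the table's actual create-date field).
        return "create_date >= '20230101'"
    # Any other value is treated as a custom reload filter and used as-is.
    return additional_filter
```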

The incremental load is based on the following fields:

Table | Description | Incremental load by
MCH1 | Batches (if Batch Management Cross-Plant) | ERSDA (create date), LAEDA (modify date)
MCHA | Batches | ERSDA (create date), LAEDA (modify date)
QALS | Inspection lot record (header) | ERSTELDAT (create date), AENDERDAT (modify date)
QAMR | Characteristic results during inspection processing | ERSTELLDAT (create date), AENDERDAT (modify date)
QAMV | Characteristic specifications for inspection processing (average / summary result) | ERSTELLDAT (create date), AENDERDAT (modify date)
QAVE | Inspection processing: usage decision | VDATUM (create date), VAEDATUM (modify date)
MSEG | Document segment: material | CPUDT_MKPF (create date), /BEV2/ED_AEDAT (modify date)
PLPO | Task list - operation/activity | ANDAT (create date), AEDAT (modify date)
AFVC | Operation within an order | List of AUFPL from the last load of QALS
LFA1 | Vendor Master (General Section) | FULL

Note: the incremental load compares only the date, not the time.
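The resulting selection can be sketched as an OR over the create and modify date fields; since dates are compared as yyyymmdd strings, the time of day is ignored. This is an illustrative helper, not the actual extraction code:

```python
# Sketch (assumed behavior): a row is selected if either its create date or
# its modify date falls on or after the last load date. Comparing yyyymmdd
# strings means only the date matters, never the time.
def incremental_clause(create_field, modify_field, last_load_date):
    return (f"({create_field} >= '{last_load_date}' "
            f"or {modify_field} >= '{last_load_date}')")
```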

Example of context reload

l_VAR_eBatch_PF1_AFVC_additional_filter = AUFPL >= '1007995974' and AUFPL <= '1009027526'

l_VAR_eBatch_PF1_MCH1_additional_filter = ERSDA  > '20230101'


Access rights

To access SAP, the Talend user RFC_TAL_PF1 is required.

Source

pf1nonha.eua.solvay.com

Format

The format of the source data

Destination

Location

Data Ocean GCP = prj-data-dm-industrial-dev

Product GCP = prj-data-sad-ebatch-ppd

To save the data into the GCP Industrial Data Ocean, a service account is required.

Format

Same as the source

Sizing

Expected data volume for :

Assessment

How to validate that the generated output is valid: compare with the source table in PF1.

Scheduling

It is scheduled by plans in TMC:

PL_INDUS_EBATCH_LOAD (incremental load): at the moment in dev, it runs daily at 8:00 CET.

PL_INDUS_EBATCH_LOAD_FULL (full load, only the LFA1 table): at the moment in dev, it is not scheduled.

Timing

The average time expected for :

Criticality

High / Medium / Low ??

Logging

1. Table log: this SQL retrieves the status logs for jobs from source system SPF (PF1):

select
  job.job_name,
  job.meta_start_date,
  logs.meta_run_id,
  logs.meta_source_system,
  logs.meta_step,
  logs.meta_status,
  logs.meta_num_lines,
  logs.meta_error_lines
from STG.log_tables logs
join STG.run_jobs job on logs.meta_run_id = job.meta_run_id
where logs.meta_run_id in (
    select meta_run_id from STG.run_jobs order by meta_start_date desc limit 100
  )
  and job.job_name like '%SPF%'
  and job.meta_start_date > DATE_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
order by job.meta_start_date desc

2. Incremental table: every time a loading job completes, the max timestamp from the staging table is written to this table, which makes it easy to see which tables have not been updated.

Table: STG.incremental_loading
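A freshness check against this table can be sketched as follows. The column names (`table_name`, `last_load_ts`) and the 24-hour threshold are assumptions for illustration, not the actual schema:

```python
# Sketch (assumed columns of STG.incremental_loading): flag tables whose
# last load is older than a freshness threshold.
from datetime import datetime, timedelta

def stale_tables(rows, now, max_age_hours=24):
    """Return the names of tables not refreshed within max_age_hours."""
    threshold = now - timedelta(hours=max_age_hours)
    return sorted(r["table_name"] for r in rows if r["last_load_ts"] < threshold)
```

In practice the same check could be run as a query over STG.incremental_loading in BigQuery; the Python form just makes the rule explicit.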


PI Startrek

SAME QUESTIONS