High Level Project Architecture
Here is the link to the data architecture schema.
Architecture - High Level Design (HLD)
Link.
Architecture - Low Level Design (LLD)
Link.
Architecture Data Flow
DataPrep Flow
Schema showing the different steps of the application flow, with the data involved at each step.
Step descriptions
The data and process involved at each step are described below.
SAP PF1
Description
This project loads SAP data from the QM, material document, and vendor master data tables into the Industrial Data Ocean, covering all PF1 data based on the create and modification dates. A view on top then filters only the Soda Ash data.
Note: Soda Ash data exists only in PF1, so this project loads only from PF1 (WP1 is not included).
Tools
Talend:
1. Loading
1.1 Incremental Load
The main job, F100_SPF_IND_QM_Main, runs all the incremental loads. It loads the tables MCH1, MCHA, QALS, AFVC, QAMR, QAMV, QAVE, MSEG, and PLPO in sequence, in order to limit the number of background jobs in SAP.
Job | STG | ODS | Comment
F001_SPF_F001_I_H_AFVC_TO_BQ | STG_SPF_0000_0000_F001_I_H_afvc | ODS_SPF_0000_F001_I_H_afvc |
F001_SPF_F001_I_H_MCH1_TO_BQ | STG_SPF_0000_0000_F001_I_H_mch1 | ODS_SPF_0000_F001_I_H_mch1 |
F001_SPF_F001_I_H_MCHA_TO_BQ | STG_SPF_0000_0000_F001_I_H_mcha | ODS_SPF_0000_F001_I_H_mcha |
F001_SPF_F001_I_H_MSEG_TO_BQ | STG_SPF_0000_0000_F001_I_H_mseg | ODS_SPF_0000_F001_I_H_mseg |
F001_SPF_F001_I_H_PLPO_TO_BQ | STG_SPF_0000_0000_F001_I_H_plpo | ODS_SPF_0000_F001_I_H_plpo | May not be needed.
F001_SPF_F001_I_H_QALS_TO_BQ | STG_SPF_0000_0000_F001_I_H_qals | ODS_SPF_0000_F001_I_H_qals |
F001_SPF_F001_I_H_QAMR_TO_BQ | STG_SPF_0000_0000_F001_I_H_qamr | ODS_SPF_0000_F001_I_H_qamr |
F001_SPF_F001_I_H_QAMV_TO_BQ | STG_SPF_0000_0000_F001_I_H_qamv | ODS_SPF_0000_F001_I_H_qamv |
F001_SPF_F001_I_H_QAVE_TO_BQ | STG_SPF_0000_0000_F001_I_H_qave | ODS_SPF_0000_F001_I_H_qave |
Note: the AFVC job must run after QALS, because it selects the list of AUFPL (routing numbers of operations in the order) from the last load of QALS (max meta_business_date). In case of a reload, it relies on the QALS table as well.
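The AFVC selection derived from the last QALS load could be sketched as follows. This is an illustrative helper, not the actual Talend job code: the function name is hypothetical, and the choice of a min/max range filter (the same shape as the AFVC reload example shown under "Example of context reload") is an assumption.

```python
# Sketch (not the actual Talend job): derive the AFVC extraction filter
# from the AUFPL values found in the latest QALS load.
def afvc_additional_filter(aufpl_values):
    """Build an AUFPL range filter such as
    "AUFPL >= '1007995974' and AUFPL <= '1009027526'".
    Whether the real job passes a range or an explicit list is an
    assumption to verify against the job design."""
    if not aufpl_values:
        raise ValueError("No AUFPL values found in the last QALS load")
    ordered = sorted(aufpl_values)
    return f"AUFPL >= '{ordered[0]}' and AUFPL <= '{ordered[-1]}'"
```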
1.2 Full Load
The job F101_SPF_IND_QM_Main_Full manages the full load.
Job | STG | ODS | Comment
F001_SPF_F001_F_D_LFA1_TO_BQ | STG_SPF_0000_0000_F001_F_H_lfa1 | ODS_SPF_0000_F001_F_H_lfa1 |
2. Reloading data
Every job has a context parameter l_VAR_eBatch_PF1_[TableName]_additional_filter to change the selection when extracting from SAP. For an incremental load, this context MUST BE "incremental". If it is blank, the job will load data from 2023.
The incremental load is based on the following fields:
Table | Description | Incremental load by
MCH1 | Batches (if Batch Management Cross-Plant) | ERSDA (create date), LAEDA (modify date)
MCHA | Batches | ERSDA (create date), LAEDA (modify date)
QALS | Inspection lot record (header) | ERSTELDAT (create date), AENDERDAT (modify date)
QAMR | Characteristic results during inspection processing | ERSTELLDAT (create date), AENDERDAT (modify date)
QAMV | Characteristic specifications for inspection processing (Average / Summary result) | ERSTELLDAT (create date), AENDERDAT (modify date)
QAVE | Inspection processing: usage decision | VDATUM (create date), VAEDATUM (modify date)
MSEG | Document Segment: Material | CPUDT_MKPF (create date), /BEV2/ED_AEDAT (modify date)
PLPO | Task list - operation/activity | ANDAT (create date), AEDAT (modify date)
AFVC | Operation within an order | List of AUFPL from the last load of QALS
LFA1 | Vendor Master (General Section) | FULL
Note: the incremental load checks only the date, not the time.
Example of context reload
l_VAR_eBatch_PF1_AFVC_additional_filter = AUFPL >= '1007995974' and AUFPL <= '1009027526'
l_VAR_eBatch_PF1_MCH1_additional_filter = ERSDA > '20230101'
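The date-only incremental selection could be expressed as a small helper in the same shape as the examples above. This is illustrative only: the function name and the exact ">= / or" form of the generated filter are assumptions, not the Talend job's actual logic.

```python
from datetime import datetime

# Hypothetical helper: build an additional-filter string from the last
# load timestamp, truncated to a date because the incremental check
# compares only dates, not times.
def incremental_filter(last_load, create_field, modify_field):
    """Return e.g. "ERSDA >= '20230101' or LAEDA >= '20230101'"."""
    day = last_load.strftime("%Y%m%d")
    return f"{create_field} >= '{day}' or {modify_field} >= '{day}'"
```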
Access rights
To access SAP, the Talend user RFC_TAL_PF1 is required.
Source
Format
Destination
Location
The data is kept in:
Bucket = cs-ew1-prj-data-dm-industrial-[dev]-staging, with a separate folder for each table
Data Ocean GCP = prj-data-dm-industrial-[env]
Product GCP = prj-data-sad-ebatch-[env]
To save the data into the GCP Industrial Data Ocean, a service account is required.
Format
Same as the source
Sizing
Expected data volume for:
- full process on the LFA1 table; the rest are incremental processes
Assessment
How to validate that the generated output is valid: compare with the corresponding table in PF1.
Scheduling
It is scheduled by plan in TMC:
PL_INDUS_EBATCH_LOAD (incremental load): at the moment, in dev, it loads daily at 8:00 CET.
PL_INDUS_EBATCH_LOAD_FULL (full load, LFA1 table only): at the moment, in dev, it is not scheduled.
Timing
The average time expected for:
- full process: depends
- incremental process: 5-10 min
Criticality
High / Medium / Low ??
Logging
1. Table log: this SQL gets the status log for jobs from the source system SPF (PF1):
select job.job_name, job.meta_start_date, logs.meta_run_id, logs.meta_source_system, logs.meta_step, logs.meta_status, logs.meta_num_lines, logs.meta_error_lines
from STG.log_tables logs
join STG.run_jobs job on logs.meta_run_id = job.meta_run_id
where logs.meta_run_id in (SELECT meta_run_id FROM STG.run_jobs order by meta_start_date desc limit 100)
and job_name like '%SPF%'
and meta_start_date > DATE_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
order by job.meta_start_date desc
2. Incremental table: every time a loading job completes, the max time from the staging table updates this table. This makes it easy to see which tables have not been updated.
Table: STG.incremental_loading
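One way to use STG.incremental_loading is to flag tables whose last load is too old. The sketch below assumes the table can be read as (table name, last loaded date) pairs; the exact schema of the log table and the one-day threshold are assumptions.

```python
from datetime import date, timedelta

# Hypothetical staleness check over data read from STG.incremental_loading
# (assumed shape: {table_name: last_loaded_date}).
def stale_tables(last_loads, as_of, max_age_days=1):
    """Return the tables whose last load is older than max_age_days."""
    cutoff = as_of - timedelta(days=max_age_days)
    return sorted(t for t, d in last_loads.items() if d < cutoff)
```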
SAP BW HR
Description: the BW HR query is used to determine user authorization based on the user profile on site. The source is query DO_BW_QRY_CPHRPANHR_0001, and it is managed by the Xtract job TALEND_PROD_DO_BW_QRY_CPHRPANHR_0001. There is only one parameter, e.g. YYYYMM_Start=202403&YYYYMM_End=202403; both start and end must be provided.
Talend:
1. Loading
1.1 Full Load
Job | STG | ODS | Comment
F028_BW_QRY_CPHRPANHR_0001_to_ODS | STG_BWH_0000_0000_F001_F_D_qry_cphrpanhr_0001 | ODS_BWH_0000_F001_F_D_qry_cphrpanhr_0001 |
2. Reloading data
The job has a context parameter l_VAR_XTRACT_PARA_CPHRPANHR to change the selection of the BW query. Normally the value should be "currentmonth", so that only the current month is loaded. To reload data from other months, set the context to a value such as "&YYYYMM_Start=202403&YYYYMM_End=202403"; in this example, only March 2024 is loaded. If the context is blank, the job will fetch all the data in BW, which may cause the job to fail with a memory error.
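The accepted values of this context parameter can be sketched as a small builder. The function name is hypothetical, and the validation rules go beyond what the notes state; only the "currentmonth" literal and the "&YYYYMM_Start=...&YYYYMM_End=..." form come from the text above.

```python
# Illustrative helper (not part of the Talend job): build the value for
# the l_VAR_XTRACT_PARA_CPHRPANHR context parameter.
def build_cphrpanhr_context(start=None, end=None):
    """No arguments -> "currentmonth" (load only the current month).
    Two YYYYMM strings -> the explicit reload form described above."""
    if start is None and end is None:
        return "currentmonth"
    if start is None or end is None:
        raise ValueError("Both YYYYMM_Start and YYYYMM_End are required")
    for value in (start, end):
        # Assumed sanity check: six digits, month between 01 and 12.
        if len(value) != 6 or not value.isdigit() or not 1 <= int(value[4:]) <= 12:
            raise ValueError(f"Not a valid YYYYMM value: {value}")
    if start > end:
        raise ValueError("YYYYMM_Start must not be after YYYYMM_End")
    return f"&YYYYMM_Start={start}&YYYYMM_End={end}"
```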
Access rights
To access BW, Talend connects to Xtract, and Xtract requires authorization for the user RFC_TAL_WBP.
In most cases for HR queries, the authorization team must be contacted to add the authorization. In this case, the role ZH_PA_ROBOT was modified to add the new query.
Source
BW query DO_BW_QRY_CPHRPANHR_0001
Format
Destination
Location
The data is kept in:
Bucket = cs-ew1-prj-data-dm-hr-[dev]-staging
Data Ocean GCP = prj-data-dm-hr-dev
Product GCP = prj-data-sad-ebatch-ppd
To save the data into the GCP Industrial Data Ocean, a service account is required.
Format
Same as the source
Sizing
The current month should have around 30,000 records.
Assessment
How to validate that the generated output is valid: compare with BW query DO_BW_QRY_CPHRPANHR_0001, drilling down on all characteristics as in the Data Dict.
Scheduling
TBD
Timing
Loading takes around 2-5 minutes.
Criticality
High / Medium / Low ??
Logging
select job.job_name, job.meta_start_date, logs.meta_run_id, logs.meta_source_system, logs.meta_step, logs.meta_status, logs.meta_num_lines, logs.meta_error_lines
from STG.log_tables logs
join STG.run_jobs job on logs.meta_run_id = job.meta_run_id
where logs.meta_run_id in (SELECT meta_run_id FROM STG.run_jobs order by meta_start_date desc limit 100)
and meta_source_system like '%BWH%CPHRPANHR%'
and meta_start_date > DATE_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
order by job.meta_start_date desc
PI Startrek
Description: this is an API connection to get manufacturing data. It requires authentication with a Windows user, which Talend cannot do natively; therefore, Python is required to run the extraction.
Python:
The .py file must be kept in \\Acew1dtlndeng02\data\DEV\DATA_OCEAN_DOMAIN_INDUSTRIAL\Python
Example file is ExtractTag.py
This can be adapted to other layouts.
- This is the main URL to call first, in order to get another URL
- Output from the first run on the web (authorization is required to get the result)
- This value must be passed on to get the result
- Get the parent content: the Link.RecordedData URL plus the new parameter on the resource
- This is the new URL to call again
- This is the result of the second call. The fields can be selected with the parameter "selectedFields=Items.Timestamp;Items.Value"
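The second-call URL construction in the flow above could look like the sketch below. The "Links"/"RecordedData" keys reflect the notes (which write "Link.RecordedData"); treat the exact response shape as an assumption to verify against the real PI Web API payload, and note that the authenticated HTTP calls themselves are omitted here.

```python
# Sketch of building the second URL in the two-call PI Web API flow,
# from the JSON body returned by the first call.
def recorded_data_url(first_response, selected_fields="Items.Timestamp;Items.Value"):
    """Pull the RecordedData link out of the first response and append
    the selectedFields parameter used in the notes above."""
    base = first_response["Links"]["RecordedData"]
    sep = "&" if "?" in base else "?"
    return f"{base}{sep}selectedFields={selected_fields}"
```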
Talend:
1. Loading
- A tSystem component runs the Python script
- tJavaFlex creates a file from the result of the .py script, which becomes the JSON file (tag.json)
- Read the JSON tag file
- Load the data into a table / file and follow the usual process
1.1 Full Load
This is still in testing, since the structure to get the PI data is not yet finalized.
Job | STG | ODS | Comment
F200_TEST_PYTHON | | |
2. Reloading data
Access rights
To access PI, david.genet@solvay.com must first grant authorization to the Windows account SVC_TALEND_ADMIN_DEV, since the Python script authenticates with a Windows account. When promoting to another environment, authorization must be requested again.
Source
https://pivision-dev.eua.solvay.com/piwebapi/
Format
Not yet defined
Destination
Location
Bucket = cs-ew1-prj-data-dm-industrial-[dev]-staging, with a separate folder for each table
Data Ocean GCP = prj-data-dm-industrial-[env]
Product GCP = prj-data-sad-ebatch-ppd
To save the data into the GCP Industrial Data Ocean, a service account is required.
Format
Same as the source
Sizing
Assessment
Scheduling
Timing
Criticality
High / Medium / Low ??





