...
Architecture - High Level Design (HLD)
Link.
Architecture - Low Level Design (LLD)
Link.
Architecture Data Flow
Google Drive Live Link
DataPrep Flow
Schema showing the different STEPS of the application flow - with the data involved at each step
Step descriptions on BigQuery
Describe the data and process involved at each step
SAP PF1
Description
This project loads the SAP QM, material document, and vendor master data tables into the Industrial Data Ocean, for all PF1 data, based on the create and modification dates. A view on top then filters for Soda Ash data only.
Note: Soda Ash data exists only in PF1, so this project loads from PF1 only (WP1 is not included).
Tools
Talend:
1. Loading
The data is stored in:
Bucket = cs-ew1-prj-data-dm-industrial-[dev]-staging, with one folder per table
GCP project = prj-data-dm-industrial-[env]
1.1 Incremental Load
The main job is F100_SPF_IND_QM_Main, which runs the entire incremental load: the tables MCH1, MCHA, QALS, AFVC, QAMR, QAMV, QAVE, MSEG, and PLPO are loaded in sequence in order to limit the number of concurrent background jobs in SAP.
...
STG_SPF_0000_0000_F001_I_H_afvc
...
STG_SPF_0000_0000_F001_I_H_mch1
...
STG_SPF_0000_0000_F001_I_H_mcha
...
STG_SPF_0000_0000_F001_I_H_mseg
...
STG_SPF_0000_0000_F001_I_H_plpo
...
STG_SPF_0000_0000_F001_I_H_qals
...
STG_SPF_0000_0000_F001_I_H_qamr
...
STG_SPF_0000_0000_F001_I_H_qamv
...
STG_SPF_0000_0000_F001_I_H_qave
...
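The strictly sequential orchestration described above can be sketched as follows. This is a hedged illustration only: `run_table` stands in for launching one child Talend job (a caller-supplied callback, not part of the project), while the table names and their order come from the main-job description.

```python
# Sketch of the F100_SPF_IND_QM_Main sequencing (assumption: each child job
# is launched by a caller-supplied function; table order per the list above).
TABLES = ["MCH1", "MCHA", "QALS", "AFVC", "QAMR", "QAMV", "QAVE", "MSEG", "PLPO"]

def run_incremental_load(run_table):
    """Run the per-table jobs one at a time, in order, so that at most
    one SAP background job is active at any moment."""
    for name in TABLES:
        run_table(name)
```

Keeping the loop strictly serial is what caps the SAP background-job count; note also that QALS runs before AFVC, which the next paragraph requires.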
The AFVC job is required to run after QALS because it selects the list of AUFPL (routing numbers of operations in the order) from the last QALS load (max meta_business_date). In case of a reload, it relies on the QALS table as well.
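A minimal sketch of that dependency: given the AUFPL values found in the last QALS load, build the SAP-side selection filter for AFVC. The function name and the min/max-range strategy are assumptions, chosen to match the shape of the AFVC reload context shown in section 2.

```python
def build_afvc_filter(aufpl_values):
    """Build the AFVC extraction filter from the AUFPL values found in
    the last QALS load (max meta_business_date). Assumption: a contiguous
    range filter, as in the reload example in section 2."""
    lo, hi = min(aufpl_values), max(aufpl_values)
    return f"AUFPL >= '{lo}' and AUFPL <= '{hi}'"

print(build_afvc_filter(["1007995974", "1008500000", "1009027526"]))
# → AUFPL >= '1007995974' and AUFPL <= '1009027526'
```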
1.2 Full Load
The job F101_SPF_IND_QM_Main_Full manages the full load.
...
STG_SPF_0000_0000_F001_F_H_lfa1
...
2. Reloading data
Every job has a context parameter l_VAR_eBatch_PF1_[TableName]_additional_filter that changes the selection when extracting from SAP. For an incremental load, this context MUST BE "incremental". If it is blank, the job extracts data from 2023 onward.
The incremental load is based on the following fields:
| Table | Name of Table | Incremental load by (create date) | Incremental load by (modify date) |
|---|---|---|---|
| MCH1 | Batches (if Batch Management Cross-Plant) | ERSDA | LAEDA |
| MCHA | Batches | ERSDA | LAEDA |
| QALS | Inspection lot record (header) | ERSTELDAT | AENDERDAT |
| QAMR | Characteristic results during inspection processing | ERSTELLDAT | AENDERDAT |
| QAMV | Characteristic specifications for inspection processing (Average / Summary result) | ERSTELLDAT | AENDERDAT |
| QAVE | Inspection processing: Usage decision | VDATUM | VAEDATUM |
| MSEG | Document Segment: Material | CPUDT_MKPF | /BEV2/ED_AEDAT |
| PLPO | Task list - operation/activity | ANDAT | AEDAT |
| AFVC | Operation within an order | List of AUFPL from the last QALS load | |
| LFA1 | Vendor Master (General Section) | FULL | |
Note: the incremental load compares only the date, not the time.
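As a hedged illustration, a date-only incremental filter for one of the tables above (MCH1: ERSDA = create date, LAEDA = modify date) could be built like this. SAP stores these dates as YYYYMMDD strings; the helper name is an assumption, not an actual job component.

```python
from datetime import date

def incremental_filter(create_col, modify_col, since):
    """WHERE-clause fragment selecting rows created or modified on or
    after `since`. Only the date is compared, never the time."""
    d = since.strftime("%Y%m%d")  # SAP date columns are YYYYMMDD strings
    return f"{create_col} >= '{d}' or {modify_col} >= '{d}'"

print(incremental_filter("ERSDA", "LAEDA", date(2024, 1, 1)))
# → ERSDA >= '20240101' or LAEDA >= '20240101'
```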
Examples of reload contexts:
l_VAR_eBatch_PF1_AFVC_additional_filter = AUFPL >= '1007995974' and AUFPL <= '1009027526'
l_VAR_eBatch_PF1_MCH1_additional_filter = ERSDA > '20230101'
Access rights
To access SAP, the Talend user RFC_TAL_PF1 is required
Source
Format
Destination
Location
Data Ocean GCP = prj-data-dm-industrial-dev
Product GCP = prj-data-sad-ebatch-ppd
To save the data into the GCP Industrial Data Ocean, a service account is required
Format
Same as the source
Sizing
Expected data volume for:
- full process on the LFA1 table; the rest are incremental processes
Assessment
How to validate that the generated output is valid: compare with the tables in PF1
Scheduling
It is scheduled by plans in TMC:
PL_INDUS_EBATCH_LOAD (incremental load): at the moment in dev, it runs daily at 8:00 CET.
PL_INDUS_EBATCH_LOAD_FULL (full load, LFA1 table only): at the moment in dev, it is not scheduled.
Timing
The average time expected for:
- full process: depends on the data volume
- incremental process: 5-10 min
Criticality
High / Medium / Low ??
Logging
1. Table log: this SQL retrieves the status log entries whose job names come from the source system SPF (PF1)
...
2. Incremental table: every time a loading job completes, the max time from the staging table is written to this table, making it easy to see which tables have not been updated.
Table: STG.incremental_loading
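The staleness check that STG.incremental_loading enables could be sketched as below. The row shape (table name plus last-load timestamp) and the two-day threshold are assumptions for illustration, not the table's actual schema.

```python
from datetime import datetime, timedelta

def stale_tables(rows, now, max_age_days=2):
    """rows: (table_name, last_load) pairs read from STG.incremental_loading
    (column names assumed). Return tables not updated within max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return [name for name, last in rows if last < cutoff]

now = datetime(2024, 6, 10, 8, 0)
rows = [("mch1", datetime(2024, 6, 10, 7, 55)),   # updated this morning
        ("afvc", datetime(2024, 6, 1, 8, 2))]     # not updated for 9 days
print(stale_tables(rows, now))
# → ['afvc']
```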
PI Startrek
...