
...

Architecture - High Level Design (HLD)

Link.


Architecture - Low Level Design (LLD)

Link.


Architecture Data Flow


Google Drive Live Links

https://docs.google.com/spreadsheets/d/1gXHCRA03T5pH3wC--erjuT8DjyxIlhEZthyeCCNZXgI/edit#gid=1684414561

https://drive.google.com/file/d/1274-2h5sL4Cyjsot-UgMYo03i2B8Q_hHzOv-Z3ixglo/view

DataPrep Flow

Schema showing the different STEPS of the application flow - with the data involved at each step

Step descriptions on BigQuery

Describes the data and process involved at each step

SAP PF1

Description

This project loads SAP data from the QM, material document and vendor master data tables into the Industrial Data Ocean, for all PF1 data, based on the create and modification dates. A view on top then filters the data down to Soda Ash only.

Note: Soda Ash data exists only in PF1, so this project loads from PF1 only (WP1 is not included).

Tools

Talend: 

1. Loading

1.1 Incremental Load

The main job is F100_SPF_IND_QM_Main, which runs the entire incremental load. It loads the tables MCH1, MCHA, QALS, AFVC, QAMR, QAMV, QAVE, MSEG and PLPO in sequence in order to limit the number of background jobs in SAP.

...

MCH1

...

STG_SPF_0000_0000_F001_I_H_mch1

...

ERSDA - Create date

...

LAEDA - Modify date

...

MCHA

...

STG_SPF_0000_0000_F001_I_H_mcha

...

ERSDA - Create date

...

LAEDA - Modify date

...

MSEG

...

STG_SPF_0000_0000_F001_I_H_mseg

...

CPUDT_MKPF - Create date

...

/BEV2/ED_AEDAT - Modify date

...

QALS

...

STG_SPF_0000_0000_F001_I_H_qals

...

ERSTELDAT - Create date

...

AENDERDAT - Modify date

...

QAMR

...

STG_SPF_0000_0000_F001_I_H_qamr

...

ERSTELLDAT - Create date

...

AENDERDAT - Modify date

...

QAMV

...

STG_SPF_0000_0000_F001_I_H_qamv

...

ERSTELLDAT - Create date

...

AENDERDAT - Modify date

...

STG_SPF_0000_0000_F001_I_H_qasr

...

ERSTELLDAT - Create date

...

AENDERDAT - Modify date

...

QAVE

...

STG_SPF_0000_0000_F001_I_H_qave

...

VDATUM - Create date

...

VAEDATUM - Modify date

...

AFVC

...

STG_SPF_0000_0000_F001_I_H_afvc

...

Get list of AUFPL from last load of QALS

The AFVC job must run after QALS because it selects the list of AUFPL (routing numbers of operations in the order) from the last load of QALS (max meta_business_date). In case of a reload, it relies on the QALS table as well.
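The QALS-to-AFVC dependency can be sketched as a small helper that turns the AUFPL values found in the latest QALS load into a range filter for the AFVC extraction. This is a hypothetical illustration (function name and range strategy are assumptions); the real logic lives inside the Talend job:

```python
def build_afvc_filter(aufpl_list):
    """Build a SAP selection filter for AFVC from the AUFPL values
    found in the most recent QALS load (max meta_business_date).

    aufpl_list: list of routing numbers (strings) taken from QALS.
    Returns a WHERE-style fragment, or None when there is nothing to load.
    """
    if not aufpl_list:
        return None
    lo, hi = min(aufpl_list), max(aufpl_list)
    # A single range keeps the SAP selection small, matching the style
    # of the reload example shown later on this page.
    return f"AUFPL >= '{lo}' and AUFPL <= '{hi}'"
```

This mirrors the shape of the l_VAR_eBatch_PF1_AFVC_additional_filter reload example given in section 2.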

1.2 Full Load

Job F101_SPF_IND_QM_Main_Full manages the full load.

...

LFA1

...

STG_SPF_0000_0000_F001_F_H_lfa1

...

QPAC

...

STG_SPF_0000_0000_F001_F_H_qpac

...

QPAM

...

STG_SPF_0000_0000_F001_F_H_qpam

...

QPCD

...

STG_SPF_0000_0000_F001_F_H_qpcd

...

QPCT

...

STG_SPF_0000_0000_F001_F_H_qpct

...

2. Reloading data

Every job has a context parameter l_VAR_eBatch_PF1_[TableName]_additional_filter that changes the selection when extracting from SAP. For an incremental load, this context MUST BE "incremental". If it is blank, the job extracts all data from 2023 onwards.

Note: the incremental load compares only the date, not the time.

Example of context reload

l_VAR_eBatch_PF1_AFVC_additional_filter = AUFPL >= '1007995974' and AUFPL <= '1009027526'

l_VAR_eBatch_PF1_MCH1_additional_filter = ERSDA  > '20230101'
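The way this context parameter drives the SAP selection could be modelled as follows. This is a sketch under assumptions (helper name and the exact clause shapes are illustrative); the real resolution happens inside the Talend jobs, and note the date-only comparison for "incremental":

```python
from datetime import date

def build_selection_filter(context, date_field, last_load):
    """Translate an l_VAR_eBatch_PF1_*_additional_filter context
    into a SAP selection clause.

    context:    "incremental", "" (blank), or an explicit filter string
    date_field: create/modify date column, e.g. "ERSDA"
    last_load:  datetime.date of the last successful load
    """
    if context == "incremental":
        # Date-only comparison: the time part of the last load is ignored.
        return f"{date_field} >= '{last_load.strftime('%Y%m%d')}'"
    if context == "":
        # A blank context falls back to all data from 2023 onwards.
        return f"{date_field} > '20230101'"
    # Any other value is passed through as an explicit reload filter,
    # like the AFVC and MCH1 examples above.
    return context
```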

Access rights

To access SAP, the Talend user RFC_TAL_PF1 is required.

Source

           pf1nonha.eua.solvay.com

Format

The format of the source data

Destination

Location

The data is kept in:

Bucket = cs-ew1-prj-data-dm-industrial-[dev]-staging, with one folder per table

DataOcean GCP = prj-data-dm-industrial-[env]

Product GCP = prj-data-sad-ebatch-[env]

To save the data into the GCP Industrial Data Ocean, a service account is required.

Format

Same as the source

Sizing

Expected data volume for a full load (as of May 2024):

Table | Table Name                                 | Size (records)
----- | ------------------------------------------ | --------------
LFA1  | Vendor Master (General Section)            | 229,553
QPAC  | Inspection catalog codes for selected sets | 2,314
QPAM  | Inspection catalog selected sets           | 741
QPCD  | Inspection catalog codes                   | 26,828
QPCT  | Code texts                                 | 228,149

The remaining tables are loaded incrementally every hour.

Assessment

How to validate the generated output: compare with the corresponding table in PF1.

Scheduling

PL_INDUS_EBATCH_SPF_QM_INC_LOAD (incremental load) runs every hour on weekdays.

PL_INDUS_EBATCH_SPF_QM_FULL_LOAD (full load) runs every workday at 08:00 AM CET (daily).

Timing

The average expected time:

  • full process: 7 - 15 min
  • incremental process: 5 - 10 min

Criticality

High

Logging

Industrial

1. Table log: this SQL retrieves the status log entries whose job name comes from source system SPF (PF1).

...

2. Incremental table: every time a load completes, the max timestamp from the staging table is written to this table. This makes it easy to see which tables have not been updated.

Table: STG.incremental_loading


SAP BW HR

Description: the BW HR query is used to determine user authorization based on the user's profile on site. The source is query DO_BW_QRY_CPHRPANHR_0001, managed by the Xtract job TALEND_PROD_DO_BW_QRY_CPHRPANHR_0001 on Xtract server ACEW1DXTRAXUS01. There is only one parameter, of the form YYYYMM_Start=202403&YYYYMM_End=202403; both a start and an end month are required.

Talend: 

1. Loading

1.1 Full Load

...

2. Reloading data

The job has a context parameter l_VAR_XTRACT_PARA_CPHRPANHR that changes the selection of the BW query. Normally the value should be "currentmonth", so that only the current month is loaded. To reload data from other months, set the context to an explicit range such as "&YYYYMM_Start=202403&YYYYMM_End=202403"; in this example, only March 2024 is loaded. If the context is blank, the job extracts all the data in BW, which may cause it to fail with a memory error.
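A minimal sketch of how the l_VAR_XTRACT_PARA_CPHRPANHR context could be resolved into the Xtract URL parameter. The helper name is hypothetical; the real resolution happens inside the Talend job:

```python
from datetime import date

def resolve_xtract_param(context, today):
    """Resolve the CPHRPANHR context into the Xtract query-string suffix.

    context: "currentmonth", or an explicit
             "&YYYYMM_Start=...&YYYYMM_End=..." string (blank would
             load all of BW and risks running out of memory).
    today:   datetime.date used to derive the current month.
    """
    if context == "currentmonth":
        yyyymm = today.strftime("%Y%m")
        return f"&YYYYMM_Start={yyyymm}&YYYYMM_End={yyyymm}"
    # Explicit ranges are passed through unchanged.
    return context
```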

Access rights

To access BW, Talend connects through Xtract, and Xtract requires authorization for user RFC_TAL_WBP.

For most HR queries, the authorization team must be contacted to grant access. In this case, role ZH_PA_ROBOT was modified to add the new query.

Source

          BW query DO_BW_QRY_CPHRPANHR_0001

Format

Data Dict

Destination

Location

The data is kept in:

Bucket = cs-ew1-prj-data-dm-hr-[dev]-staging 

DataOcean GCP = prj-data-dm-hr-dev

Product GCP = prj-data-sad-ebatch-ppd

To save the data into the GCP Industrial Data Ocean, a service account is required.

Format

Same as the source

Sizing

The current month should contain around 30,000 records.

Assessment

How to validate the generated output: compare with BW query DO_BW_QRY_CPHRPANHR_0001 and drill down all characteristics as in the Data Dict.

Scheduling

PL_HR_EBATCH_BWH_CPHRPANHR_0001 runs every Monday at 08:00 AM CET (weekly).

Timing

          Loading takes around 2 - 5 minutes.

Criticality

High / Medium / Low ??

Logging

...

PI Startrek

Description: this is an API connection to get manufacturing data. It requires authentication with a Windows user, which Talend cannot do natively, so Python is required to run the extraction.

...

There are 3 main APIs that we use in the eBatch project.

1. Event Frame with calculation

1.1 URL: https://pivision.eua.solvay.com/piwebapi/assetdatabases/F1RD0WI4NHGJw0uG4fwriJz93welqtozWz30OpJS7GtHMFHgQUNFVzFQU1RFS1BBRjAzXEdCVSBTT0RBIC0gQUxMIFBMQU5UUw/eventframes?templateName=Batch_Record_Pharma

Option:

&starttime=*-24h    (last 24 hours)

&starttime=%272024-05-19%2010:40:00%27  (specific start time)

This lists all event frames in the selected time window.
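The event-frame query above can be assembled like this. A sketch only: the helper name is an assumption, and the database WebId would be the one from the URL on this page; URL-encoding of the starttime matches the two option examples above:

```python
from urllib.parse import quote

BASE = "https://pivision.eua.solvay.com/piwebapi"

def event_frames_url(db_webid, template, starttime="*-24h"):
    """Build the PI Web API URL listing event frames for a template.

    starttime accepts a PI relative time ("*-24h" for the last 24 hours)
    or a quoted absolute time such as "'2024-05-19 10:40:00'".
    """
    # Keep ":" literal so absolute times encode as in the examples above.
    return (f"{BASE}/assetdatabases/{db_webid}/eventframes"
            f"?templateName={template}"
            f"&starttime={quote(starttime, safe=':')}")
```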


1.2 For each event frame, the list of attributes must be fetched via its attribute link. We can restrict the response to only the field values we need using the "selectedFields" parameter:

https://pivision.eua.solvay.com/piwebapi/eventframes/F1Fm0WI4NHGJw0uG4fwriJz93w7OER3RAm7xGECgKK4iH3qQQUNFVzFQU1RFS1BBRjAzXEdCVSBTT0RBIC0gQUxMIFBMQU5UU1xFVkVOVEZSQU1FU1tFQkFUQ0ggRVZFTlQgRlJBTUUgR0VORVJBVElPTjIgMjAyNC0wNi0wOSAwMzozMTozOC40MDBd/attributes?selectedFields=Items.Name;Items.Links.Attributes;Items.Links.Value


We get the values for Batch_ID, Weight and Silo directly from their Value links (point 1.4).

Sometimes we get an erroneous batch ID, but we keep it as-is, the same as the source.


1.3 The remaining attributes, i.e. those not starting with Granulom and not equal to Batch_ID, Weight or Silo, require a call to the Attributes link to get the sub-attributes (Avg, Low_limit, Upper_limit, Max, Mean, Min, Std).


1.4 Get the value from the Value link


2. Event Frame with value in time series - Granulom

The same steps as Event Frame with calculation (points 1.1 and 1.2), but only attributes whose name starts with "Granulom" are taken. Then, instead of fetching the sub-attributes, the values recorded during the event frame are fetched from the "RecordedData" link.
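Putting points 1.2-1.4 and the Granulom case together, the per-attribute routing can be sketched as follows (hypothetical helper; the link names follow the PI Web API responses described above):

```python
def attribute_route(name):
    """Decide which link to follow for an event-frame attribute.

    - Batch_ID, Weight, Silo     -> read the Value link directly
    - names starting "Granulom"  -> read the RecordedData time series
    - everything else            -> descend into the Attributes link
                                    for the sub-attributes (Avg, Low_limit,
                                    Upper_limit, Max, Mean, Min, Std)
    """
    if name in ("Batch_ID", "Weight", "Silo"):
        return "Value"
    if name.startswith("Granulom"):
        return "RecordedData"
    return "Attributes"
```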


3. Tags data

3.1 Find the "Web ID" by using this URL: https://pivision.eua.solvay.com/piwebapi/points/search?dataServerWebId=F1DSUZOarN2bd0yKBpfAHpVUXAQUNFVzFQU1RFS1BEQTAx&query=TOR.BIR.SILO.SILO.SILO.CB_M_BIR_Calcio_BIR_2&selectedFields=Items.WebId

The red text is the tag name from the List of Tag file (column N). For example, in this project for Torrelavega, we get only 5 tags, controlled by the variable l_VAR_eBatch_tag_TOR_tag:

  • TOR.BIR.SILO.SILO.SILO.SY_L2_PPM_CA_BIR           = F1DPUZOarN2bd0yKBpfAHpVUXAraMBAAQUNFVzFQU1RFS1BEQTAxXFRPUi5CSVIuU0lMTy5TSUxPLlNJTE8uU1lfTDJfUFBNX0NBX0JJUg
  • TOR.BIR.SILO.SILO.SILO.SY_L2_COL_L                     = F1DPUZOarN2bd0yKBpfAHpVUXArqMBAAQUNFVzFQU1RFS1BEQTAxXFRPUi5CSVIuU0lMTy5TSUxPLlNJTE8uU1lfTDJfQ09MX0w
  • TOR.BIR.SILO.SILO.SILO.CB_M_BIR_Colorimetría_a = F1DPUZOarN2bd0yKBpfAHpVUXAyKMBAAQUNFVzFQU1RFS1BEQTAxXFRPUi5CSVIuU0lMTy5TSUxPLlNJTE8uQ0JfTV9CSVJfQ09MT1JJTUVUUsONQV9B
  • TOR.BIR.SILO.SILO.SILO.CB_M_BIR_Colorimetría_b = F1DPUZOarN2bd0yKBpfAHpVUXAyaMBAAQUNFVzFQU1RFS1BEQTAxXFRPUi5CSVIuU0lMTy5TSUxPLlNJTE8uQ0JfTV9CSVJfQ09MT1JJTUVUUsONQV9C
  • TOR.BIR.SILO.SILO.SILO.CB_M_BIR_Colorimetría_L = F1DPUZOarN2bd0yKBpfAHpVUXAyqMBAAQUNFVzFQU1RFS1BEQTAxXFRPUi5CSVIuU0lMTy5TSUxPLlNJTE8uQ0JfTV9CSVJfQ09MT1JJTUVUUsONQV9M

Example output of the link:


3.2 Get the value using this URL: https://pivision.eua.solvay.com/piwebapi/streams/[WebId]/recorded?starttime=2024-06-07%2000:26:52
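The recorded-values URL from 3.2 can be built from a WebId (obtained in 3.1) and a start time; the encoding below reproduces the example URL above. Helper name is illustrative:

```python
from urllib.parse import quote

def recorded_url(web_id, starttime):
    """Build the streams/recorded URL for one tag.

    web_id:    the tag WebId found via the points/search call in 3.1
    starttime: e.g. "2024-06-07 00:26:52" (space becomes %20, ":" kept)
    """
    return (f"https://pivision.eua.solvay.com/piwebapi/streams/"
            f"{web_id}/recorded?starttime={quote(starttime, safe=':')}")
```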


Python

Example Python file:


  1. The main URL called first, in order to obtain the follow-up URLs.
  2. Output of call 1 in the browser (authorization is required to see the result).
  3. The selected link that will be called to get the result.
  4. Call the selected link URL + the new parameter.
  5. Example of the full URL.
  6. The result of the second call. The fields can be selected with the parameter "selectedFields=Items.Timestamp;Items.Value".
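The two-call pattern in the steps above (first call returns items with links, selected links are then called again) can be sketched on a sample response. Pure parsing only, on an assumed response shape; the real script also authenticates with the Windows account and performs the HTTP calls:

```python
def selected_links(api_response, link_name):
    """From a PI Web API list response, collect the chosen link
    (e.g. "Value" or "Attributes") for every item, ready for the
    second round of calls.
    """
    return [item["Links"][link_name]
            for item in api_response.get("Items", [])
            if link_name in item.get("Links", {})]

# Shortened sample of a first-call response (structure assumed):
sample = {"Items": [
    {"Name": "Batch_ID",
     "Links": {"Value": "https://.../value",
               "Attributes": "https://.../attributes"}},
]}
```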

Talend

1. Job Detail


  1. tFileDelete: deletes the Python log file DATA\[env]\IND\IND_EBATCH\py_log.txt. All jobs use the same file name, so it is deleted beforehand.
  2. tJava: generates the .py file DATA\[env]\IND\IND_EBATCH\pyCodeyyyyMMddHHmmss.py by reading the input parameters from the Flow job and the RSD parameter.
  3. tSystem: runs the .py file.
    1. The screen output is kept in DATA\[env]\IND\IND_EBATCH\py_log.txt.
    2. The extraction is generated in DATA\[env]\IND\IND_EBATCH\Output\[API interface]\file.csv.
  4. tGSPut: uploads the csv file to GCS prj-data-sad-ebatch-[env]/cs-ew1-prj-data-sad-ebatch-[env]-staging/[API interface].
  5. tFileDelete: deletes the csv files.
  6. tFileDelete: deletes the py file.
2. Incremental Load (Flow job)


2.1. Get the last load from table STG.incremental_loading


2.2. Run the job detail, which generates the py file and runs Python to produce the output

2.3. Standard job to load to STG and ODS

2.4. Update the last load (event frame start date and tag timestamp) in STG.incremental_loading

Remark: PI_TOR_TAG has many tags; therefore, it takes the minimum of the per-tag maximum loading dates.
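The watermark rule in the remark above (restart from the minimum across the per-tag maxima, so no tag misses records) could look like this hypothetical helper:

```python
def next_watermark(max_load_per_tag):
    """Given {tag_name: max loaded timestamp}, return the safe restart
    point: the minimum of the per-tag maxima. Restarting from this point
    guarantees no tag skips records, at the cost of re-reading a few
    rows for the tags that had loaded further ahead.
    """
    if not max_load_per_tag:
        return None
    # ISO-style "YYYY-MM-DD HH:MM" strings compare chronologically.
    return min(max_load_per_tag.values())
```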

3. Job summary
4. Reloading data

Reloading is flexible: set the desired starttime in table STG.incremental_loading.

Access rights

To access PI, david.genet@solvay.com must first grant the Windows account (SVC_TALEND_ADMIN_DEV) authorization to access PI, so that the Python script can run (it authenticates with the Windows account). When promoting to another environment, authorization must be requested again.

Source

https://pivision-dev.eua.solvay.com/piwebapi/

Format

Json web API

Destination

Location

Bucket = cs-ew1-prj-data-sad-ebatch-[env]-staging, with one folder per table

DataOcean GCP = N/A

Product GCP = prj-data-sad-ebatch-ppd

The plan job is PL_EBATCH_PI_TO_ODS.

The main job is F100_PI_TOR_TO_BQ.

...

Format

Event Frame with calculation
Event Frame with value in time series - Granulom
Tags data

Sizing

Less than 100 records / hour

Assessment

    How to validate the generated output: compare with the output from the web API.

Scheduling

    PL_EBATCH_PI_TO_ODS runs every hour, Monday - Friday.

Timing

    One event frame takes around 2 - 4 min.

    For tags, it is very quick.

Criticality

High 

Logging

    1. Monitor the number of records within a day.

...

     We should not have 0 records. A count of 0 usually means the Python script failed (note: when Python fails, the Talend log still shows OK).

    2. Monitor the last update date. The result should be the current date.

...
