Description

This is an API connection to retrieve manufacturing data. It requires authentication with a Windows user account, which Talend cannot do natively. Therefore, Python is required to run the extraction.

The Python environment is set up in the folder \data\[Env]\DATA_OCEAN_DOMAIN_INDUSTRIAL\Python

There are three main APIs used in the eBatch project:

1. Event Frame with calculation

1.1 URL :  https://pivision.eua.solvay.com/piwebapi/assetdatabases/F1RD0WI4NHGJw0uG4fwriJz93welqtozWz30OpJS7GtHMFHgQUNFVzFQU1RFS1BBRjAzXEdCVSBTT0RBIC0gQUxMIFBMQU5UUw/eventframes?templateName=Batch_Record_Pharma

Options:

&starttime=*-24h    (last 24 hours)

&starttime=%272024-05-19%2010:40:00%27  (a specific start time, i.e. '2024-05-19 10:40:00' URL-encoded)

This lists all event frames in the selected time range.
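As a minimal sketch, the start-time option can be built in Python as follows. The encoding mirrors the examples above (space as %20, single quote as %27); the helper names pi_time and event_frame_url are hypothetical, not part of the job.

```python
# Hypothetical sketch: build the event-frame URL with either a relative
# or an absolute start time, encoded as in the examples above.
EF_URL = ("https://pivision.eua.solvay.com/piwebapi/assetdatabases/"
          "F1RD0WI4NHGJw0uG4fwriJz93welqtozWz30OpJS7GtHMFHgQUNFVzFQU1RFS1BBRjAzXEdCVSBTT0RBIC0gQUxMIFBMQU5UUw"
          "/eventframes?templateName=Batch_Record_Pharma")

def pi_time(t):
    """Encode a PI time literal for the query string (space -> %20, ' -> %27)."""
    return t.replace("'", "%27").replace(" ", "%20")

def event_frame_url(start_time="*-24h"):
    """Append the starttime option to the event-frame endpoint."""
    return f"{EF_URL}&starttime={pi_time(start_time)}"

url_last_24h = event_frame_url()                       # relative: last 24 hours
url_absolute = event_frame_url("'2024-05-19 10:40:00'")  # absolute start time
```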

1.2 For each event frame, the list of attributes must be retrieved from its Attributes link. Only the field values we need can be requested by using the parameter "selectedFields":

https://pivision.eua.solvay.com/piwebapi/eventframes/F1Fm0WI4NHGJw0uG4fwriJz93w7OER3RAm7xGECgKK4iH3qQQUNFVzFQU1RFS1BBRjAzXEdCVSBTT0RBIC0gQUxMIFBMQU5UU1xFVkVOVEZSQU1FU1tFQkFUQ0ggRVZFTlQgRlJBTUUgR0VORVJBVElPTjIgMjAyNC0wNi0wOSAwMzozMTozOC40MDBd/attributes?selectedFields=Items.Name;Items.Links.Attributes;Items.Links.Value

The values of Batch_ID, Weight and Silo are read directly from their Value links (point 1.4).

Sometimes an erroneous batch ID is returned, but we keep it exactly as it appears in the source.

1.3 The remaining attributes, i.e. those whose names do not start with Granulom and are not Batch_ID, Weight or Silo, require a call to their Attributes link in order to get the sub-attributes (Avg, Low_limit, Upper_limit, Max, Mean, Min, Std).

1.4 Get the value from the Value link
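The routing of attributes in points 1.2-1.4 can be sketched as a small classifier. The function and its return labels are hypothetical names for illustration, not identifiers from the job.

```python
# Hypothetical sketch of the attribute routing in points 1.2-1.4.
DIRECT_VALUE = {"Batch_ID", "Weight", "Silo"}
SUB_ATTRIBUTES = ("Avg", "Low_limit", "Upper_limit", "Max", "Mean", "Min", "Std")

def route_attribute(name):
    """Decide how a single event-frame attribute is extracted."""
    if name in DIRECT_VALUE:
        return "value"          # 1.2/1.4: read the Value link directly
    if name.startswith("Granulom"):
        return "time_series"    # handled by the Granulom flow (section 2)
    return "sub_attributes"     # 1.3: follow the Attributes link for Avg..Std
```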

2. Event Frame with value in time series - Granulom

This follows the same steps as Event Frame with calculation (points 1.1 and 1.2), but only attributes whose names start with "Granulom" are kept. Then, instead of reading the sub-attributes, the values recorded during the event frame are retrieved from the "RecordedData" link.
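As a sketch, the query parameters of the RecordedData call for a Granulom attribute could look like the helper below. The exact parameter set is an assumption based on the starttime and selectedFields examples elsewhere on this page; the function name is hypothetical.

```python
def granulom_recorded_params(ef_start, ef_end):
    """Query parameters for a Granulom attribute's RecordedData link:
    values bounded by the event frame, trimmed to timestamp/value pairs
    (assumed parameter set, mirroring the stream examples on this page)."""
    return {
        "starttime": ef_start,
        "endtime": ef_end,
        "selectedFields": "Items.Timestamp;Items.Value",
    }
```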

3. Tags data

3.1 Find the "Web ID" by using this URL: https://pivision.eua.solvay.com/piwebapi/points/search?dataServerWebId=F1DSUZOarN2bd0yKBpfAHpVUXAQUNFVzFQU1RFS1BEQTAx&query=TOR.BIR.SILO.SILO.SILO.SY_L2_COL_A&selectedFields=Items.WebId

The query parameter (shown in red above) is the tag name from the file List of Tag (column N). For Torrelavega, for example, the tags are added to the RDS context l_VAR_eBatch_tag_TOR_tag = {'TOR.BIR.SILO.SILO.SILO.SY_L2_COL_A','TOR.BIR.SILO.SILO.SILO.SY_L2_COL_B','TOR.BIR.SILO.SILO.SILO.SY_L2_COL_L','TOR.BIR.SILO.SILO.SILO.SY_L2_PPM_CA_BIR'}

Remark: each site has different parameters; TOR is the site code for Torrelavega.

Example output of the link:


3.2 Get the value by using this URL: https://pivision.eua.solvay.com/piwebapi/streams/[WebId]/recorded?starttime=2024-06-07%2000:26:52
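The two-step tag extraction of 3.1/3.2 can be sketched as URL builders. The dataServerWebId constant is the one shown in 3.1; the function names are hypothetical.

```python
# Hypothetical sketch of the tag extraction in 3.1/3.2 (URL construction only).
PIWEBAPI = "https://pivision.eua.solvay.com/piwebapi"
DATA_SERVER_WEBID = "F1DSUZOarN2bd0yKBpfAHpVUXAQUNFVzFQU1RFS1BEQTAx"

def webid_search_url(tag_name):
    """3.1: resolve a tag name to its Web ID."""
    return (f"{PIWEBAPI}/points/search?dataServerWebId={DATA_SERVER_WEBID}"
            f"&query={tag_name}&selectedFields=Items.WebId")

def recorded_url(web_id, start_time):
    """3.2: recorded values of the stream from start_time (already URL-encoded)."""
    return f"{PIWEBAPI}/streams/{web_id}/recorded?starttime={start_time}"
```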

Python

Example of the Python file:

  1. The main URL is called first in order to obtain the next URL.
  2. Output of call 1 on the web (authorization is required to see the result).
  3. The selected link is called to get the result.
  4. The selected link URL is called with the new parameters appended.
  5. Example of the full URL.
  6. Result of the second call. The fields can be selected with the parameter "selectedFields=Items.Timestamp;Items.Value".
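The chained calls in steps 1-6 can be sketched as below. This is a hypothetical outline, not the generated job code; `session` is any object with a requests-style get() (e.g. requests.Session()), duck-typed so the sketch has no hard dependency.

```python
# Hypothetical sketch of steps 1-6: call the main URL, pick a link out of
# the first item of the response, then call that link with extra parameters.
def follow_link(session, main_url, link_name, params=None):
    first = session.get(main_url)                 # steps 1-2
    first.raise_for_status()
    link = first.json()["Items"][0]["Links"][link_name]   # step 3
    second = session.get(link, params=params)     # steps 4-5
    second.raise_for_status()
    return second.json()                          # step 6

# Usage sketch (with the real requests library):
# import requests
# result = follow_link(requests.Session(), main_url, "RecordedData",
#                      {"selectedFields": "Items.Timestamp;Items.Value"})
```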

Talend

Project IND_EBATCH

The API URLs are set up in the contexts below for all sites:

l_VAR_eBatch_endpoint

l_VAR_eBatch_piwebapi_url

l_VAR_eBatch_tag_dataServer


Site-specific values are set up in contexts per site, e.g. TOR for Torrelavega:

l_VAR_eBatch_eventFrame_TOR_assetDB for event frame

l_VAR_eBatch_tag_TOR_tag for tag value


1. Job Detail

  1. tFileDelete: Deletes the Python log file DATA\[env]\IND\IND_EBATCH\py_log.txt. All the jobs write to the same file, and it is deleted beforehand, but only by the EventFrame job; the other jobs do not delete it, so all logs of a single load remain visible.
  2. tJava: Generates a .py file in DATA\[env]\IND\IND_EBATCH\pyCodeyyyyMMddHHmmss.py by reading input parameters from the Flow job and the RDS parameters. Example of Python code
  3. tSystem: Runs the .py file.
    1. The screen output is kept in DATA\[env]\IND\IND_EBATCH\py_log.txt.
    2. The extraction is generated in DATA\[env]\IND\IND_EBATCH\Output\[API interface]\file.csv.
  4. tGSPut: Uploads the CSV file to GCS: prj-data-sad-ebatch-[env]/cs-ew1-prj-data-sad-ebatch-[env]-staging/[API interface]
  5. tFileDelete: Deletes the CSV files.
  6. tFileDelete: Deletes the .py file when the context l_VAR_eBatch_py_debug = no.

2. Flow job 

2.1. Get the last load timestamp from the table STG.incremental_loading

2.2. Run the detail job, which generates the .py file and runs Python to produce the output

2.3. Standard job to load the data to STG and ODS

2.4. Update the last load in STG.incremental_loading from the event frame start date and the tag timestamp

Remark: PI_TOR_TAG covers many tags; therefore it takes the minimum, across tags, of each tag's maximum loaded date.
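The watermark rule in the remark above can be sketched as follows; the function name and input shape are hypothetical.

```python
# Sketch of the PI_TOR_TAG watermark rule: per tag, take the latest loaded
# timestamp, then keep the earliest of those so that no tag is skipped.
def next_start_time(loaded):
    """loaded: {tag_name: [ISO timestamps already loaded]} (assumed shape)."""
    per_tag_max = {tag: max(ts) for tag, ts in loaded.items()}
    return min(per_tag_max.values())
```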

3. Access rights

To access PI, david.genet@solvay.com must first grant authorization for the Windows account SVC_TALEND_ADMIN_DEV, so that the Python script (which authenticates with the Windows account) can run.

In DEV and TEST, the function HttpNegotiateAuth() can be used to authenticate without providing a user name and password; it automatically identifies the user SVC_TALEND_ADMIN_DEV. However, the configuration in PREPROD and PROD is different, so the function HttpNtlmAuth('EUA\\user', 'password') must be used instead. This can be set up in the context l_VAR_eBatch_piwebapi_auth in RDS.
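The environment-dependent choice above can be sketched as below. The chooser itself is pure; the auth objects come from the third-party packages requests_negotiate_sspi (HttpNegotiateAuth) and requests_ntlm (HttpNtlmAuth), imported lazily so the sketch runs anywhere. The helper names are hypothetical.

```python
def auth_mode(env):
    """DEV/TEST use Negotiate (no credentials); PREPROD/PROD use NTLM."""
    return "negotiate" if env.upper() in ("DEV", "TEST") else "ntlm"

def pi_auth(env, user=None, password=None):
    """Return the requests auth object for the given environment."""
    if auth_mode(env) == "negotiate":
        from requests_negotiate_sspi import HttpNegotiateAuth
        return HttpNegotiateAuth()  # picks up the Windows account, e.g. SVC_TALEND_ADMIN_DEV
    from requests_ntlm import HttpNtlmAuth
    return HttpNtlmAuth(f"EUA\\{user}", password)
```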

4. Source

https://pivision-dev.eua.solvay.com/piwebapi/

5. Format

JSON Web API

Destination

Location

Bucket = cs-ew1-prj-data-sad-ebatch-[env]-staging; the data is kept in one folder per table

DataOcean GCP = N/A

Product GCP = prj-data-sad-ebatch-ppd

Plan job is PL_EBATCH_PI_TO_ODS

Main job is F100_PI_TOR_TO_BQ

Detail Job | STG | ODS
F001_PI_TOR_EFRAME_TO_BQ | STG_EXT_TOR_0000_F001_I_H_pi_eframe | ODS_EXT_TOR_F001_I_H_pi_eframe
F002_PI_TOR_EFRAME_GRANULOM_TO_BQ | STG_EXT_TOR_0000_F001_I_H_pi_eframe_granulom | ODS_EXT_TOR_F001_I_H_pi_eframe_granulom
F010_PI_TOR_TAG_TO_BQ | STG_EXT_TOR_0000_F001_I_H_pi_tag | ODS_EXT_TOR_F001_I_H_pi_tag


Format

Event Frame with calculation
Event Frame with value in time series - Granulom
Tags data

Sizing

Less than 100 records / hour

Assessment

    To validate the generated output, compare it with the output from the Web API:

  • EventFrame (calculation and Granulom) = https://pivision.eua.solvay.com/piwebapi/assetdatabases/F1RD0WI4NHGJw0uG4fwriJz93welqtozWz30OpJS7GtHMFHgQUNFVzFQU1RFS1BBRjAzXEdCVSBTT0RBIC0gQUxMIFBMQU5UUw/eventframes?templateName=Batch_Record_Pharma&[start time]

start time='2024-04-19 19:40:00' 

  • Tags Data
    • Web ID = https://pivision.eua.solvay.com/piwebapi/points/search?dataServerWebId=F1DSUZOarN2bd0yKBpfAHpVUXAQUNFVzFQU1RFS1BEQTAx&query=[Tag_Name]&selectedFields=Items.WebId

    • Tags Data = https://pivision.eua.solvay.com/piwebapi/streams/[WebId]/recorded?[start time]

start time='2024-06-07 00:26:52'


Loading

1. Incremental Load 

The main job F100_PI_TOR_TO_BQ triggers the loading of the three PI data sources. The table STG.incremental_loading stores the last successful load, and this timestamp is sent as the URL start-time parameter so that extraction resumes from that point.

In step 2.4 of the Flow job, job T101 is used for the Event Frame loads, while T103_UPDATE_LAST_EXTRACT_FROM_TAGS is used for the tags. The latter uses a special SELECT SQL: the extraction gets many tags at the same time and the start time differs per tag, so it must select the minimum start time among the tags.

meta_file_name is set by the context l_VAR_eBatch_tag_TOR_load. If this name changes, l_VAR_eBatch_tag_TOR_STG and l_VAR_eBatch_tag_TOR_ODS should be changed as well, in case the data should be saved into other tables with the new incremental-load flag. Normally these three contexts should not be changed.

2. Reloading data

A reload can be done flexibly by setting the desired start time in the table STG.incremental_loading on project prj-data-sad-ebatch-dev:

update `prj-data-sad-ebatch-dev.STG.incremental_loading`
set meta_last_process_date = '2024-11-01 00:00:00'
where meta_file_name = 'PI_TOR_EFRAME' -- or PI_TOR_EFRAME_GRANULOM, PI_TOR_TAG,

It is also possible to specify the end time of the reload in the RDS context "l_VAR_eBatch_eventFrame_TOR_endtime", e.g. 2024-05-29%2003:00:00. Normally this context variable should be empty.

3. Scheduling

     PL_EBATCH_PI_TO_ODS runs every hour, every day, including weekends.

4. Timing

    An event frame takes around 2-4 minutes.

    Tags are very quick.

Criticality

High 

Logging

    1. Monitor number of records within a day

select job.job_name, job.meta_start_date, logs.meta_run_id, logs.meta_source_system, logs.meta_step, logs.meta_status, logs.meta_num_lines, logs.meta_error_lines
from STG.log_tables logs
join STG.run_jobs job on logs.meta_run_id = job.meta_run_id
where logs.meta_run_id in (SELECT meta_run_id FROM STG.run_jobs order by meta_start_date desc limit 100)
and job_name like '%PI%'
and meta_step = 'Bucket to Staging'
and meta_start_date > DATE_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
order by job.meta_start_date desc

          1.1 We should NOT have 0 records. A count of 0 normally means that the Python script failed (please check the Python log at \data\[env]\IND\IND_EBATCH\py_log.txt).

          1.2 If the .py file needs to be inspected, change the context l_VAR_eBatch_py_debug in RDS from no to yes. The .py file will then not be deleted after the program runs.

    2. Monitor the last update date. The result should show the current date.

SELECT * FROM STG.incremental_loading
where meta_file_name like '%PI%'
order by meta_file_name


