Schema showing the different STEPS of the application flow - with the data involved at each step
Describe the data and process involved at each step
Changes made in the data transformation step since October 23, 2023:
- In Python Recipe: Calc_out_to_CHC → C0142ASolvay_SajaXAINVRIO is now fed from the GN_IOT_LT00139.PV tag from the MES server.
- In Python Recipe: compute_MES_raw_water_prep → C0143ASolvay_HispavicXAINCANL is now fed from the GN_IOT_LT00137.PV tag from the MES server. This value also received an update: the old version of the production data applied a -1.42 adjustment. The value is no longer corrected; the -1.42 adjustment was removed in the current version (request WO0000000483037).
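The adjustment removal can be sketched as below. This is an illustration only, not the actual recipe code: the function name and the DataFrame handling are assumptions; the tag and output column names come from the change description above.

```python
import pandas as pd

def compute_level(df: pd.DataFrame, apply_legacy_adjustment: bool = False) -> pd.DataFrame:
    """Hypothetical sketch of the compute_MES_raw_water_prep change.

    The raw tag value from GN_IOT_LT00137.PV is now passed through unmodified;
    the old recipe subtracted a fixed 1.42 (removed per request WO0000000483037).
    """
    out = df.copy()
    if apply_legacy_adjustment:
        # Old behaviour: -1.42 correction applied to the raw tag value
        out["C0143ASolvay_HispavicXAINCANL"] = out["GN_IOT_LT00137.PV"] - 1.42
    else:
        # Current behaviour: value taken as-is from the MES tag
        out["C0143ASolvay_HispavicXAINCANL"] = out["GN_IOT_LT00137.PV"]
    return out
```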
ETL
Dataiku
Dataiku sFTP connections are defined.
Web scraping doesn't require any access validation. → Since October 23, 2023, all the data comes directly from MES, so the web scraping step is no longer applicable.
An MES service account is required. The service account used in CL/FL Torrelavega can be reused for this project.
Multiple checks are performed on the input datasets:
- Existence of data on the website: error handling is defined to ensure we always have the data.
- Data from sFTP is not validated, as it is provided by an external company.
- Existence of data from sFTP: validated. If no data is received for more than 24 hours, an email notification is sent to the external company.
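The 24-hour freshness check on the sFTP feed can be sketched as follows. The helper name is an assumption; the real check runs inside Dataiku and the email notification mechanism is not shown here.

```python
import datetime as dt

# No sFTP file for more than 24 hours means the feed is considered stale
STALE_AFTER = dt.timedelta(hours=24)

def sftp_feed_is_stale(latest_file_time, now=None):
    """Return True when no sFTP file has arrived for more than 24 hours.

    When this returns True, an email notification should be sent to the
    external company providing the data.
    """
    now = now or dt.datetime.utcnow()
    return now - latest_file_time > STALE_AFTER
```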
The process is fully automated; a date argument is fetched automatically day by day. No configuration is required.
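A minimal sketch of the automatic date argument. The assumption that each run processes the previous calendar day is illustrative; the exact offset used by the recipe is not specified above.

```python
import datetime as dt

def run_date(today=None):
    """Derive the date argument automatically, day by day; no manual configuration.

    Assumption: each run processes the previous calendar day and passes the
    date as an ISO-formatted string.
    """
    today = today or dt.date.today()
    return (today - dt.timedelta(days=1)).isoformat()
```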
Data comes directly from MES and sFTP. Everything is computed in Dataiku.
AspenTech SQL, CSV files from sFTP, and CSV/HTML from web scraping.
sFTP server and Google spreadsheet.
CSV files and a table.
2.46 KB per file; one file per hour.
Extraction runs every hour; reporting runs as soon as the transformation finishes.
There is no full process; the incremental process takes around 1.5 minutes.
High
Dataiku
What does it do?
ETL
Dataiku
None
Existence of data on the website: error handling is defined to ensure we always have the data.
The process is fully automated.
CSV/HTML from web scraping.
Google spreadsheet.
pandas table (DataFrame)
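As an illustration of how scraped CSV text ends up as the in-memory pandas table, a minimal sketch (the function name and column names are hypothetical):

```python
import io
import pandas as pd

def csv_to_table(csv_text):
    """Load scraped CSV text into a pandas DataFrame (the 'pandas table')."""
    return pd.read_csv(io.StringIO(csv_text))
```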
River level: 671 data points, 15.9+ KB.
Extraction is done every 15 minutes.
20 seconds
High
Dataiku