The Talend architecture is a hybrid platform spanning AWS, GCP and Solvay on-premise machines. The platform is not designed to handle confidential or sensitive projects, such as those subject to EAR, ITAR or GDPR.
The Remote Engines
As of August 2022, 9 remote engines are running, as follows:
- 4 remote engines on AWS (DEV / UAT / PRE-PROD / PROD) => all machines on AWS run Windows Server 2016 Datacenter edition
- 4 remote engines on GCP (DEV / UAT / PRE-PROD / PROD) => all machines on GCP run Ubuntu 20.04 LTS
- 1 remote engine at the Solvay EHWA plant in Korea (PROD) => the machine runs Windows Server 2016 Standard edition. To align with the rest of the architecture, 3 more machines would normally be needed for DEV, UAT and PRE-PROD, but the Talend license constraint allows a maximum of 10 remote engines. The last available remote engine is kept in reserve for future use cases that are not yet defined.
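The inventory above, and the license headroom it leaves, can be written down as a small Python sketch. It is illustrative only: the class, field and function names are hypothetical and do not come from Talend or from the platform configuration. Only the counts, operating systems and the cap of 10 are taken from the list above.

```python
from collections import Counter
from dataclasses import dataclass

# License cap stated in the list above.
LICENSE_MAX_ENGINES = 10

ENVIRONMENTS = ("DEV", "UAT", "PRE-PROD", "PROD")


@dataclass(frozen=True)
class RemoteEngine:
    """One remote engine; field names are hypothetical."""
    location: str     # "AWS", "GCP" or "EHWA"
    environment: str  # one of ENVIRONMENTS
    os: str


# Inventory as of August 2022, as described in the list above.
ENGINES = (
    [RemoteEngine("AWS", env, "Windows Server 2016 Datacenter") for env in ENVIRONMENTS]
    + [RemoteEngine("GCP", env, "Ubuntu 20.04 LTS") for env in ENVIRONMENTS]
    + [RemoteEngine("EHWA", "PROD", "Windows Server 2016 Standard")]
)


def remaining_slots(engines, cap=LICENSE_MAX_ENGINES):
    """Number of remote engines that can still be installed under the cap."""
    return cap - len(engines)


def can_add(engines, count, cap=LICENSE_MAX_ENGINES):
    """True if `count` additional engines would still fit under the cap."""
    return len(engines) + count <= cap


per_location = Counter(engine.location for engine in ENGINES)
print(per_location)
print(remaining_slots(ENGINES))  # 1 slot left
print(can_add(ENGINES, 3))       # False: DEV / UAT / PRE-PROD for EHWA do not fit
```

The last call shows why the EHWA site has only a PROD engine: 9 + 3 would exceed the cap of 10.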
Initially, the old platform ran only on Windows machines, but internal Solvay policy asks to run on Linux machines as much as possible. However, due to technical and policy constraints, it is not possible to get Windows machines within GCP, which is why all Windows machines are located in AWS. A Windows machine is required for industrial Talend projects, which use a specific proprietary driver to connect to the MES (OSIsoft PI) system; this driver is only available for Windows. Linux machines are hosted on GCP because 95% of the target systems for data loading are located in GCP.
The remote engine located on the Korea server serves a very specific use case: retrieving RnI battery cycler data from the Korea (EHWA) plant.
To sum up, having remote engines located on AWS, GCP and on-premise should help anticipate further use cases, depending on the location of the source and target systems.
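The placement reasoning above can be summarised as a small decision function. This is only a sketch of the rules as described in this section, not an actual platform component: the function name and its two boolean parameters are hypothetical.

```python
def pick_engine_location(needs_pi_driver: bool, reads_ehwa_cyclers: bool) -> str:
    """Return the remote engine location a job would run on.

    Hypothetical helper that encodes the rules described above:
    - EHWA battery cycler data is retrieved by the engine on the Korea server;
    - jobs needing the Windows-only OSIsoft PI driver run on AWS (Windows);
    - everything else runs on GCP (Linux), close to most target systems.
    """
    if reads_ehwa_cyclers:
        return "EHWA"
    if needs_pi_driver:
        return "AWS"
    return "GCP"


print(pick_engine_location(needs_pi_driver=True, reads_ehwa_cyclers=False))   # AWS
print(pick_engine_location(needs_pi_driver=False, reads_ehwa_cyclers=False))  # GCP
```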
The RDS Database
The RDS databases avoid hardcoding Talend context variables (for example connection strings to data source systems, credentials, reference dates and so on) within Talend projects. To ease the running of Talend projects, these parameters are stored in an external database, AWS RDS. Parameters can then be changed without opening the Talend Studio application or redeploying Talend jobs.
There are 5 RDS databases available on AWS: DEV, UAT, PRE-PROD and PROD, plus a fifth environment, the Sandbox, dedicated to Talend developers running jobs locally from their computer or VDI (Virtual Desktop Infrastructure).
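The idea of externalised context parameters can be illustrated with a minimal Python sketch. It makes several assumptions: an in-memory SQLite database stands in for AWS RDS, the context_parameters table and its columns are hypothetical, and the sample values are invented for the example. The real schema used on the platform is not described here.

```python
import sqlite3


def load_context(conn, environment):
    """Return the context parameters of one environment as a dict."""
    rows = conn.execute(
        "SELECT name, value FROM context_parameters WHERE environment = ?",
        (environment,),
    ).fetchall()
    return dict(rows)


# In-memory SQLite stands in for the RDS database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE context_parameters ("
    "environment TEXT, name TEXT, value TEXT, "
    "PRIMARY KEY (environment, name))"
)
# Invented sample values; real parameters would include connection strings, dates, etc.
conn.executemany(
    "INSERT INTO context_parameters VALUES (?, ?, ?)",
    [
        ("DEV", "source_host", "dev-db.example.internal"),
        ("DEV", "reference_date", "2022-08-01"),
        ("PROD", "source_host", "prod-db.example.internal"),
        ("PROD", "reference_date", "2022-08-01"),
    ],
)

before = load_context(conn, "DEV")

# Changing a parameter is a plain UPDATE: the job itself is not redeployed.
conn.execute(
    "UPDATE context_parameters SET value = ? WHERE environment = ? AND name = ?",
    ("2022-09-01", "DEV", "reference_date"),
)
after = load_context(conn, "DEV")

print(before["reference_date"], "->", after["reference_date"])
```

The same job code reads a different set of values simply by passing another environment name, which is the point of keeping one database per environment.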
GitLab
Talend development files are versioned in the Solvay GitLab repository. Access to GitLab from Talend uses an SSH key specific to each developer / data engineer.
TMC
The platform is administered through the TMC (Talend Management Console), which is hosted by Talend, the software vendor, on AWS.
Network
The connection between AWS / GCP and Solvay on-premise is made with Direct Connect (Solvay to AWS) and Interconnect (Solvay to GCP). Overall, the GCP, AWS and Solvay networks are considered as one.
