The first question to answer is the type of the process:
The second question to answer is the scope of the code/repo/function/container.
On the one hand, we could create one repository that covers all 6000 SAP tables we ingest. On the other hand, we could have one repo per SAP table, meaning 6000 repos just for that.
Obviously, neither extreme is desirable. As the repository is the deployment unit, it should contain enough code to deserve a repo of its own; yet because every change deploys the entire repo, it should not be massive either.
The hierarchy of objects is:
For containers the hierarchy is similar, only the wording differs.
We have split the work between the Infrastructure team and the developer in such a way that Infrastructure handles the network and the services tightly integrated with it, while the developer has enough freedom to change the individual services using Bicep code.
Concretely, that means Infrastructure owns
Hence asking the Infra team to create a new Function App will be the most common request.
The developer owns
Step 1: Create a repository at https://github.com/SQO-SySight with the name being either
Ingest-<SourceSystem>: For code that reads data from a source system, e.g. Ingest-StarTek
Project-<Name>: For code that transforms data for a specific project, e.g. Project-CSRD
Transform-<Name>: For code that transforms source data for general consumption, e.g. Transform-SAP-Master-Data-1

The default branch is master.
Step 2: Copy the basic objects from one of the template repos and customize them. The things that need to be adjusted are described in the README of each template.
Step 3: Customize
Apart from writing the code, the infrastructure-related Bicep files should be reviewed to match the actual need. For example, not every Function needs the same amount of memory, managed identities, permissions, etc.
The code should be developed and debugged on the local laptop, from within the Syensqo network. The Azure VNet and the Syensqo network are peered, and in the Azure development environment the developer has rather wide permissions: all the permissions the function/container has, plus more.
The alternative is to have a Windows VM inside the Azure network, but mind the additional cost of that.
Debugging locally is far more efficient than deploying the code and watching it from the outside. It also allows checking performance, enables profiling, and prevents runaway jobs from bringing down the whole environment by consuming all available capacity.
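As an illustration, a local run can be wrapped in Python's built-in cProfile module to find hot paths before anything is deployed. This is only a sketch: the ingest_main import is a placeholder for whatever the repo's actual entry point is.

```python
import cProfile
import pstats

# Hypothetical entry point; replace with the repo's actual main routine.
from my_ingest.app import ingest_main


def profile_local_run() -> None:
    """Run the ingest locally under the profiler and print the hot spots."""
    profiler = cProfile.Profile()
    profiler.enable()
    ingest_main()  # runs against the development environment
    profiler.disable()

    stats = pstats.Stats(profiler)
    stats.sort_stats("cumulative").print_stats(20)  # top 20 slowest calls


if __name__ == "__main__":
    profile_local_run()
```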
If something goes wrong in production, the logs are the first place to understand what happened. The person looking at the logs first will be an IT support person who has no understanding of the code. If the logs provide enough information for that person to fix the issue on their own, the system will be available again more quickly and we, the developers, won't be bothered. Hence investing some time into logging is certainly a time saver for us.
logger = lib_producer.utils.get_logger(<name>) is the default way of getting a Python logger instance. This logger will be set to the correct log level, considering the environment and the settings, and also uses the standard log format. Because the same code is executed in all environments, it cannot contain any environment-specific information, and certainly no connection credentials. All credentials are stored in the environment's Azure Key Vault.
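A minimal sketch of how that looks in practice, assuming the standard azure-identity and azure-keyvault-secrets packages are used to reach the Key Vault; the logger name, the KEY_VAULT_URL setting and the secret name are placeholders, only get_logger itself comes from lib_producer.

```python
import os

from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

import lib_producer.utils

# Logger name is a placeholder; level and format are set by get_logger.
logger = lib_producer.utils.get_logger("ingest-startek")

# The vault URL differs per environment and is assumed to be provided via an
# app setting; KEY_VAULT_URL is an illustrative setting name.
vault_url = os.environ["KEY_VAULT_URL"]
client = SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())

# Secret name is illustrative; no credentials ever live in the code itself.
sap_password = client.get_secret("sap-service-user-password").value
logger.info("Fetched SAP credentials from Key Vault %s", vault_url)
```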
The developed code should include test automation. Ideally there would be a mock source and a mock target, so we can test without needing any external resources. Solutions for that exist, but they are all rather time-intensive. Hence the compromise is that we read the data from a development source system and write into a DummyProducer. This does not guarantee that the source data covers all cases (delta handling in particular will be a challenge), but at least it allows comparing what has been read with what has been produced.
If a mock source can be created for a source in a reasonable amount of time, that is the favored approach.
Switching between mock and real producers is done via the adapter software pattern, as sketched below.
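A minimal sketch of that adapter approach, assuming a common Producer interface; apart from DummyProducer, which is mentioned above, all class and method names are illustrative.

```python
from abc import ABC, abstractmethod


class Producer(ABC):
    """Common interface the ingest code writes to, regardless of the target."""

    @abstractmethod
    def produce(self, record: dict) -> None: ...


class RealProducer(Producer):
    """Adapter for the real target (name and implementation are illustrative)."""

    def produce(self, record: dict) -> None:
        ...  # send the record to the actual target system


class DummyProducer(Producer):
    """Adapter used in tests: records are kept in memory for comparison."""

    def __init__(self) -> None:
        self.records: list[dict] = []

    def produce(self, record: dict) -> None:
        self.records.append(record)


def get_producer(use_mock: bool) -> Producer:
    """Factory that switches between the mock and the real adapter."""
    return DummyProducer() if use_mock else RealProducer()
```

In a test, the ingest code is handed a DummyProducer, and the records collected in it are compared against what was read from the development source system.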
Integration tests should also be implemented, because deployment is a frequent source of oversights: missing or wrong credentials, incorrect roles and permissions, naming errors when reading the Key Vault, etc.
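A hedged example of the kind of check such an integration test can perform, using the standard Azure SDK; the KEY_VAULT_URL setting, the secret names and the pytest "integration" marker are assumptions, not part of the existing setup.

```python
import os

import pytest
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient


@pytest.mark.integration  # assumed marker, registered in the repo's pytest config
def test_required_secrets_are_readable():
    """Fails fast on a wrong vault name, a missing secret or a missing role assignment."""
    client = SecretClient(
        vault_url=os.environ["KEY_VAULT_URL"],  # assumed app setting
        credential=DefaultAzureCredential(),
    )
    for name in ("sap-service-user", "sap-service-user-password"):  # illustrative names
        assert client.get_secret(name).value, f"Secret {name} is empty or missing"
```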