Diagram showing the different steps of the application flow, with the data involved at each step.
The scope of this project covers the extractions from SAP BW, SAP QM, and Star Tek.
The logic of the data exposure to the web application is done through Views within Google BigQuery.
Collect the industrial data into a centralized data repository (Single Source of Truth) that enhances data quality before it is exposed to the Web Application for end users.
Talend is used for the orchestration of all extractions and loading of the data into Google BigQuery.
Talend jobs require valid GCP service account credentials to load data into Google Cloud Storage and Google BigQuery.
The Talend jobs must have valid credentials for accessing SAP and Star Tek; refer to the technical documentation.
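Before launching the jobs, it can be useful to verify locally that a GCP service-account key file is well formed. The sketch below is purely illustrative and not part of the project's tooling; the field names are the standard ones found in every GCP service-account JSON key.

```python
import json

# Fields present in every GCP service-account key JSON.
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email", "token_uri"}

def validate_service_account_key(raw_json: str) -> list:
    """Return the list of problems found (an empty list means the key looks valid)."""
    key = json.loads(raw_json)
    problems = sorted(REQUIRED_FIELDS - key.keys())
    if key.get("type") != "service_account":
        problems.append("type must be 'service_account'")
    return problems
```

A key that fails this check would also be rejected by Google Cloud Storage and BigQuery at load time, so catching it early saves a failed Talend run.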
How to validate that this output matches business expectations
Which kind of information or configuration does this step use?
There are two principal data sources for Electronic Batch Data:
- SAP QM (tables related to Batch & Quality)
- Star Tek (technical data related to batches)
- SAP BW (HR data related to users and roles at GBUS)
Data Ocean Domains
- Structure : Materials and Material Types
- Procurement : Commercial Products
The format of the source datasets
Extracted data is physically stored in Google BigQuery in the EU multi-region for all projects.
For new data being extracted from SAP QM, it is stored in the GCP projects for the industrial domains (prj-data-dm-industrial-(dev/test/ppd/prod)) and exposed to the GCP eBatch projects through Views.
For new data being extracted from Star Tek for GBU sites, it is stored in the GCP projects dedicated to eBatch (prj-data-sad-ebatch-(dev/test/ppd/prod)).
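The environment-suffixed project IDs above follow a simple naming pattern. A small helper, purely illustrative and not part of the actual pipeline, can build them:

```python
VALID_ENVS = ("dev", "test", "ppd", "prod")

def project_id(base: str, env: str) -> str:
    """Build a GCP project ID from a base name and an environment suffix."""
    if env not in VALID_ENVS:
        raise ValueError("unknown environment: " + env)
    return base + "-" + env

# The two project families named in this document:
industrial = project_id("prj-data-dm-industrial", "prod")  # -> prj-data-dm-industrial-prod
ebatch = project_id("prj-data-sad-ebatch", "dev")          # -> prj-data-sad-ebatch-dev
```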
While the SAP QM tables are extracted into tables within BigQuery, data exposure is managed through Views in the project, which are consumed by the eBatch Web Application.
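To illustrate the exposure pattern, the sketch below assembles the DDL for a BigQuery View in an eBatch project that reads a table stored in the industrial project. All dataset and table names here are hypothetical placeholders; the real names live in the technical documentation.

```python
def build_exposure_view(ebatch_project: str, industrial_project: str,
                        dataset: str, table: str) -> str:
    """Return DDL for a View in the eBatch project over a table in the
    industrial project (dataset/table names are hypothetical)."""
    return (
        "CREATE OR REPLACE VIEW `{0}.{1}.v_{2}` AS\n"
        "SELECT * FROM `{3}.{1}.{2}`"
    ).format(ebatch_project, dataset, table, industrial_project)

ddl = build_exposure_view("prj-data-sad-ebatch-prod",
                          "prj-data-dm-industrial-prod",
                          "sap_qm", "batch_quality")
```

In this pattern the eBatch projects never hold a copy of the SAP QM data; the Views read it in place, so the industrial project remains the single source of truth.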
Expected data volume
Is there an automatic schedule?
Extractions are programmed through Talend.
At what frequency? What is the trigger?
Hourly extractions for all tables.
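For reference, the trigger calculation behind an hourly schedule can be sketched as follows, assuming the Talend trigger fires at the top of each hour (the actual scheduling lives in Talend, not in code like this):

```python
from datetime import datetime, timedelta

def next_hourly_run(now: datetime) -> datetime:
    """Return the next top-of-the-hour trigger time strictly after `now`."""
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    return top_of_hour + timedelta(hours=1)

# e.g. a job checked at 09:25 would next fire at 10:00
```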
The average time expected:
- Approximately 10 minutes for all jobs
High / Medium / Low
Logs are available through Talend and through the respective GCP projects used in this project.