| Status | WIP |
|---|---|
| Stakeholders | Antoine Roy |
| Outcome | |
| Due Date | |
| Owner | Ira Banerjee, Emma Glasson |
| Solution/Domain/Data Architect | |
The new CSRD regulation significantly increases both the volume and the auditability requirements of environmental KPI reporting.
Continuing with the current way of working is estimated to cost an additional 140 person-days per year, with an increased risk of non-compliance.
Improved data transparency and quality for Water, Waste & Air Emissions at each Syensqo site; increased efficiency and auditability of the CSRD data collected from sites.
Fewer FTEs spent on mandatory reporting each year.
For: GBU SPP, Sustainability Corporate Function
Who: Industrial Function, GBUs, DT Digital Operations/Operational Performance
This solution is a Power BI dashboard and a Power Apps application
that will provide automated SERF-report KPIs on a daily basis,
unlike the existing SERF report, which is fed manually by each site on a yearly basis.
Our solution structures data visualization and input, enhancing data automation and transparency.
Indicators to deliver: Water, Waste & Air Emissions KPIs, Syensqo One Planet KPIs.
High Level Design (Phase I)
| Logical layered view | Application boundaries view |
|---|---|
Low Level Design
<<Contribution from Azure Architect: Scott O´Neill scott.oneill-ext@syensqo.com>>
- Add IP addresses/ports
- IP origin range from Azure
- IP destination range for StarTek
- IP destination range for GCP/Labware
- Public/private endpoints
- Network gateway - backbone
- Security group
- Service account
- VPCs
- Local networks
- Workspace names
Components - Responsibility
- Dataverse: storage acting as a virtual layer between the Lakehouse data sources (Labware & StarTek), and persisting user content (comments, value-change proposals, PDF data);
- MS Fabric: data engine responsible for fetching data from the data sources (GCP: Labware & AWS: StarTek) and combining it with user content to generate the KPIs;
- Lakehouse: centralized data storage for all structured, semi-structured, and unstructured data; leverages the Medallion architecture for data organization so data engines can read/write data at scale;
- Power Apps: custom application framework used to create workflow automation and asynchronous processing, ease integration with the Azure tech stack, and serve as the user interface for triggering such processing;
- Power BI: tool that enables users to evaluate the data for analytics purposes;
- AI Builder: Azure SaaS solution providing purpose-specific AI models that evaluate data input to generate the expected output. In this case, a pre-trained model ingests user-uploaded PDFs and extracts specific data points to be persisted in Dataverse. The PDFs to be uploaded can be seen here; their raw size is up to 1 MB, and the relevant content for extraction is the tables.
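As an illustration of how MS Fabric could combine the source measurements with user change proposals from Dataverse to generate the KPIs, here is a minimal Python sketch; the record shapes and field names (`record_id`, `proposed_value`, `status`) are assumptions, not the actual Labware/Dataverse schemas:

```python
from dataclasses import dataclass

# Hypothetical record shape; field names are assumptions,
# not the actual Labware schema.
@dataclass
class Measurement:
    record_id: str
    site: str
    metric: str          # e.g. "water_withdrawal_m3"
    value: float

def apply_change_proposals(measurements, proposals):
    """Overlay approved user change proposals (from Dataverse)
    on top of the original Lakehouse measurements before KPI aggregation."""
    overrides = {p["record_id"]: p["proposed_value"]
                 for p in proposals if p.get("status") == "approved"}
    return [Measurement(m.record_id, m.site, m.metric,
                        overrides.get(m.record_id, m.value))
            for m in measurements]

def site_kpi(measurements, site, metric):
    """Aggregate a daily KPI for one site and one metric."""
    return sum(m.value for m in measurements
               if m.site == site and m.metric == metric)

measurements = [
    Measurement("r1", "Tavaux", "water_withdrawal_m3", 120.0),
    Measurement("r2", "Tavaux", "water_withdrawal_m3", 80.0),
]
proposals = [{"record_id": "r2", "proposed_value": 85.0, "status": "approved"}]

adjusted = apply_change_proposals(measurements, proposals)
print(site_kpi(adjusted, "Tavaux", "water_withdrawal_m3"))  # 205.0
```

The key design point this sketches is that user content never mutates the source records: proposals are overlaid at KPI time, which keeps the raw data auditable.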
Architecturally Significant Requirements <<Q&A, assessment>>
Sites impacted
- Tavaux
Data Model
- Labware: sample results, product, specifications
- StarTek: flow meters and quantity totalizers
- System trail:
- User action logs: timestamp, user id, action
- System automation action logs: timestamp, user id, action
- User Content
- Uploaded PDF files:
- Comments
- Metric value change proposals
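A minimal sketch of the system-trail records listed above, assuming ISO-8601 UTC timestamps; any detail beyond the timestamp, user id, and action fields is an assumption:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Trail entry mirroring the fields named above: timestamp, user id, action.
@dataclass
class TrailEntry:
    timestamp: str
    user_id: str
    action: str

def log_action(trail, user_id, action, now=None):
    """Append an audit entry with an ISO-8601 UTC timestamp.
    System automation actions use a reserved user id (assumption)."""
    ts = (now or datetime.now(timezone.utc)).isoformat()
    trail.append(TrailEntry(ts, user_id, action))

trail = []
log_action(trail, "u.42", "comment_added")
log_action(trail, "system", "kpi_pipeline_run")
print([e.action for e in trail])  # ['comment_added', 'kpi_pipeline_run']
```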
Design Decisions
- Sensitivity Points
- The volume of data (Labware, Startek)
- Risks
- Non Risks
- Architectural Approaches
- Multi-model storage
- Medallion architecture
- Shared ingestion mechanisms: job_raw_labware | topic_raw_labware; job_raw_startek | topic_raw_startek
- Shared storage artifacts: bronze.raw_labware; bronze.raw_startek
- Trade-offs (alternatives)
- Questions/Concerns
- Does it make sense to store user content in Dataverse? Does it make sense to replicate user content from Dataverse to the Lakehouse? Is there any strategic benefit (at the application roadmap level, the sustainability domain level, or the enterprise architecture level) to leveraging Dataverse for this purpose?
- Keeping more than one data repository for the same business context introduces architectural and data-governance complexity.
- How can users keep track of background processing affecting their own data work context?
- What is the usage of the user-content data (comments, change proposals, uploaded files)? Is there any usage outside the CSRD application context? (Some data points trigger the KPI pipeline; others do not.)
- Alternatives
- Message-driven for user content: could Power Apps read the original data from the Lakehouse for user evaluation, collect the user's change proposal associated with the original record ID, and send it as a message to Event Hub, with a consumer then sinking it into the Lakehouse next to the original records?
- Pros:
- Cons:
- Multi-model storage (separation of concerns / responsibility segregation): for user actions that do NOT trigger the KPI pipeline (nor affect KPI processing), such as adding comments and/or annotations, managing the workflow approval process, logging trail activities, or applying ACL workspace controls across user profiles (personas should be defined), a relational database (CosmosDB, Azure SQL?) at the Gold layer, serving as the last stage for KPI-result visualization (via Power BI) in combination with those user actions, could be a better fit.
- Pros:
- Cons:
- Questions/Concerns
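The Medallion approach and shared storage artifacts named above (e.g. `bronze.raw_labware`) imply a bronze-to-silver promotion step that could look roughly like the following plain-Python sketch; the column names and validation rules are illustrative assumptions, not the actual pipeline:

```python
# Hypothetical raw rows in the shared bronze table; column names assumed.
bronze = {
    "bronze.raw_labware": [
        {"sample_id": "S1", "result": "12.5", "unit": "mg/L"},
        {"sample_id": "S1", "result": "12.5", "unit": "mg/L"},  # duplicate
        {"sample_id": "S2", "result": None, "unit": "mg/L"},    # failed read
    ],
}

def to_silver(rows):
    """Promote bronze rows to silver: deduplicate, drop invalid rows,
    and cast string results to numeric values."""
    seen, silver = set(), []
    for r in rows:
        key = (r["sample_id"], r["result"], r["unit"])
        if r["result"] is None or key in seen:
            continue  # reject bad rows and exact duplicates
        seen.add(key)
        silver.append({**r, "result": float(r["result"])})
    return silver

silver = {"silver.labware": to_silver(bronze["bronze.raw_labware"])}
print(silver["silver.labware"])
# [{'sample_id': 'S1', 'result': 12.5, 'unit': 'mg/L'}]
```

The same promotion function would apply to `bronze.raw_startek`, which is the point of the shared-ingestion decision: one mechanism, two sources.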
Target Architecture (Phase II)
https://lucid.app/lucidchart/b7c02cfa-265a-4c9c-97ef-beb40bd7ef84/edit?page=0_0#
Core Building Blocks - Responsibility
- Event Hub: a native data-streaming service that receives data from producers, either as self-contained messages or as triggers for a data collector to perform the data logistics, and makes it available to be sunk into the Data Platform for data consumers;
- Notification Hubs: component responsible for observing synchronous/asynchronous system and user event actions and making them readable to users, to raise awareness and improve traceability of data changes from the user perspective;
- Schema Registry: ensures data quality, consistency, and safe schema evolution in event-driven or streaming systems;
- API Gateway: uniform interface that isolates internal complexity and supports DevSecOps practices by centralizing access control and observability. In this architecture it plays a core role in making transparent which technology fulfills a specific responsibility from the user perspective, easing a plug-and-play approach for components, for instance the "AI Doc Extractor";
- AI Doc Extractor: component responsible for extracting specified data points from a user-uploaded document. The AI capability allows fine-tuning the model to capture data points with better accuracy;
- SQL Endpoint: interface for fetching data via a SQL query engine;
- Power BI: analytics tool that allows users to evaluate structured, cleaned data for visual analysis and report download;
- Power App: application context where custom capabilities are deployed, leveraging low-code capabilities for end-user interaction and for mid-level scheduled workload-processing tasks.
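To illustrate how the Schema Registry could guard the event contracts flowing through Event Hub (e.g. the change-proposal messages from the alternative above), here is a minimal validation sketch in plain Python; the event envelope, schema fields, and versioning scheme are assumptions, not the actual registry API:

```python
import json

# Registered schemas keyed by (event type, schema version); an assumption
# standing in for a real schema-registry lookup.
SCHEMAS = {
    ("metric_change_proposal", 1): {
        "record_id": str, "proposed_value": float, "user_id": str,
    },
}

def validate(event_json):
    """Reject events with an unknown schema or missing/mistyped fields
    before they are sunk into the Data Platform."""
    event = json.loads(event_json)
    schema = SCHEMAS.get((event.get("type"), event.get("schema_version")))
    if schema is None:
        raise ValueError("unknown event type/version")
    for name, ftype in schema.items():
        if not isinstance(event.get(name), ftype):
            raise ValueError(f"field {name!r} missing or not {ftype.__name__}")
    return event

ok = validate(json.dumps({
    "type": "metric_change_proposal", "schema_version": 1,
    "record_id": "r1", "proposed_value": 118.0, "user_id": "u.42",
}))
print(ok["record_id"])  # r1
```

Versioning the schema key is what allows safe evolution: a consumer can accept version 1 and version 2 side by side while producers migrate.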
References
Dataverse integration with Microsoft Fabric https://www.youtube.com/watch?v=bgcNsqp92YE
PowerApps/Dataverse https://learn.microsoft.com/en-us/power-apps/maker/data-platform/data-platform-intro
AI Builder https://learn.microsoft.com/en-us/ai-builder/overview
Tech Spec diagrams https://app.diagrams.net/#G1xhYLK5o4cYyKx6jod3TGL_WFdx9bu2Zi#%7B%22pageId%22%3A%22RwbzW0ZmPPmPHFFOWm3V%22%7D
SIP Process https://docs.google.com/spreadsheets/d/1j1i-8BkX8YWf2xtX-JaDT4R827aBO59L/edit?gid=1422442667#gid=1422442667
Target Architecture Presentation https://docs.google.com/presentation/d/1KosHyUYtOmcDwbKHYxspv6O_RdMUjdJ5PMsY33yeD_I/edit#slide=id.g31035d80c67_0_0
Dataverse and Azure SQL https://community.dynamics.com/blogs/post/?postid=22ce1a68-bb3f-4139-ae19-2a1d660286b1
One-Pager | IT12892 -CSRD Reporting Automation | One-Pagers https://docs.google.com/presentation/d/1YSiN71DA8U8ekkRJeV_HAyrV3DXU7G_3wXNoYy2Vt7E/edit#slide=id.g307aa5ad372_0_638
Medallion architecture https://learn.microsoft.com/en-us/azure/databricks/lakehouse/medallion
Power Apps integration with Event Hub https://www.carlosag.net/PowerApps/Connectors/Event-Hubs

