Related issues: SYSM-363, SYSM-358
This page defines the capacity-level configuration options that must be evaluated for our Microsoft Fabric platform, with a focus on:
- production stability
- workload isolation
- controlled scaling
- governance of shared resources
- independence of the Data Platform Core workspace
Our target operating model uses Fabric primarily as a data storage and exposure platform based on Lakehouse and Warehouse, serving both BI consumption and external data exposure.
INFO Fabric capacity configuration is a platform governance topic, not only an infrastructure topic.
RECOMMENDATION The primary control for protecting Data Platform Core is capacity isolation.
WARNING Shared capacity between Data Platform Core and Domain production creates shared operational risk.
DECISION A dedicated capacity for Data Platform Core is the recommended baseline for this architecture.
| Version | Date | Description | Contributor |
|---|---|---|---|
| V0.1 | | Initial document | COLOMBANI Théo |
| V0.2 | | Updated with schema proposal and checklist | COLOMBANI Théo |
Platform Context
Data Platform Core workspace
- bronze layer
- silver layer
- core production data preparation and controlled exposure foundation
Domain workspaces
- gold layer
- business-oriented and BI-ready data products
- domain-level exposure for reporting and consumption
Key requirement
The Data Platform Core workspace must remain operational independently from Domain workspaces, including when Domain workloads generate higher or less predictable compute consumption.
Key messages
- Capacity design must be driven by isolation first, then by optimization.
- Capacity is an architecture boundary, not only a billing or administration object.
- The Data Platform Core workspace should not share the same production risk envelope as Domain workloads.
- Non-production must be isolated from production capacities.
- Capacity governance must cover both technical setup and operating model.
What we should implement
Recommended target model
- one dedicated Fabric capacity for Data Platform Core
- one separate Fabric capacity for Domain production
- one separate non-production capacity
- centralized control of workspace assignment
- formal governance for capacity admin and reassignment rights
- standardized monitoring and review cadence
- explicit disaster recovery assessment for Data Platform Core
Capacity design principles
- isolate critical and variable workloads
- avoid shared production capacity between Core and Domain if operational independence is required
- keep non-production outside production capacities
- define ownership and review rules for every production capacity
Proposed capacity design
Decision guide
| Decision area | Use this approach when | Recommended decision |
|---|---|---|
| Dedicated capacity for Data Platform Core | bronze and silver are production-critical; downstream BI or external exposure depends on them; Domain workloads are more variable than Core workloads; Core continuity is a priority | assign Data Platform Core to a dedicated capacity |
| Separate Domain production capacity | multiple Domain workspaces coexist; Domain workloads may create contention; business-facing usage is less predictable; Domain growth should not affect Core operations | assign Domain production to a separate capacity |
| Separate non-production capacity | development and testing are active; experimentation may generate compute spikes; production stability must be protected from non-production activity | keep non-production on a separate capacity |
| Architecture review required | Core and Domain are still planned on the same capacity; workspace reassignment is not tightly governed; production and non-production still share capacity; capacity sizing issues become recurrent | escalate to architecture review |
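The decision rules in the guide above can also be encoded as an automated guardrail that flags non-compliant workspace-to-capacity mappings before they reach production. The sketch below is illustrative only: the capacity identifiers and workload tiers are hypothetical placeholders, not our actual resource names.

```python
# Illustrative guardrail: validate a proposed workspace-to-capacity mapping
# against the isolation rules above. All names are hypothetical examples.

CORE_CAPACITY = "cap-core-prod"      # dedicated to Data Platform Core
DOMAIN_CAPACITY = "cap-domain-prod"  # Domain production workloads
NONPROD_CAPACITY = "cap-nonprod"     # development / testing / experimentation

def validate_assignment(assignments: dict[str, dict]) -> list[str]:
    """Return violations for a {workspace: {"capacity": ..., "tier": ...}} mapping.

    tier is one of: "core", "domain-prod", "non-prod".
    """
    violations = []
    for ws, info in assignments.items():
        cap, tier = info["capacity"], info["tier"]
        # Rule: Data Platform Core must sit on its dedicated capacity.
        if tier == "core" and cap != CORE_CAPACITY:
            violations.append(f"{ws}: Core workspace not on the dedicated capacity")
        # Rule: no other tier may share the Core capacity.
        if tier != "core" and cap == CORE_CAPACITY:
            violations.append(f"{ws}: non-Core workspace on the Core capacity")
        # Rule: non-production must stay off all production capacities.
        if tier == "non-prod" and cap in (CORE_CAPACITY, DOMAIN_CAPACITY):
            violations.append(f"{ws}: non-production workload on a production capacity")
    return violations
```

A check like this could run in the change pipeline that processes workspace reassignment requests, so the escalation path in the last row of the table is triggered automatically rather than discovered after the fact.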
Checklist
- Has a dedicated capacity been confirmed for Data Platform Core?
- Has Domain production been isolated from Data Platform Core?
- Has non-production been separated from production capacities?
- Have capacity admin roles been limited to the central platform team?
- Have workspace reassignment rights been formally governed?
- Has a monitoring owner been assigned for each production capacity?
- Has disaster recovery been explicitly assessed for Data Platform Core?
- Have Spark-related settings been reviewed, if applicable?
- Has the target capacity model been approved as part of platform governance?
Recommended configuration matrix
| Setting | Data Platform Core | Domain Production | Non-Production | Recommendation |
|---|---|---|---|---|
| Dedicated capacity | Yes | Preferred | Separate | Mandatory for Data Platform Core |
| Shared with Data Platform Core | No | No | No | Not allowed |
| Workspace reassignment rights | Very restricted | Restricted | Controlled | Govern centrally |
| Monitoring | Mandatory | Mandatory | Recommended | Standard operating baseline |
| DR assessment | Mandatory | Case by case | Not a priority | Explicit decision required |
| Spark governance | Case by case | Case by case | Flexible | Only where relevant |
| Scaling review cadence | Regular | Regular | Periodic | Metrics-driven |
Detailed design sections
Design principle
Recommendation
Capacity design must be driven by isolation first, then by optimization.
Why it matters
In our context, the main purpose of capacity governance is not only to size compute correctly. It is primarily to:
- protect critical Data Platform Core workloads
- separate critical and non-critical workloads
- reduce cross-workspace contention
- create predictable operating conditions
- support controlled platform growth
Decision statement
For our platform, capacity is an architecture boundary, not only a billing or administration object.
Recommended target model
Capacity A — Data Platform Core
Used only for:
- Data Platform Core bronze
- Data Platform Core silver
- core production ingestion, preparation, and exposure foundations
Capacity B — Domain Production
Used for:
- Domain gold workspaces
- business-facing data products
- BI-oriented workloads
- potentially more variable usage patterns
Capacity C — Non-Production
Used for:
- development
- testing
- experimentation
- validation before production promotion
Recommendation
Do not place Data Platform Core production and Domain production on the same capacity if Data Platform Core must remain operational independently.
Why it matters
A shared capacity creates a shared risk envelope. Even if workspaces are logically separated, they still depend on the same underlying capacity behavior.
Workspace-to-capacity assignment
What it is
The assignment of workspaces to specific Fabric capacities.
Why it matters
This is the most important configuration decision in our model because it determines whether workloads share the same compute risk domain.
Recommendation
- assign Data Platform Core to a dedicated capacity
- assign Domain production to a separate capacity whenever possible
- isolate non-production from all production capacities
- avoid mixing critical platform workloads with variable business workloads
Key message
Workspace assignment is the primary mechanism used to guarantee production isolation and operational independence.
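For reference, workspace-to-capacity assignment can be driven programmatically; at the time of writing, the Microsoft Fabric REST API exposes an "Assign To Capacity" operation on workspaces, which should be verified against the current API documentation before use. The GUIDs and token acquisition below are placeholders, and any such script should only run from the central platform team's pipeline, consistent with the governance rules in this page.

```python
# Hedged sketch: assigning a workspace to a capacity through the Fabric REST API.
# Endpoint shape should be confirmed against current Microsoft documentation;
# workspace/capacity IDs and the bearer token are caller-supplied placeholders.
import json
import urllib.request

FABRIC_API = "https://api.fabric.microsoft.com/v1"

def assign_url(workspace_id: str) -> str:
    # "Workspaces - Assign To Capacity" operation of the Fabric core API.
    return f"{FABRIC_API}/workspaces/{workspace_id}/assignToCapacity"

def assign_workspace(workspace_id: str, capacity_id: str, token: str) -> None:
    """POST the capacity assignment; the caller has already passed the approval gate."""
    req = urllib.request.Request(
        assign_url(workspace_id),
        data=json.dumps({"capacityId": capacity_id}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:  # raises on HTTP errors
        resp.read()
```

Centralizing this call in one governed pipeline is what makes "no self-service reassignment into critical production capacity" enforceable in practice.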
Capacity administration and reassignment governance
What it is
The set of permissions allowing administrators to manage a capacity and move workspaces into or out of it.
Why it matters
Even with a good target architecture, weak governance can reintroduce risk if workspaces are moved without control.
Recommendation
- restrict capacity admin rights to the central Data Platform or IT team
- restrict workspace reassignment rights on critical capacities
- require formal approval for any workspace added to the Data Platform Core capacity
- prevent self-service reassignment into critical production capacity
Key message
A dedicated production capacity loses most of its value if workspace assignment is not tightly governed.
Capacity sizing and scaling
What it is
The sizing of Fabric capacity and the ability to adjust it as workload volume evolves.
Why it matters
Even a well-isolated architecture can fail operationally if the capacity is persistently undersized.
Recommendation
- size Data Platform Core with stability and operational headroom in mind
- review Domain production more frequently, as usage can be less predictable
- use monitoring trends to drive scaling decisions
- avoid reactive resizing without understanding the underlying workload pattern
Practical interpretation
- Data Platform Core should be sized for continuity first
- Domain capacities can be managed more elastically
Key message
Data Platform Core capacity sizing must prioritize service continuity over cost minimization.
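The "continuity first" sizing rule can be made concrete with a simple headroom calculation over Fabric's F SKU ladder (F2 through F2048, where the number is the capacity unit count). The 30% headroom figure below is an assumption to adapt per capacity, not a Microsoft recommendation.

```python
# Illustrative sizing check: given an observed peak capacity-unit (CU) usage,
# pick the smallest Fabric F SKU that preserves a target headroom.
# The default 30% headroom is an example value, not an official guideline.

F_SKUS = {f"F{n}": n for n in (2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048)}

def smallest_sku_with_headroom(peak_cu: float, headroom: float = 0.30) -> str:
    """Return the smallest SKU whose CU count covers peak usage plus headroom."""
    required = peak_cu / (1 - headroom)
    for sku, cus in sorted(F_SKUS.items(), key=lambda kv: kv[1]):
        if cus >= required:
            return sku
    raise ValueError("peak usage exceeds the largest available SKU")
```

For Data Platform Core, the headroom parameter would be set conservatively and revisited from monitoring trends; Domain capacities can tolerate a tighter margin and more frequent resizing.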
Monitoring and operational visibility
What it is
The monitoring of capacity usage, saturation patterns, top consumers, and operational degradation signals.
Why it matters
Capacity governance is only effective if usage and saturation can be observed and acted upon.
Recommendation
For each production capacity, define:
- monitoring owner
- review cadence
- alert thresholds
- escalation path
- expected remediation actions
Minimum baseline
- monitor recurring peaks
- identify top consuming workspaces and items
- review saturation or degradation patterns
- correlate operational issues with refresh, ingestion, or usage spikes
Key message
Capacity monitoring must be part of normal run operations, not only incident management.
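The minimum baseline above can be operationalized with very little machinery: a threshold classification for alerting plus a top-consumer ranking. The 70%/90% thresholds here are example values to be agreed per capacity with its monitoring owner.

```python
# Minimal monitoring baseline sketch. Thresholds are hypothetical defaults
# that each capacity's monitoring owner would set, not prescribed values.

def alert_level(utilisation_pct: float, warn: float = 70.0, critical: float = 90.0) -> str:
    """Classify a capacity utilisation sample into an alert level."""
    if utilisation_pct >= critical:
        return "critical"  # trigger the capacity's escalation path
    if utilisation_pct >= warn:
        return "warning"   # review the trend at the next capacity review
    return "ok"

def top_consumers(usage_by_workspace: dict[str, float], n: int = 3) -> list[str]:
    """Identify the top consuming workspaces, per the minimum baseline."""
    return sorted(usage_by_workspace, key=usage_by_workspace.get, reverse=True)[:n]
```

In practice these signals would come from the capacity's utilisation metrics, and the output would feed the review cadence and escalation path defined for each production capacity.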
Disaster recovery
What it is
The capacity-level disaster recovery posture associated with production data continuity.
Why it matters
The Data Platform Core workspace supports bronze and silver foundations, which makes it a core dependency for downstream exposure.
Recommendation
- perform an explicit DR assessment for Data Platform Core
- document whether DR is enabled or not
- document expected recovery assumptions and limitations
- ensure this is an explicit architecture decision
Key message
For Data Platform Core, disaster recovery should never be left undocumented.
Data Engineering and Spark-related settings
What it is
Capacity-level settings related to Spark and Data Engineering workloads.
Why it matters
These settings are relevant if Spark-based processing is materially used in the platform.
Recommendation
- keep Spark governance centralized
- avoid uncontrolled compute sprawl
- document Spark rules separately if Spark is not a central workload in the platform
Key message
This is a secondary topic unless Spark becomes a major production dependency.
Operational rules
Rule 1
Protect Data Platform Core by design.
Critical Core workloads must not depend on the same shared capacity behavior as variable Domain workloads.
Rule 2
Use isolation before optimization.
Do not try to solve structural contention only with reactive tuning.
Rule 3
Make monitoring part of standard operations.
Capacity review must be proactive and periodic.
Rule 4
Separate production from experimentation.
Development and testing workloads must not compete with critical production capacity.
Proposed architecture decision
Recommended decision
The recommended target state for our platform is:
- one dedicated Fabric capacity for Data Platform Core
- one separate Fabric capacity for Domain production
- one separate non-production capacity
- centralized control of workspace assignment
- standardized monitoring and review
- explicit DR assessment for Data Platform Core
Architecture conclusion
This is the most coherent model for a Fabric platform used primarily as a storage and exposure layer, where the Data Platform Core workspace must remain stable independently from Domain activity.
