
Related Jira issues: SYSM-363, SYSM-358


INFO Fabric capacity configuration is a platform governance topic, not only an infrastructure topic.

RECOMMENDATION The primary control for protecting Data Platform Core is capacity isolation.

WARNING Shared capacity between Data Platform Core and Domain production creates shared operational risk.

DECISION A dedicated capacity for Data Platform Core is the recommended baseline for this architecture.


Version | Date | Description | Contributor
V0.1 | | Initial document | COLOMBANI Théo








Fabric Capacity Configuration for Our Data Platform

1. Objective

This page defines the capacity-level configuration options that must be assessed for our Microsoft Fabric platform.

The objective is to ensure:

  • production stability

  • workload isolation

  • controlled scalability

  • governance of shared compute resources

  • operational independence of the Data Platform Core workspace

In our architecture, Microsoft Fabric is primarily used as a storage and exposure platform, based on Lakehouse and Warehouse, for both BI consumption and external data exposure.


2. Platform Context

Our target operating model is structured as follows:

Data Platform Core workspace

  • bronze layer

  • silver layer

  • core production data preparation and controlled exposure foundation

Domain workspaces

  • gold layer

  • business-oriented and BI-ready data products

  • domain-level exposure for reporting and consumption

Key requirement

The Data Platform Core production workspace must remain operationally independent of Domain workspaces, including when Domain workloads generate higher or less predictable compute consumption.


3. Design Principle

Recommendation

Capacity design must be driven by isolation first, then by optimization.

Rationale

In our context, the main purpose of capacity governance is not only to size compute correctly. It is primarily to:

  • protect critical Data Platform Core workloads

  • separate critical and non-critical workloads

  • reduce cross-workspace contention

  • create predictable operating conditions

  • support controlled platform growth

Decision statement

For our platform, capacity is an architecture boundary, not only a billing or administration object.


4. Recommended Target Model

Target architecture

Capacity A — Data Platform Core 

Used only for:

  • Data Platform Core bronze

  • Data Platform Core silver

  • core production ingestion / preparation / exposure foundations

Capacity B — Domain Production

Used for:

  • Domain gold workspaces

  • business-facing data products

  • BI-oriented workloads

  • potentially more variable usage patterns

Capacity C — Non-Production

Used for:

  • development

  • testing

  • experimentation

  • validation before production promotion


Recommendation

Do not place Data Platform Core production and Domain production on the same capacity if Data Platform Core production must remain operationally independent.

Why this matters

A shared capacity creates a shared risk envelope. Even if workspaces are logically separated, they still depend on the same underlying capacity behavior.


5. Capacity-Level Settings to Document

5.1 Workspace-to-capacity assignment

What it is

The assignment of workspaces to specific Fabric capacities.

Why it matters

This is the most important configuration decision in our model because it determines whether workloads share the same compute risk domain.

Recommendation

  • assign Data Platform Core production to a dedicated capacity

  • assign Domain production to a separate capacity whenever possible

  • isolate non-production from all production capacities

  • avoid mixing critical platform workloads with variable business workloads

RECOMMENDATION Workspace assignment is the primary mechanism used to guarantee production isolation and operational independence.
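The assignment rule above can be expressed as a simple audit check. The sketch below is illustrative only: capacity names, workspace tiers, and the inventory structure are hypothetical placeholders, not real Fabric identifiers or API objects.

```python
# Illustrative audit (not a Fabric API call): validate a workspace-to-capacity
# mapping against the isolation rules in this section.

# Hypothetical mapping of workspace tiers to their dedicated capacities.
CAPACITY_OF_TIER = {
    "core-prod": "capacity-a-core",      # Data Platform Core production
    "domain-prod": "capacity-b-domain",  # Domain production
    "non-prod": "capacity-c-nonprod",    # dev / test / experimentation
}

def assignment_violations(workspaces):
    """Return the names of workspaces placed on the wrong capacity.

    `workspaces` is a list of dicts with 'name', 'tier', and 'capacity' keys.
    """
    violations = []
    for ws in workspaces:
        expected = CAPACITY_OF_TIER.get(ws["tier"])
        if expected is None or ws["capacity"] != expected:
            violations.append(ws["name"])
    return violations

inventory = [
    {"name": "dp-core-silver", "tier": "core-prod", "capacity": "capacity-a-core"},
    {"name": "sales-gold", "tier": "domain-prod", "capacity": "capacity-a-core"},  # shares the core capacity
]
print(assignment_violations(inventory))  # ['sales-gold']
```

In practice, the workspace inventory would come from tenant administration tooling; the point of the check is that any deviation from the target mapping is surfaced as an explicit violation rather than discovered during an incident.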


5.2 Capacity administration and reassignment governance

What it is

The set of permissions allowing administrators to manage a capacity and move workspaces into or out of it.

Why it matters

Even with a good target architecture, weak governance can reintroduce risk if workspaces are moved without control.

Recommendation

  • restrict capacity admin rights to the central Data Platform or IT team

  • restrict workspace reassignment rights on critical capacities

  • require formal approval for any workspace added to the Data Platform Core production capacity

  • prevent self-service reassignment into critical production capacity

WARNING A dedicated production capacity loses most of its value if workspace assignment is not tightly governed.
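The formal-approval requirement can be enforced as an allowlist check: anything found on the core capacity that is not on the approved list is flagged. Workspace names below are hypothetical examples.

```python
# Hypothetical governance check: every workspace found on the Data Platform
# Core capacity must appear on a formally approved allowlist.

APPROVED_CORE_WORKSPACES = {"dp-core-bronze", "dp-core-silver"}

def unapproved_on_core(core_capacity_workspaces):
    """Return workspaces on the core capacity that lack formal approval."""
    return sorted(set(core_capacity_workspaces) - APPROVED_CORE_WORKSPACES)

found = ["dp-core-bronze", "dp-core-silver", "ad-hoc-sandbox"]
print(unapproved_on_core(found))  # ['ad-hoc-sandbox']
```

Running such a check periodically (rather than only at assignment time) also catches self-service reassignments that bypassed the approval process.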


5.3 Surge protection

What it is

A protection mechanism used to manage overload situations and reduce the impact of excessive background activity on a capacity.

Why it matters

It can help protect shared capacities, especially where Domain workspaces may generate bursty or uneven usage patterns.

Recommendation

  • consider enabling surge protection on shared Domain production capacities

  • use it as a protection layer for variable workloads

  • do not rely on it as the sole protection for Data Platform Core 

Position

Surge protection is a supporting control, not a substitute for proper isolation.

RECOMMENDATION Use surge protection on shared capacities. Do not use it as a replacement for dedicated capacity when a workspace is mission-critical.


5.4 Capacity sizing and scaling

What it is

The sizing of Fabric capacity and the ability to adjust it as workload volume evolves.

Why it matters

Even a well-isolated architecture can fail operationally if the capacity is persistently undersized.

Recommendation

  • size Data Platform Core with stability and operational headroom in mind

  • review Domain production more frequently, as usage can be less predictable

  • use monitoring trends to drive scaling decisions

  • avoid reactive resizing without understanding the underlying workload pattern

Practical interpretation

  • Data Platform Core should be sized for continuity first

  • Domain capacities can be managed more elastically

DECISION Data Platform Core capacity sizing must prioritize service continuity over cost minimization.
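The "continuity first" sizing principle can be sketched as picking the smallest SKU that covers observed peak demand plus a headroom margin. The SKU ladder below follows the published Fabric F2 to F2048 range (where the capacity-unit count equals the SKU number); the 30% headroom factor is an assumption for illustration, not a product rule.

```python
# Sizing sketch: smallest Fabric F SKU whose capacity units (CUs) cover
# observed peak demand plus an operational headroom margin.

F_SKUS = [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048]

def recommend_sku(peak_cu, headroom=0.30):
    """Return the smallest F SKU with CUs >= peak demand * (1 + headroom)."""
    required = peak_cu * (1 + headroom)
    for cu in F_SKUS:
        if cu >= required:
            return f"F{cu}"
    return "above F2048: split workloads or engage capacity planning"

print(recommend_sku(peak_cu=48))  # F64
```

For Domain capacities, the same calculation can be re-run more frequently against recent usage trends, which is the elastic management pattern this section describes.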


5.5 Capacity overage

What it is

A mechanism that allows excess usage beyond the purchased capacity threshold, subject to billing and governance.

Why it matters

It can reduce the risk of operational disruption during rare peaks.

Recommendation

  • consider enabling overage for Data Platform Core only with explicit financial approval

  • define a capped and governed usage threshold

  • treat overage as a resilience mechanism, not a normal operating model

Position

Overage is a safety net, not a sizing strategy.

WARNING Do not use overage to compensate for structural under-sizing.


5.6 Monitoring and operational visibility

What it is

The monitoring of capacity usage, saturation patterns, top consumers, and operational degradation signals.

Why it matters

Capacity governance is only effective if usage and saturation can be observed and acted upon.

Recommendation

For each production capacity, define:

  • monitoring owner

  • review cadence

  • alert thresholds

  • escalation path

  • expected remediation actions

Minimum baseline

  • monitor recurring peaks

  • identify top consuming workspaces and items

  • review saturation or degradation patterns

  • correlate operational issues with refresh, ingestion, or usage spikes

RECOMMENDATION Capacity monitoring must be part of normal run operations, not only incident management.
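The "recurring peaks" and "saturation patterns" items in the baseline above amount to distinguishing sustained pressure from one-off spikes. A minimal sketch, assuming utilisation samples are available from whatever metrics tooling is in place (the threshold, window, and sample values are illustrative):

```python
# Minimal monitoring sketch: flag runs where capacity utilisation stays above
# an alert threshold for several consecutive samples (a sustained saturation
# pattern), rather than alerting on every single spike.

def sustained_breaches(samples, threshold=0.80, min_consecutive=3):
    """Return start indexes of runs of >= min_consecutive samples over threshold."""
    starts, run = [], 0
    for i, value in enumerate(samples):
        run = run + 1 if value > threshold else 0
        if run == min_consecutive:
            starts.append(i - min_consecutive + 1)
    return starts

utilisation = [0.55, 0.83, 0.86, 0.91, 0.60, 0.95, 0.70]
print(sustained_breaches(utilisation))  # [1]
```

A sustained breach is what should trigger the escalation path defined for the capacity; isolated spikes feed the periodic review instead.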


5.7 Disaster recovery

What it is

The capacity-level disaster recovery posture associated with production data continuity.

Why it matters

The Data Platform Core workspace supports bronze and silver foundations, which makes it a core dependency for downstream exposure.

Recommendation

  • perform an explicit DR assessment for Data Platform Core 

  • document whether DR is enabled or not

  • document expected recovery assumptions and limitations

  • ensure this is an explicit architecture decision

Position

For Data Platform Core, DR should never be left undocumented.

DECISION Disaster recovery for Data Platform Core must be assessed explicitly and recorded as an approved architecture choice.


5.8 Notifications and alerting

What it is

The definition of who is informed when capacity issues occur and how operational response is triggered.

Why it matters

Without alert ownership, capacity incidents tend to be handled too late or inconsistently.

Recommendation

Define:

  • alert recipients

  • severity levels

  • response expectations

  • operational communication path

RECOMMENDATION Every production capacity must have a clearly assigned operational owner and alerting path.
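The recipients, severity levels, and response expectations listed above can be captured as a small routing table. Everything in this sketch (team names, response times, severity labels) is a placeholder assumption showing the structure to be defined, not an existing configuration.

```python
# Illustrative alert-routing table: map capacity alert severity to recipients
# and an expected response time. All values are placeholder assumptions.

ROUTING = {
    "critical": {"recipients": ["platform-oncall"], "respond_within_min": 15},
    "warning":  {"recipients": ["platform-team"],   "respond_within_min": 240},
    "info":     {"recipients": ["platform-team"],   "respond_within_min": None},
}

def route(severity):
    """Return the routing rule for a severity, defaulting to 'info'."""
    return ROUTING.get(severity, ROUTING["info"])

print(route("critical")["recipients"])  # ['platform-oncall']
```

Keeping this table versioned alongside the capacity documentation makes the operational owner and escalation path auditable rather than tribal knowledge.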


5.9 Data Engineering and Spark-related settings

What it is

Capacity-level settings related to Spark and Data Engineering workloads.

Why it matters

These settings are relevant if Spark-based processing is materially used in the Data Platform Core workspace.

Recommendation

  • keep Spark governance centralized

  • avoid uncontrolled compute sprawl

  • document Spark rules separately if Spark is not a central workload in the platform

Position

This is a secondary topic in our model unless Spark becomes a major production dependency.


6. Recommended Configuration Matrix

Setting | Data Platform Core | Domain Production | Non-Production
Dedicated capacity | Yes | Preferred | Separate
Shared with Data Platform Core | No | No | No
Workspace reassignment rights | Very restricted | Restricted | Controlled
Surge protection | Optional complement | Recommended | Optional
Capacity overage | Optional, capped | Optional, capped | Usually not required
Monitoring | Mandatory | Mandatory | Recommended
DR assessment | Mandatory | Case by case | Not a priority
Spark governance | Case by case | Case by case | Flexible
Scaling review cadence | Regular | Regular | Periodic

7. Operational Rules

Rule 1

Protect Data Platform Core by design.
Critical Data Platform Core workloads must not depend on the same shared capacity behavior as variable domain workloads.

Rule 2

Use isolation before optimization.
Do not try to solve structural contention only with reactive tuning or protection features.

Rule 3

Treat overage as an exception mechanism.
It may improve resilience, but it must not become the default operating mode.

Rule 4

Make monitoring part of standard operations.
Capacity review must be proactive and periodic.

Rule 5

Separate production from experimentation.
Development and testing workloads must not compete with critical production capacity.


8. Proposed Architecture Decision

Recommended decision

The recommended target state for our platform is:

  • one dedicated Fabric capacity for Data Platform Core 

  • one separate Fabric capacity for Domain production

  • one separate non-production capacity

  • centralized control of workspace assignment

  • standardized monitoring and alerting

  • optional capped overage for resilience

  • explicit DR assessment for Data Platform Core 

Architecture conclusion

This is the most coherent model for a Fabric platform used primarily as a storage and exposure layer, where the Data Platform Core workspace must remain stable independently of Domain activity.


9. Configuration Decisions to Validate

Checklist

  • Has a dedicated capacity been confirmed for Data Platform Core?

  • Has Domain production been isolated from Data Platform Core?

  • Has non-production been separated from production capacities?

  • Have capacity admin roles been limited to the central platform team?

  • Have workspace reassignment rights been formally governed?

  • Has surge protection been evaluated for shared Domain capacities?

  • Has capacity overage been evaluated and financially approved where relevant?

  • Has a monitoring owner been assigned for each production capacity?

  • Have alert thresholds and escalation paths been defined?

  • Has disaster recovery been explicitly assessed for Data Platform Core?

  • Have Spark-related settings been reviewed, if applicable?

  • Has the target capacity model been approved as part of platform governance?
