This page focuses on three Fabric capacity options that can improve operational resilience:
- Surge Protection
- Notifications and alerts
- Capacity Overage
For our platform, these options should be treated as operational controls, not as substitutes for proper capacity isolation and sizing. Because throttling is applied at the capacity level, the first line of protection for the Data Platform Core workspace (only its production capacity is in scope here) remains its separation from Domain workloads on a dedicated capacity.
| Version | Date | Description | Contributor |
|---|---|---|---|
| V0.1 | | Initial document | COLOMBANI Théo |
Surge Protection helps limit overuse of a capacity by controlling background compute consumption. At the capacity level, once it becomes active, new background jobs are rejected. Microsoft also recommends using the Capacity Metrics app to tune thresholds and explicitly states that critical solutions should be isolated on a dedicated capacity for full protection.
Surge Protection is useful when interactive or user-facing workloads share capacity with background operations such as refreshes, AI jobs, or other heavy compute activity. Microsoft’s planning guidance recommends it in exactly that situation.
Surge Protection does not guarantee that interactive requests will never be delayed or rejected. It does not stop jobs already in progress. Some Fabric UI actions are treated as background operations and can also be rejected. Certain OneLake activities remain unaffected.
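The capacity-level behavior described above can be sketched as a small admission model. This is a hypothetical, simplified Python illustration, not Fabric's implementation: the class and parameter names are invented, and the pair of thresholds (protection switches on at `rejection_threshold` and off only below `recovery_threshold`) is an assumption to check against the actual capacity settings and tune with the Capacity Metrics app. It models only the rejection of new background jobs; interactive delays, UI actions classified as background, and OneLake exemptions are not modeled.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class SurgeProtectionModel:
    """Hypothetical model of capacity-level Surge Protection (illustration only)."""
    rejection_threshold: float  # assumed: % background utilization that activates protection
    recovery_threshold: float   # assumed: % below which protection deactivates
    active: bool = False
    running_jobs: Set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        if not 0 <= self.recovery_threshold < self.rejection_threshold:
            raise ValueError("recovery threshold must be below rejection threshold")

    def observe(self, background_utilization: float) -> None:
        """Update protection state from the latest background utilization (%)."""
        if not self.active and background_utilization >= self.rejection_threshold:
            self.active = True
        elif self.active and background_utilization < self.recovery_threshold:
            self.active = False

    def submit(self, job_id: str, kind: str) -> bool:
        """Admit or reject a new job; kind is 'background' or 'interactive'."""
        if kind not in ("background", "interactive"):
            raise ValueError(f"unknown job kind: {kind}")
        if kind == "background" and self.active:
            return False  # new background jobs are rejected while protection is active
        # Interactive requests are not rejected by this rule, but in reality they
        # may still be delayed or throttled by the capacity (not modeled here).
        self.running_jobs.add(job_id)
        return True


model = SurgeProtectionModel(rejection_threshold=80.0, recovery_threshold=60.0)
print(model.submit("refresh-1", "background"))  # True: protection inactive
model.observe(85.0)                             # crosses the rejection threshold
print(model.submit("refresh-2", "background"))  # False: rejected
print(model.submit("report-1", "interactive"))  # True: not rejected by this rule
print("refresh-1" in model.running_jobs)        # True: in-progress job keeps running
model.observe(70.0)                             # still above the recovery threshold
print(model.active)                             # True
```

The gap between the two assumed thresholds keeps the state from flapping when background utilization hovers around a single value.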
Workspace-level Surge Protection adds a second layer: it can enforce per-workspace CU limits, automatically detect and block noisy workspaces, and mark a workspace as Mission Critical or Blocked. A Mission Critical workspace ignores workspace-level blocking rules, while a Blocked workspace rejects all requests during the block period.
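The workspace-level rules above can be expressed as a single decision function. This is a hypothetical Python sketch, not a Fabric API: the enum, function, and parameter names are invented, and it covers only the workspace-level layer (capacity-level rejection is evaluated separately). Treating usage at or above the per-workspace limit as a breach is an assumption.

```python
from enum import Enum
from typing import Optional


class WorkspaceState(Enum):
    NORMAL = "normal"
    MISSION_CRITICAL = "mission_critical"
    BLOCKED = "blocked"


def workspace_admits(state: WorkspaceState, cu_used: float,
                     cu_limit: Optional[float], flagged_noisy: bool) -> bool:
    """Return True if the workspace-level layer lets a new request through."""
    if state is WorkspaceState.BLOCKED:
        return False  # Blocked: all requests are rejected during the block period
    if state is WorkspaceState.MISSION_CRITICAL:
        return True   # Mission Critical: workspace-level blocking rules are ignored
    if flagged_noisy:
        return False  # automatically detected noisy workspace
    if cu_limit is not None and cu_used >= cu_limit:
        return False  # per-workspace CU limit reached (assumed >= comparison)
    return True


print(workspace_admits(WorkspaceState.NORMAL, 120.0, 100.0, False))           # False
print(workspace_admits(WorkspaceState.MISSION_CRITICAL, 120.0, 100.0, True))  # True
```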
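For Data Platform Core capacity: secondary control only. The primary protection is the dedicated capacity itself.
For Domain production capacity: enable it, tune thresholds with the Capacity Metrics app, and use workspace-level controls.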
Surge Protection should be enabled primarily on shared Domain capacities to reduce the impact of bursty background workloads. It must not replace dedicated capacity for the Data Platform Core workspace. Thresholds must be based on observed capacity metrics and reviewed periodically.
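Fabric provides two practical approaches for capacity alerting:
- capacity email notifications, configured in capacity settings;
- Capacity Overview Events in Real-Time Hub, with threshold-based alerts.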
For a governed platform, notifications should not be treated as optional. They are the minimum control that turns capacity health into an operational process rather than a reactive troubleshooting exercise. Capacity Overview Events are specifically intended to monitor capacity health and create automated alerts.
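Minimum baseline for every production capacity: an assigned operational owner, defined recipients, alert thresholds, and a documented escalation path.
For Data Platform Core capacity: mandatory.
For Domain production capacity: mandatory.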
Option 1 — Simple baseline
Use capacity email notifications for a first layer of monitoring. These are configured by a capacity admin in capacity settings.
Option 2 — Recommended target
Use Fabric Capacity Overview Events in Real-Time Hub and create threshold-based alerts. Microsoft’s guidance for alert setup uses the event type Microsoft.Fabric.Capacity.Summary, selects monitoring by capacity, and recommends using numeric threshold conditions rather than connection-time filters.
Every production Fabric capacity must have an assigned operational owner, active alerting, and a documented escalation path. Real-Time Hub alerts should be the preferred target implementation for capacity health monitoring.
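The rule above (owner, active alerting, escalation path) can be sketched as a local evaluation of numeric threshold conditions against capacity summary events. This Python sketch is hypothetical: it does not call Real-Time Hub or any Fabric API, and the event shape (`type`, `capacityId`, `data`) and the metric name `utilization_pct` are placeholders; check the real Microsoft.Fabric.Capacity.Summary payload schema before defining conditions. Only the event type string comes from the guidance above.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

EVENT_TYPE = "Microsoft.Fabric.Capacity.Summary"


@dataclass
class AlertRule:
    metric: str       # hypothetical numeric field in the event payload
    threshold: float
    severity: str


@dataclass
class CapacityAlertPolicy:
    capacity_id: str
    owner: str
    escalation_path: List[str]
    rules: List[AlertRule]

    def __post_init__(self) -> None:
        # Policy: every production capacity needs an owner, active alerting,
        # and a documented escalation path.
        if not self.owner:
            raise ValueError("capacity must have an operational owner")
        if not self.rules:
            raise ValueError("capacity must have at least one alert rule")
        if not self.escalation_path:
            raise ValueError("capacity must have an escalation path")


def evaluate(event: Dict[str, Any], policy: CapacityAlertPolicy) -> List[str]:
    """Return alert messages for threshold conditions met by one event."""
    if event.get("type") != EVENT_TYPE or event.get("capacityId") != policy.capacity_id:
        return []
    data = event.get("data", {})
    alerts = []
    for rule in policy.rules:
        value = data.get(rule.metric)
        # Numeric threshold condition; missing or non-numeric values are skipped.
        if isinstance(value, (int, float)) and value >= rule.threshold:
            alerts.append(
                f"[{rule.severity}] {policy.capacity_id}: {rule.metric}={value} "
                f"(threshold {rule.threshold}); notify {policy.owner}, "
                f"escalate via {' > '.join(policy.escalation_path)}"
            )
    return alerts


policy = CapacityAlertPolicy("core-prod", "platform-ops",
                             ["on-call engineer", "platform lead"],
                             [AlertRule("utilization_pct", 90.0, "high")])
event = {"type": EVENT_TYPE, "capacityId": "core-prod", "data": {"utilization_pct": 94.5}}
for line in evaluate(event, policy):
    print(line)
```

Validating the policy at construction time means a capacity without an owner or escalation path fails early rather than silently producing alerts nobody acts on.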
Capacity Overage allows Fabric to use extra compute beyond the purchased capacity limit to prevent throttling. It is available only for F SKUs, requires capacity admin permissions, and requires sufficient quota or Fabric capacity units to support the configured overage limit. Microsoft says it is turned off by default for existing capacities.
Microsoft positions Capacity Overage as a way to handle rare unexpected spikes or small regular spikes where scaling up is not otherwise required. It helps prevent throttling and allows new jobs to run, reducing downstream user impact.
Capacity Overage does not improve performance. It mainly prevents throttling. It can also admit new large jobs, so it does not remove the need for governance. Microsoft also warns to use caution when scaling down a capacity with overage enabled, because automatic charges can become significant.
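The warning about scaling down can be illustrated with simple arithmetic. The Python sketch below is a hypothetical model, not Microsoft's billing logic: it assumes demand is measured in hourly buckets of capacity units (CU), that overage covers demand above the purchased CU up to a configured cap, and that cost is overage CU-hours multiplied by a placeholder rate. The demand figures and the rate are invented; 64 and 32 CU correspond to F64 and F32.

```python
from typing import Dict, List


def overage_estimate(hourly_demand_cu: List[float], purchased_cu: float,
                     overage_cap_cu: float, price_per_cu_hour: float) -> Dict[str, float]:
    """Hypothetical overage estimate over hourly demand buckets (illustration only)."""
    overage_cu_hours = 0.0
    unserved_cu_hours = 0.0
    for demand in hourly_demand_cu:
        excess = max(0.0, demand - purchased_cu)
        covered = min(excess, overage_cap_cu)  # overage never exceeds the cap
        overage_cu_hours += covered
        unserved_cu_hours += excess - covered  # still subject to throttling
    return {
        "overage_cu_hours": overage_cu_hours,
        "unserved_cu_hours": unserved_cu_hours,
        "cost": overage_cu_hours * price_per_cu_hour,  # placeholder rate, not a real price
    }


demand = [40.0, 50.0, 70.0, 60.0, 30.0]  # invented hourly demand in CU
print(overage_estimate(demand, purchased_cu=64, overage_cap_cu=16, price_per_cu_hour=1.0))
# {'overage_cu_hours': 6.0, 'unserved_cu_hours': 0.0, 'cost': 6.0}
print(overage_estimate(demand, purchased_cu=32, overage_cap_cu=16, price_per_cu_hour=1.0))
# {'overage_cu_hours': 56.0, 'unserved_cu_hours': 36.0, 'cost': 56.0}
```

In this toy example, halving the purchased CU multiplies overage usage by more than nine and still leaves demand above the cap unserved, which is why overage is treated as a capped safety net rather than a substitute for sizing.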
For Data Platform Core capacity: optional, with a cap.
For Domain production capacity: optional, with a tight cap.
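Enable Capacity Overage when:
- spikes are rare and unexpected, or small and regular, so that scaling up is not otherwise justified;
- the overage limit is capped and monitored;
- the potential extra cost has budget approval.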
Do not enable it as the default response to recurring saturation. In that case, the right answer is usually to resize the capacity, optimize the workloads, or isolate them.
Capacity Overage may be enabled on critical Fabric capacities as a controlled resilience mechanism. It must remain capped, monitored, and financially approved. It must not replace correct sizing or workload isolation.
| Feature | Data Platform Core capacity | Domain production capacity | What to implement |
|---|---|---|---|
| Surge Protection | Secondary control only | Yes | Enable mainly on Domain capacity, tune with the Capacity Metrics app, use workspace-level controls |
| Notifications / Alerts | Mandatory | Mandatory | Define owner, recipients, thresholds, escalation path |
| Capacity Overage | Optional, capped | Optional, tightly capped | Use only for rare peaks and with budget approval |
This operating model aligns with Microsoft’s guidance: isolate critical workloads first, use Surge Protection to protect shared interactive workloads, use alerts for active monitoring, and use Capacity Overage as a safety net rather than a normal operating mode.