Jira (Syensqo's Jira): SYSM-358

Info

This page focuses on three Fabric capacity options that can improve operational resilience:

Surge Protection
Notifications / Alerts
Capacity Overage

For our platform, these options should be treated as operational controls, not as substitutes for proper capacity isolation and sizing. Because throttling is applied at the capacity level, the first protection for the Data Platform Core workspace (production capacity only in this context) remains capacity separation from Domain workloads.

Version	Date	Description	Contributor
V0.1	15 Apr 2026	Initial document	COLOMBANI Théo
V0.1	17 Apr 2026	Update document	COLOMBANI Théo

3. Surge Protection

What it is

Surge Protection helps limit overuse of a capacity by controlling background compute consumption. When it becomes active at the capacity level, new background jobs are rejected. Microsoft also recommends using the Capacity Metrics app to tune thresholds and explicitly states that critical solutions should be isolated on a dedicated capacity for full protection.
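To make the "becomes active, then rejects background jobs" behaviour easier to reason about, here is a minimal toy model. It assumes the feature works with a rejection threshold and a lower recovery threshold on background usage; the class name, field names, and the 80/60 values are illustrative assumptions and not taken from this page or from a Fabric API. Real thresholds should be tuned from the Capacity Metrics app.

```python
from dataclasses import dataclass


@dataclass
class SurgeProtectionModel:
    """Toy model of a rejection/recovery threshold pair (all values are assumptions)."""

    rejection_threshold_pct: float = 80.0  # assumed: activate at or above this usage
    recovery_threshold_pct: float = 60.0   # assumed: deactivate at or below this usage
    active: bool = False

    def observe(self, background_usage_pct: float) -> bool:
        """Update state from one usage sample.

        Returns True while new background jobs would be rejected in this model.
        """
        if not self.active and background_usage_pct >= self.rejection_threshold_pct:
            self.active = True
        elif self.active and background_usage_pct <= self.recovery_threshold_pct:
            self.active = False
        return self.active


def simulate(samples):
    """Run a sequence of background usage samples through a fresh model."""
    model = SurgeProtectionModel()
    return [model.observe(sample) for sample in samples]


if __name__ == "__main__":
    print(simulate([50, 85, 70, 65, 55, 50]))
```

The gap between the two thresholds is what keeps the control from flapping: once active, the model stays active until usage has clearly dropped, which is why a dedicated capacity remains the safer option for critical workloads.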

What matters for us

...

Key message

These three options should be used as operational safeguards, not as substitutes for proper capacity design.

For our platform, the priority remains:

protect the Data Platform Core workspace through capacity isolation
use Surge Protection mainly to control variable Domain workloads
make Notifications mandatory for production operations
use Capacity Overage only as a controlled safety net for rare peaks. (learn.microsoft.com)

What we should implement

Feature	Data Platform Core capacity	Domain production capacity	Clear recommendation
Surge Protection	Secondary control only	Yes	Enable mainly on Domain capacity
Notifications / Alerts	Mandatory	Mandatory	Define owner, recipients, thresholds, escalation
Capacity Overage	Optional, capped	Optional, tightly capped	Use only for rare peaks and with budget approval

Recommended setup

[Image]

For Data Platform Core

keep it on a dedicated capacity
make alerts mandatory
use Capacity Overage only if uptime is critical and cost is approved
do not rely on Surge Protection as the main protection layer. (learn.microsoft.com)

For Domain production

allow shared capacity if needed
enable Surge Protection
enable workspace-level controls
make alerts mandatory
use Capacity Overage only with strict limits. (learn.microsoft.com)

Decision guide

Enable Surge Protection when

the capacity is shared
Domain workloads are variable
noisy workspaces need runtime control. (learn.microsoft.com)

Make Notifications mandatory when

the capacity is production
the platform team is expected to operate it properly

That means: always for production capacities. (learn.microsoft.com)

Enable Capacity Overage when

the capacity is critical
peaks are occasional, not structural
the financial model is accepted
monitoring is already in place. (learn.microsoft.com)
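The decision guide can be sketched as a small function. This is only an illustration: the `CapacityProfile` fields and the `recommend_controls` name are hypothetical, and the sketch assumes that the Surge Protection conditions are alternatives (any one is enough) while the Capacity Overage conditions are cumulative (all must hold). The page itself does not state how the conditions combine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapacityProfile:
    """Hypothetical description of a capacity, used only for this sketch."""

    name: str
    is_production: bool
    is_shared: bool
    has_variable_domain_workloads: bool
    has_noisy_workspaces: bool
    is_critical: bool
    peaks_are_occasional: bool
    cost_model_accepted: bool
    monitoring_in_place: bool


def recommend_controls(profile: CapacityProfile) -> dict:
    """Map a capacity profile to the three controls in the decision guide."""
    return {
        # Assumption: any one of the three conditions justifies Surge Protection.
        "surge_protection": any(
            (
                profile.is_shared,
                profile.has_variable_domain_workloads,
                profile.has_noisy_workspaces,
            )
        ),
        # Notifications: always for production capacities.
        "notifications": profile.is_production,
        # Assumption: all four conditions must hold for Capacity Overage.
        "capacity_overage": all(
            (
                profile.is_critical,
                profile.peaks_are_occasional,
                profile.cost_model_accepted,
                profile.monitoring_in_place,
            )
        ),
    }
```

For example, a shared Domain production capacity whose cost model has not been accepted would get Surge Protection and Notifications, but not Capacity Overage.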

Quick checklist

Is Surge Protection enabled on Domain production capacity?
Is workspace-level Surge Protection enabled for Domain workspaces?
Are Mission Critical workspaces explicitly limited and documented?
Does each production capacity have an owner?
Are alerts configured for each production capacity?
Is Real-Time Hub alerting planned or implemented?
Is Capacity Overage enabled only where justified?
Is every overage limit capped and approved?
Do repeated alerts or overage events trigger a review of sizing or isolation?

1. Surge Protection

What to understand

Surge Protection helps reduce the impact of heavy background activity on a capacity. It is especially useful when interactive or user-facing workloads share capacity with

...

more variable background jobs. Microsoft recommends it for shared capacities, but also states that critical solutions should still be isolated on a dedicated capacity. (learn.microsoft.com)

What it is good for	What it is not
protecting shared capacities from bursty workloads; reducing the impact of noisy workspaces; limiting background pressure on user-facing workloads. (learn.microsoft.com)	not a replacement for dedicated capacity; not a guarantee that all interactive requests will always succeed; not a fix for structural under-sizing. (learn.microsoft.com)

Important limitations

Surge Protection does not guarantee that interactive requests will never be delayed or rejected. It does not stop jobs already in progress. Some Fabric UI actions are treated as background operations and can also be rejected. Certain OneLake activities remain unaffected.

Workspace-level control

...

What we should put in place

For Data Platform Core capacity

do not rely on Surge Protection as the main protection
protect this workspace first through dedicated capacity
optionally keep Surge Protection available as a secondary control, but only after observing real usage patterns in the Capacity Metrics app.

...

enable it mainly on Domain production capacity

...

add workspace-level Surge Protection
define a rule

...

for handling repeated noisy workspaces
keep Mission Critical status limited to

...

a very small number of justified

...

Suggested policy text

...

4. Notifications and Alerts

What it is

Fabric provides two practical approaches for capacity alerting:

...

cases
tune thresholds using the Capacity Metrics App, not guesswork. (learn.microsoft.com)

Key message

Use Surge Protection to control shared Domain workloads, not to protect the Data Platform Core workspace

...

instead of isolating it.

Example

[Image]

2. Notifications / Alerts

What to understand

Notifications are the minimum control that turns capacity health into an operational process. Fabric supports both capacity notification emails and

Real-Time Hub / Capacity Overview Events

...

What matters for us

...

for monitoring and alerting. (learn.microsoft.com, learn.microsoft.com)

What we should put in place

...

For every production capacity:
- one named operational owner

...

For Data Platform Core capacity

mandatory alerting when the capacity approaches throttling
alerts routed to central platform operations

For Domain production capacity

mandatory alerting as well
alerts should trigger investigation of the top consumer workspace or item
repeated alerts should lead to threshold tuning, workload optimization, or workspace isolation.
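The routing rules above can be sketched as follows. The thresholds (`WARNING_PCT`, `CRITICAL_PCT`), the repeat limit, the `"core"` / `"domain"` labels, and the function names are all hypothetical values chosen for illustration; the real thresholds should come from observed usage, and this sketch does not call any Fabric API.

```python
from collections import Counter

WARNING_PCT = 80.0   # hypothetical warning threshold (percent of capacity)
CRITICAL_PCT = 95.0  # hypothetical critical threshold (percent of capacity)
REPEAT_LIMIT = 3     # hypothetical number of alerts in a window that counts as "repeated"


def classify(utilization_pct: float) -> str:
    """Turn one utilization sample into a severity label."""
    if utilization_pct >= CRITICAL_PCT:
        return "critical"
    if utilization_pct >= WARNING_PCT:
        return "warning"
    return "ok"


def next_actions(capacity_kind: str, samples: list[float]) -> list[str]:
    """Return the follow-up actions for one review window of samples."""
    severities = Counter(classify(sample) for sample in samples)
    alerts = severities["warning"] + severities["critical"]
    if alerts == 0:
        return []

    actions = []
    if capacity_kind == "core":
        # Data Platform Core: alerts go to central platform operations.
        actions.append("notify central platform operations")
    else:
        # Domain production: start from the top consumer.
        actions.append("investigate top consumer workspace or item")
        if alerts >= REPEAT_LIMIT:
            actions.append(
                "review threshold tuning, workload optimization, or workspace isolation"
            )
    return actions
```

Keeping the thresholds as named constants makes it straightforward to replace them with values derived from the Capacity Metrics app once real usage has been observed.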

Recommended implementation path

Option 1 — Simple baseline
Use capacity email notifications for a first layer of monitoring. These are configured by a capacity admin in capacity settings.

...

- one shared distribution list or team channel

...

- clear alert thresholds
- one documented escalation path. (learn.microsoft.com)

Suggested implementation path

Minimum setup

enable capacity notification emails

Recommended target

implement alerts using

Fabric Capacity Overview Events in Real-Time Hub

...

. (learn.microsoft.com)

Key message

Alerts should be mandatory on all production capacities. A production capacity without an owner and alerting is not operationally governed.

Example

[Image]

3. Capacity Overage

What to understand

Suggested policy text

Every production Fabric capacity must have an assigned operational owner, active alerting, and a documented escalation path. Real-Time Hub alerts should be the preferred target implementation for capacity health monitoring.

5. Capacity Overage

...

Capacity Overage allows Fabric to use extra compute beyond the purchased

...

limit to

...

avoid throttling. Microsoft positions it as a way to absorb rare unexpected spikes or small regular peaks, not as a substitute for proper sizing. It is available only

...

on F SKUs

...

. (learn.microsoft.com, learn.microsoft.com)

What it is good for	What it is not
reducing the risk of disruption during occasional overload; protecting critical production continuity; avoiding throttling for short, unexpected peaks. (learn.microsoft.com)	not a performance booster; not a strategy for permanent under-sizing; not something to leave effectively unlimited. (learn.microsoft.com)

What it does well

Microsoft positions Capacity Overage as a way to handle rare unexpected spikes or small regular spikes where scaling up is not otherwise required. It helps prevent throttling and allows new jobs to run, reducing downstream user impact.

Important limitations

...

What we should put in place

For Data Platform Core

...

:

...

enable only if uptime is critical

...

cap the limit
require

...

platform

...

and budget owner approval
review every overage event

...

For Domain production

...

:

use only if the business accepts the cost model
keep tighter limits

...

do not use

...

it to hide repeated

...

Clear decision rule

Enable Capacity Overage when:

the capacity is business-critical
overload is occasional, not structural
the financial model is accepted
there is active monitoring behind it.

Do not enable it as the default response to recurring saturation. In that case, the right answer is usually to resize, optimize, or isolate workloads.
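One way to make "occasional, not structural" testable is to look at how often the capacity was overloaded in a review window. The sketch below is an assumption-heavy illustration: the 20% cut-off (`structural_ratio`), the function names, and the idea of counting overloaded days are all hypothetical and would need to be agreed with the platform and budget owners.

```python
def classify_overload(
    overloaded_days: int, window_days: int, structural_ratio: float = 0.2
) -> str:
    """Label the overload pattern seen in a review window.

    structural_ratio is a hypothetical cut-off: at or above it, overload is
    treated as structural rather than occasional.
    """
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    if overloaded_days == 0:
        return "none"
    if overloaded_days / window_days >= structural_ratio:
        return "structural"
    return "occasional"


def overage_decision(
    business_critical: bool,
    cost_accepted: bool,
    monitored: bool,
    overloaded_days: int,
    window_days: int,
) -> str:
    """Apply the decision rule: overage only for critical, monitored, funded capacities."""
    pattern = classify_overload(overloaded_days, window_days)
    if pattern == "structural":
        # Recurring saturation: overage is not the answer.
        return "resize, optimize, or isolate workloads"
    if business_critical and cost_accepted and monitored:
        return "enable capped Capacity Overage"
    return "do not enable Capacity Overage"
```

With these assumed values, 2 overloaded days in a 30-day window counts as occasional, while 10 overloaded days in the same window points to a sizing or isolation problem.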

Suggested policy text

Capacity Overage may be enabled on critical Fabric capacities as a controlled resilience mechanism. It must remain capped, monitored, and financially approved. It must not replace correct sizing or workload isolation.

6. Recommended Actions for Our Platform

Feature	Data Platform Core capacity	Domain production capacity	What to implement
Surge Protection	Secondary control only	Yes	Enable mainly on Domain capacity, tune with the Capacity Metrics app, use workspace-level controls
Notifications / Alerts	Mandatory	Mandatory	Define owner, recipients, thresholds, escalation path
Capacity Overage	Optional, capped	Optional, tightly capped	Use only for rare peaks and with budget approval

This operating model aligns with Microsoft’s guidance: isolate critical workloads first, use Surge Protection to protect shared interactive workloads, use alerts for active monitoring, and use Capacity Overage as a safety net rather than a normal operating mode.
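The recommended actions can also be expressed as a simple conformance check against a capacity's current settings. The `CapacitySettings` fields and the `policy_gaps` function are hypothetical and would be filled in manually or from whatever inventory the platform team maintains; nothing here reads settings from Fabric.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CapacitySettings:
    """Hypothetical snapshot of how one production capacity is configured."""

    kind: str                     # "core" or "domain" (labels chosen for this sketch)
    owner: Optional[str]          # named operational owner, if any
    alerts_enabled: bool
    surge_protection_enabled: bool
    overage_enabled: bool
    overage_cap: Optional[float]  # None means no cap is configured
    overage_budget_approved: bool


def policy_gaps(settings: CapacitySettings) -> list[str]:
    """List where a capacity deviates from the recommended actions."""
    gaps = []
    if not settings.owner:
        gaps.append("no named operational owner")
    if not settings.alerts_enabled:
        gaps.append("alerts are mandatory but not enabled")
    if settings.kind == "domain" and not settings.surge_protection_enabled:
        gaps.append("Surge Protection should be enabled on Domain capacity")
    if settings.overage_enabled and settings.overage_cap is None:
        gaps.append("Capacity Overage is enabled without a cap")
    if settings.overage_enabled and not settings.overage_budget_approved:
        gaps.append("Capacity Overage is enabled without budget approval")
    return gaps
```

An empty result means the capacity matches the table; any returned entry is a candidate item for the run-operations review.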

7. Checklist

Is the Data Platform Core workspace on a dedicated capacity?
Is Surge Protection enabled and tuned on Domain production capacity?
Is workspace-level Surge Protection enabled for Domain workspaces?
Are Mission Critical workspaces explicitly limited and documented?
Is there a named operational owner for each production capacity?
Are capacity alerts configured for both Core and Domain capacities?
Are Real-Time Hub alerts planned or implemented?
Is Capacity Overage enabled only where uptime justifies it?
Is every overage limit capped and financially approved?
Is each alert or overage event reviewed as part of run operations?

8. Final Recommendation

...

saturation. (learn.microsoft.com)

Key message

Use Capacity Overage as a controlled safety net, not as a normal operating model.

Example

[Image]
