Field notes

Your error budget exists. It just isn't being used.

The SLO is defined. The budget is computed. The dashboard shows it. The team has not, in the last two quarters, frozen a release because of it. Has not, in the last two quarters, prioritised a reliability item over a feature item because of it. The budget is decoration. The decisions are being made by the same instincts as before.

5 March 2027 · Ken Tan · 6 min read

SLOs and error budgets are the most over-implemented and under-used pair of artefacts in the observability stack. Every team I work with has them. Almost no team uses them as a decision-making tool.

The pattern is consistent. The team did the SLO workshop. They picked targets. They set up the dashboards. The error budget is visible, with a percentage remaining for the quarter. And nothing in the team's planning meetings refers to it. The product manager doesn't ask. The engineering manager doesn't bring it up. The release calendar doesn't pause when the budget is exhausted. The artefact exists in parallel to the work, but it doesn't touch the work.

What "using" the budget actually means

The budget is being used when it changes a decision the team would otherwise have made differently. The decisions where it should be load-bearing are usually:

Should we ship this feature this week, or harden the service first? The budget answers this. If you've burned 80% of the quarter's budget in three weeks, the answer is harden. If you've burned 5%, the answer is ship.
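Is this a Sev2 or a Sev3? The budget burn rate is one of the inputs; the raw error rate on its own is not. A 1% error rate against a 99.9% SLO burns budget ten times faster than the SLO allows and empties a 30-day budget in three days. That is a Sev2. A 1% error rate that has sat comfortably inside a looser target for two weeks is a maintenance ticket.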
Should we accept this customer's enterprise SLA? The budget tells you whether your current operational reality can carry the SLA. A team running at the edge of their existing budget cannot, in honesty, accept a stricter one.
Where should the platform team invest next quarter? The services with consistent budget overruns are the ones that need investment. Without the budget, the prioritisation defaults to whoever shouts loudest.
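The arithmetic behind the first two questions fits in a few lines. The Python below is a minimal sketch, not a standard tool: the function names are hypothetical, and it assumes a request-based SLO, a 30-day window for the exhaustion estimate, and a 13-week quarter for the pacing example. Burn rate is the observed error ratio divided by the error ratio the SLO allows, so a burn rate of 1 spends the budget exactly over the window.

```python
# Minimal sketch of error-budget arithmetic for a request-based SLO.
# Function names and the 30-day window default are illustrative assumptions.

WINDOW_HOURS = 30 * 24  # assumed 30-day SLO window


def burn_rate(error_ratio: float, slo: float) -> float:
    """Multiple of the sustainable spend rate; 1.0 uses the budget exactly over the window."""
    allowed = 1.0 - slo  # error ratio the SLO permits, e.g. 0.001 for 99.9%
    return error_ratio / allowed


def hours_to_exhaustion(error_ratio: float, slo: float,
                        window_hours: float = WINDOW_HOURS) -> float:
    """Hours until a full budget is gone if the current error ratio persists."""
    rate = burn_rate(error_ratio, slo)
    return float("inf") if rate == 0 else window_hours / rate


def budget_pace(consumed: float, elapsed: float) -> float:
    """Share of budget consumed over share of window elapsed; above 1.0 means overspending."""
    if elapsed <= 0:
        raise ValueError("elapsed must be a positive fraction of the window")
    return consumed / elapsed


if __name__ == "__main__":
    # 1% errors against a 99.9% SLO: 10x burn, 30-day budget gone in about 72 hours.
    print(round(burn_rate(0.01, 0.999), 2))            # 10.0
    print(round(hours_to_exhaustion(0.01, 0.999), 1))  # 72.0
    # 80% of the budget spent three weeks into an assumed 13-week quarter.
    print(round(budget_pace(0.80, 3 / 13), 2))         # 3.47
```

A pace of about 3.5 means the team is spending budget three and a half times faster than the calendar allows, which is the "harden" answer. The same calculation at 5% consumed gives roughly 0.22, which is the "ship" answer.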

A team where none of those four conversations references the budget has an unused budget, regardless of what the dashboard shows.

Why the budget gets ignored

The reason isn't ignorance. The teams I work with know the budget is there. The reasons it goes unused fall into three clusters:

No policy. The budget exists, but the team has never written down what to do when it's exhausted. Without a policy, the budget being at zero produces a shrug. A policy is the bridge between the metric and the decision.
No leadership air-cover. The first time the policy says "freeze releases," someone has to stand behind that decision against a product manager who wants to ship. If the engineering leader doesn't, the policy collapses on first contact, and after that nobody believes in it.
The wrong SLO target. If the SLO is set unrealistically tight, the budget is always exhausted, and the team learns to ignore it because acting on it would mean perpetual freeze. If the SLO is set unrealistically loose, the budget is never exhausted, and the artefact is performative. Either way, the budget doesn't drive decisions.
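One way to catch a mis-set target is to replay past windows against it. The sketch below is hypothetical: it assumes you have one observed error ratio per past SLO window, and its cut-offs (budget exhausted in more than half the windows means too tight; never more than 10% of the budget used means too loose) are illustrative assumptions, not an established rule.

```python
# Hypothetical check of an SLO target against past windows' observed error ratios.
# The "too tight" and "too loose" cut-offs are illustrative assumptions.


def budget_used(error_ratio: float, slo: float) -> float:
    """Fraction of one window's budget that the observed error ratio consumed."""
    return error_ratio / (1.0 - slo)


def diagnose_target(history: list[float], slo: float,
                    loose_ceiling: float = 0.10) -> str:
    if not history:
        raise ValueError("need at least one past window")
    used = [budget_used(r, slo) for r in history]
    exhausted = sum(1 for u in used if u >= 1.0)
    if exhausted > len(used) / 2:
        return "too tight: budget exhausted in most windows"
    if max(used) < loose_ceiling:
        return "too loose: budget barely touched in any window"
    return "plausible"


history = [0.0008, 0.0015, 0.0012, 0.0020]  # hypothetical monthly error ratios
print(diagnose_target(history, slo=0.9995))  # too tight: budget exhausted in most windows
print(diagnose_target(history, slo=0.99))    # plausible
print(diagnose_target(history, slo=0.95))    # too loose: budget barely touched in any window
```

The same history reads as perpetual freeze under one target, a working signal under another, and decoration under a third, which is the failure mode described above.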

The most common of the three is the first. The SLO and budget got set up. The policy never got written. The team is missing the document that turns a metric into a governance object.

What a working policy looks like

The policy that holds up under pressure is short. One page. Five states, with a decision-owner per state. Something like:

Healthy (more than 50% remaining): normal operations. Ship. Plan as usual.
Watch (25–50% remaining): reliability items enter the next sprint planning explicitly. Engineering manager decides what gets prioritised.
Attention (10–25% remaining): next reliability item is mandatory in the current sprint. Director is informed.
Freeze (0–10% remaining): non-critical releases pause. Only fixes that improve reliability ship. Director approves any exception in writing.
Exhausted (negative budget): the policy itself goes to leadership review. Either the SLO is wrong, the operational investment is wrong, or both. Nobody ships until that conversation happens.
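The thresholds above are simple enough to encode, for example so that a release tool can show the current state next to the remaining budget. A hypothetical Python sketch: the `PolicyState` type and `policy_state` function are illustrative names, the actions and owners are transcribed from the example policy, and placing each shared boundary (50%, 25%, 10%) in the stricter state is an assumption, since the ranges as written meet at their edges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyState:
    name: str
    action: str
    decision_owner: str


def policy_state(remaining: float) -> PolicyState:
    """Map remaining budget (1.0 = full, below 0 = overspent) to a policy state.

    Boundary values fall into the stricter state (an assumption).
    """
    if remaining < 0:
        return PolicyState("Exhausted", "leadership review; nobody ships", "leadership")
    if remaining <= 0.10:
        return PolicyState("Freeze", "only reliability fixes ship",
                           "director, exceptions in writing")
    if remaining <= 0.25:
        return PolicyState("Attention", "next reliability item mandatory this sprint",
                           "director informed")
    if remaining <= 0.50:
        return PolicyState("Watch", "reliability items enter next sprint planning",
                           "engineering manager")
    return PolicyState("Healthy", "ship; plan as usual", "none named in the policy")


print(policy_state(0.18).name)  # Attention
```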

The policy is short because every line is a decision. The decision-owner per state matters because in a real burn, the team needs to know who is allowed to call it.

The first time the policy is used

The first invocation of the policy is the most important operational moment in adopting SLOs. It almost always happens awkwardly. The budget gets exhausted on a Wednesday. A product manager has a launch planned for Thursday. The engineering manager now has to choose: invoke the policy and pause the launch, or let the launch go and confirm that the policy is decoration.

The teams whose budgets are load-bearing are the teams whose leadership backed the engineering manager that Wednesday. The teams whose budgets are decoration are the teams where the engineering manager was overruled, and the policy was never invoked again.

The line worth holding

SLOs without policy are dashboards. Policy without leadership air-cover is paperwork. The error budget is a governance object, and governance only works if the people with authority defer to it on the days that count. Build the policy. Get the leadership pre-commitment in writing. The first time the budget runs out, the system either works or it doesn't, and the team will know which forever.

