The observability maturity framework.
Four tiers, twelve axes. We use this in every paid Diagnostic, and we keep the rubric public so you can self-score before you call us.
Four states a system can be in.
Figure: the same simplified service architecture shown at each tier, from T0 (no trace lines) through T1 (partial traces, orphan alerts) and T2 (clean trace flows on critical paths, SLO badges on revenue-bearing nodes) to T3 (full trace coverage plus burn-rate gauges).
T0 · Ad-hoc
Telemetry exists because someone enabled an agent. Nobody can describe the data hierarchy. Alerts are mostly noise.
- Top-10 dashboards are 18+ months old
- Observability bill up 40%+ year on year
- Most alerts are silenced or auto-resolved
- MTTR is unpredictable
T1 · Reactive
Pages get answered, dashboards exist, but the architecture is whatever the previous SRE left behind. Cardinality grows with revenue.
- SLOs exist on infra metrics
- Cardinality control by exception, not policy
- Trace coverage spotty across service boundaries
- Vendor lock-in growing as a risk line item
T2 · Operational
The data hierarchy is intentional. SLOs reflect customer journeys. Cardinality has a budget. Cost-per-signal is a tracked metric.
- Journey-keyed SLOs in place
- Cardinality budget enforced at collector
- OTel-portable trace surface
- Quarterly observability reviews
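To make journey-keyed SLOs concrete: a common way to turn an SLO into a paging policy is the burn rate, the observed error ratio divided by the error budget (1 minus the SLO target). A burn rate of 1 spends exactly the whole budget over the SLO window. The sketch below is a minimal illustration, assuming a 99.9% target over a 30-day window and the multi-window fast-burn threshold of 14.4 popularised by Google's SRE Workbook (burning at 14.4x for one hour spends 2% of a 30-day budget). The `JourneySLO` class, the function names and the request counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class JourneySLO:
    """An SLO keyed to a customer journey (hypothetical structure)."""
    journey: str
    target: float          # e.g. 0.999 means 99.9% of requests must succeed
    window_days: int = 30  # assumed SLO window the error budget is spread over

    @property
    def error_budget(self) -> float:
        # Fraction of requests allowed to fail over the SLO window.
        return 1.0 - self.target


def burn_rate(slo: JourneySLO, bad: int, total: int) -> float:
    """Observed error ratio divided by the error budget.

    1.0 means the budget would be used up exactly at the end of the window.
    """
    if total == 0:
        return 0.0
    return (bad / total) / slo.error_budget


def budget_spent(rate: float, hours: float, slo: JourneySLO) -> float:
    """Fraction of the whole-window budget consumed by burning at `rate` for `hours`."""
    return rate * hours / (slo.window_days * 24)


def should_page(slo: JourneySLO, long_window: tuple[int, int],
                short_window: tuple[int, int], threshold: float = 14.4) -> bool:
    """Multi-window check: page only if BOTH windows burn faster than `threshold`.

    Each window is a (bad, total) request-count pair, e.g. the last 1h and 5m.
    The short window stops the page once the burn has actually stopped.
    """
    return (burn_rate(slo, *long_window) > threshold
            and burn_rate(slo, *short_window) > threshold)


# Illustrative numbers only.
checkout = JourneySLO("checkout", target=0.999)
rate = burn_rate(checkout, bad=150, total=10_000)   # 1.5% of requests failing
print(round(rate, 2))                               # 15.0
print(round(budget_spent(14.4, 1, checkout), 3))    # 0.02
print(should_page(checkout, (150, 10_000), (20, 900)))  # True
```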
T3 · Proactive
Telemetry is product. Engineers consult it before shipping. The team's instinct is to delete dashboards rather than add them.
- Pre-deploy SLO impact reviews
- Self-service runbook + alert library
- Cardinality regressions caught in CI
- Vendor-portable, multi-vendor by choice
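Catching cardinality regressions in CI can be as simple as counting the distinct label sets each metric emits during a test run, then failing the build when a metric exceeds its series budget or jumps well past a recorded baseline. This is a tool-agnostic sketch, not a real collector or exporter API: the `(metric, labels)` sample format, the function names, the budgets and the 20% growth tolerance are all assumptions.

```python
from collections import defaultdict
from collections.abc import Iterable, Mapping


def series_counts(samples: Iterable[tuple[str, Mapping[str, str]]]) -> dict[str, int]:
    """Count distinct label sets (i.e. active series) per metric name."""
    seen: dict[str, set] = defaultdict(set)
    for metric, labels in samples:
        # A series is identified by metric name plus its full label set;
        # frozenset makes label order irrelevant.
        seen[metric].add(frozenset(labels.items()))
    return {metric: len(label_sets) for metric, label_sets in seen.items()}


def cardinality_violations(
    current: Mapping[str, int],
    baseline: Mapping[str, int],
    budgets: Mapping[str, int],
    max_growth: float = 0.20,  # hypothetical tolerance: +20% over baseline
) -> list[str]:
    """Return human-readable failures; an empty list means the check passes."""
    failures = []
    for metric, count in sorted(current.items()):
        budget = budgets.get(metric)
        if budget is not None and count > budget:
            failures.append(f"{metric}: {count} series exceeds budget {budget}")
        base = baseline.get(metric)
        if base and count > base * (1 + max_growth):
            failures.append(f"{metric}: {count} series vs baseline {base}")
    return failures


# Illustrative run: a user_id label sneaks onto a request counter.
samples = [("http_requests_total", {"route": "/pay", "user_id": str(i)}) for i in range(50)]
samples.append(("http_requests_total", {"route": "/cart"}))
current = series_counts(samples)
print(current)  # {'http_requests_total': 51}
for problem in cardinality_violations(current,
                                      baseline={"http_requests_total": 2},
                                      budgets={"http_requests_total": 10}):
    print(problem)
```

In a pipeline, a non-empty result would fail the job. The same per-metric budgets could also back the collector-side enforcement listed under T2.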
The twelve we score against.
Each axis is rated 0–4. Your tier is set by your worst-scoring axis, not the average: observability is bottlenecked by the weakest signal.
| # | Axis | What we ask |
|---|---|---|
| 01 | Telemetry hierarchy | Are signals organised by business meaning or by tool default? |
| 02 | Trace coverage | Does context propagate across service contracts you actually care about? |
| 03 | SLO discipline | Are SLOs keyed to customer outcomes, with a real burn-rate policy? |
| 04 | Cardinality control | Is $/active-series tracked? Is there a budget? |
| 05 | Alert quality | What share of pages are actionable in the first 5 minutes? |
| 06 | On-call ergonomics | Can a fresh on-caller resolve a tier-1 incident solo at week 2? |
| 07 | Retro depth | Do retros end in code changes or just Confluence pages? |
| 08 | Runbook freshness | Have your tier-0 runbooks been edited this quarter? |
| 09 | Vendor portability | How long would a vendor migration take? |
| 10 | Compliance posture | Can your telemetry stack survive an audit without scrambling? |
| 11 | Telemetry literacy | Can engineers outside SRE write a useful query? |
| 12 | Operational ownership | Who is paid to make tier decisions when budgets collide? |
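The worst-of rule is easy to state in code: the result is the minimum across all twelve axis scores, and the axis that produced it is the one to fix first. The sketch below uses the axis names from the table but is otherwise hypothetical. It deliberately returns the bottleneck score on the 0–4 axis scale rather than a T0–T3 tier, since the exact score-to-tier mapping is not spelled out here.

```python
AXES = (
    "Telemetry hierarchy", "Trace coverage", "SLO discipline",
    "Cardinality control", "Alert quality", "On-call ergonomics",
    "Retro depth", "Runbook freshness", "Vendor portability",
    "Compliance posture", "Telemetry literacy", "Operational ownership",
)


def bottleneck(scores: dict[str, int]) -> tuple[int, list[str]]:
    """Return the worst axis score and every axis tied at that score.

    Worst-of, not average: one weak axis caps the whole system.
    """
    missing = set(AXES) - set(scores)
    if missing:
        raise ValueError(f"unscored axes: {sorted(missing)}")
    for axis, score in scores.items():
        if not 0 <= score <= 4:
            raise ValueError(f"{axis}: score {score} outside 0-4")
    worst = min(scores[a] for a in AXES)
    return worst, [a for a in AXES if scores[a] == worst]


# Illustrative scores: strong everywhere except stale runbooks.
scores = {axis: 3 for axis in AXES}
scores["Runbook freshness"] = 1
worst, weakest = bottleneck(scores)
print(worst, weakest)                               # 1 ['Runbook freshness']
print(round(sum(scores.values()) / len(scores), 2))  # 2.83
```

The average would flatter this system at 2.83; the worst-of rule keeps the stale runbooks at the top of the roadmap.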
Want this scored properly on your stack?
The Diagnostic engagement does this with read access to your telemetry. It runs two weeks, costs USD 18k, and leaves you with a roadmap regardless.