What we've actually built.
Sanitised, founder-approved case studies from real engagements. Names are redacted where the contract requires it; numbers are not.
From 98.4% to five-nines on a Tier-1 clearing engine.
Bulkheaded a monolith, redesigned the trace surface, and ran two paired P1 drills. Zero unplanned downtime in the 9 months since handover.
- MTTR: 4.2 hr → 14 min
- Cost: −58%
- Duration: 14 wk
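For a sense of scale, the two availability figures in the headline translate into allowed downtime per year roughly as follows. This is a back-of-envelope sketch, assuming "five-nines" means 99.999% and a 365-day year; the function name is ours, not something from the engagement.

```python
# Allowed downtime per year implied by an availability percentage.
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(availability_pct: float) -> float:
    """Hours per year a service may be down at the given availability."""
    return (100.0 - availability_pct) / 100.0 * HOURS_PER_YEAR


before = downtime_hours_per_year(98.4)    # roughly 140 hours per year
after = downtime_hours_per_year(99.999)   # roughly 5.3 minutes per year

print(f"98.4%:   {before:.1f} h/yr")
print(f"99.999%: {after * 60:.2f} min/yr")
```

In other words, the gap between the two targets is the difference between several days of outage a year and a few minutes.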
Cutting a $2.1M Datadog bill by two-thirds without losing fidelity.
Cardinality refactor + OTel collector pipelines. Saved more than the engagement cost in the first quarter.
- Saved/yr: $1.4M
- Cardinality: −71%
- Duration: 12 wk
SLOs that actually reflect player experience.
Replaced infra SLIs with journey-based SLOs derived from session telemetry. Error budget became a product-team conversation, not an SRE one.
- SLO coverage: 12% → 86%
- Page volume: −74%
- Duration: 8 wk
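As an illustration of what a journey-based SLI can look like, the sketch below scores a journey by the share of sessions that completed it, then reports how much error budget is left against a target. Everything here is hypothetical: the `Session` shape, the `join_match` journey name, and the 99% target are assumptions for the example, not the client's actual schema or SLO.

```python
from dataclasses import dataclass


@dataclass
class Session:
    session_id: str
    journey: str      # hypothetical journey name, e.g. "join_match"
    succeeded: bool   # did the player complete the journey?


def journey_sli(sessions, journey):
    """Fraction of sessions for this journey that succeeded, or None if no data."""
    relevant = [s for s in sessions if s.journey == journey]
    if not relevant:
        return None
    good = sum(1 for s in relevant if s.succeeded)
    return good / len(relevant)


def error_budget_remaining(sli, target):
    """Share of the error budget left: 1.0 is untouched, 0 or below is exhausted."""
    budget = 1.0 - target   # failures the SLO tolerates
    burned = 1.0 - sli      # failures actually observed
    return 1.0 - burned / budget


# 1,000 made-up sessions, of which the first 5 failed.
sessions = [Session(str(i), "join_match", i >= 5) for i in range(1000)]
sli = journey_sli(sessions, "join_match")
remaining = error_budget_remaining(sli, target=0.99)
print(f"SLI {sli:.3f}, error budget remaining {remaining:.0%}")
```

Because the unit is a player journey rather than a host or a queue, the remaining budget is a number a product team can reason about directly.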
HIPAA-clean observability without sampling user PHI.
Designed a redaction pipeline at the OTel collector layer; passed audit with zero PHI in non-clinical telemetry stores.
- Audit findings: 0
- Coverage: 100%
- Duration: 10 wk
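The redaction idea can be sketched in a few lines. The real pipeline ran at the OTel collector layer; what follows is only a Python illustration of the logic (drop known-sensitive attribute keys, mask sensitive-looking values), with hypothetical key names and two example patterns. It is not a complete PHI rule set and not the configuration that went through audit.

```python
import re

# Hypothetical attribute keys that should never leave the clinical boundary.
BLOCKED_KEYS = {"patient.name", "patient.dob"}

# Example value patterns only; a real rule set would be far broader.
VALUE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),       # SSN-shaped strings
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),    # email-shaped strings
]


def redact_attributes(attrs: dict) -> dict:
    """Return a copy of attrs with blocked keys dropped and matching values masked."""
    clean = {}
    for key, value in attrs.items():
        if key in BLOCKED_KEYS:
            continue  # drop the attribute entirely
        if isinstance(value, str):
            for pattern in VALUE_PATTERNS:
                value = pattern.sub("[REDACTED]", value)
        clean[key] = value
    return clean


span_attrs = {
    "http.route": "/appointments",
    "patient.name": "Jane Doe",
    "note": "contact jane@example.com, SSN 123-45-6789",
    "http.status_code": 200,
}
redacted = redact_attributes(span_attrs)
print(redacted)
```

Doing this before telemetry reaches any store means downstream systems never have to be trusted with the raw values.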
Killing 800k unused metrics on Black Friday eve.
A cardinality emergency. We bought the team three quarters of headroom in eight days. Then went back and did it properly.
- Series cut: −81%
- Latency: p99 −38%
- Duration: 2 wk
Vendor-locked to multi-vendor in 11 weeks.
Migrated trace ingestion off a proprietary agent stack onto OpenTelemetry without losing a single in-flight query.
- Vendors: 1 → 3
- Lock-in cost: Eliminated
- Duration: 11 wk
32 further engagements not displayed; outcomes available under NDA on request.
Want to be the next case study?
We publish anonymised numbers from every engagement. Many clients let us name them.