Ad-hoc
Telemetry exists because someone enabled an agent. Nobody can describe the data hierarchy. Pages mostly noise; MTTR unpredictable.
- Top-10 dashboards 18+ months old
- Bill grew 40%+ year-on-year
- Most alerts silenced or auto-resolved
Observability & SRE · Singapore
A Singapore observability and SRE practice. We rebuild telemetry, SLOs, and incident response on systems that have outgrown the dashboards they started with.
A 12-axis assessment that scores the seven structural dimensions of your observability stack on evidence. The panel on the right is the actual artefact from a recent anonymised engagement — read it the way a client reads it.
Bottlenecked by alerting quality, incident response, and SLO maturity · all scoring 1/4.
Maturity ladder
Each dimension above scores 0–4. Your tier is the worst-of axes — observability is bottlenecked by the weakest signal, not averaged across the group. The four states below are where systems land in practice.
Telemetry exists because someone enabled an agent. Nobody can describe the data hierarchy. Pages mostly noise; MTTR unpredictable.
Pages get answered. Dashboards exist. But the design is inherited rather than chosen, and cardinality grows in step with revenue.
Data hierarchy is intentional. SLOs reflect customer journeys. Cardinality has a budget. Cost-per-signal is a tracked metric.
Telemetry is treated as a first-class surface. Engineers check it before shipping, and the dashboard count trends down rather than up.
Most teams we audit start at T1. Foundational moves you to T2 in 12 weeks.
Fintech · APAC · clearing engine
MTTR · min
−94%
Logistics · APAC · platform
Annual spend · $k
−65%
Fintech, logistics, gaming, healthcare, e-commerce, banking. Numbers redacted; outcomes are not.
Names withheld under NDA · sectors and metrics anonymised at client request.
I'm Ken. I started Tracefox after enough years inside platform teams watching telemetry budgets compound while incidents stayed slow. Almost every observability problem I see is upstream of the tools — missing definitions, no shared vocabulary, dashboards nobody owns. That's what we fix first. Everything else is configuration.
Every service we manage emits all four Golden Signals. Every user-facing journey has at least one SLO. Every SLO has a burn-rate alert. Anything beyond that is improvement, not baseline.
Diagnostic · 2 wk · USD 18,000 fixed
Maturity profile, telemetry redesign, and a 90-day roadmap. Same price in Singapore, San Francisco, or Sydney.