Case study · 038 · Logistics

Cutting a $2.1M Datadog bill by two-thirds without losing fidelity.

A regional logistics platform on a runaway cardinality curve. We saved more than the engagement cost in the first quarter.

Industry
Logistics · APAC
Vendor
Datadog
Engagement
Foundational · 12 wk
Stack
Go · k8s · OTel
Problem statement

Six months from a budget veto. Three vendors all proposing "an enterprise tier."

The platform team had quietly added per-user cardinality dimensions to half their hot metrics over 18 months. The bill went from $40k/mo to $175k/mo. The reaction from finance was not subtle.

  • Time-series active2.4M
  • Used in alerts/dashboards≈460k (19%)
  • Vendor proposalUpgrade to "Pro Plus" + DPM addon
Engagement contract
  • Goal −60% bill, no fidelity loss
  • Timeline 12 weeks
  • Out of scope Vendor migration
  • Success metric $/active-series
  • Reviewer VP Eng + CFO
01 What we did

Three moves. In order.

Phase · 01

Inventory the metrics that earn their keep.

Mapped every series to an alert, dashboard, or SLO. Anything unreferenced for 90 days got a deletion candidate flag. 71% of cardinality fell into that bucket.

Phase · 02

Move noisy dimensions to logs.

Per-user IDs and order IDs were carrying the cardinality. They moved to structured logs (queryable, cheaper). Metrics kept the dimensions that drive on-call decisions.

Phase · 03

Re-route through an OTel collector we own.

Inserted a self-hosted OTel collector before the vendor agent. Tail-sampling, drop rules, label-set enforcement live in our pipeline — not theirs.

02 Outcome

Numbers, end of quarter 4.

$1.4M trending_up
Annual saving
71 % trending_down
Cardinality cut
−18 % trending_down
Alerting latency
None
Vendor change

¶ Engagement fee was USD 96k. Platform cost reduction in the first quarter post-handover was USD 350k. The next year is gravy.

Client testimony
"Tracefox didn't sell us a tool. They handed back our pipeline and our budget. The new collector is the cleanest piece of infra we own."
— Director of Platform · APAC logistics (name redacted per contract)
Artefacts produced
  • description Cardinality inventory · 2.4M series
  • description OTel collector terraform module
  • description Tail-sampling policy v1
  • description Label-set governance policy
  • description Runbook · cardinality regression
  • description ADR-014 · self-hosted collector
Engagement.start()

Datadog/New Relic bill out of control?

The Diagnostic ($18k) tells you whether a refactor is worth it. If the answer is no, you've still got a useful audit.