OTel Collector vs vendor agents.
The right answer is not always the OTel Collector, but the wrong one is almost always 'vendor agents only'. Here's where each fits, and the migration path away from lock-in when you're already there.
The choice between OpenTelemetry's Collector and a vendor's first-party agent (Datadog Agent, New Relic Infrastructure, Splunk SignalFx, etc.) is one of the most consequential calls you make on an observability stack, and one of the most frequently made by accident. A team adopts Datadog, deploys the agent everywhere, and three years later "swap backends" means a six-month re-instrumentation project.
This guide is the comparison we run on every assessment. The recommendation is not OTel-purist (vendor agents do things the Collector still can't, and pretending otherwise is dogma), but the default lean is unambiguous.
What the OTel Collector actually does
The OpenTelemetry Collector is a vendor-neutral receiver, processor, and exporter for traces, metrics, and logs. Telemetry from your applications (via OTel SDKs or any compatible source) lands at the Collector; the Collector optionally enriches, samples, transforms, or routes it; then it exports to one or more backends.
The killer feature isn't any one of those steps. It's that the Collector is the abstraction layer. Your applications are instrumented once, against the OTel API. Where the data goes is configuration, not code. Swapping from Datadog to Grafana Cloud means changing an exporter config, not redeploying every service.
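Concretely, that abstraction is a few lines of Collector configuration. A minimal sketch (the endpoint is the standard OTLP gRPC port; the API key reference is a placeholder, and the `datadog` exporter ships in the Collector's contrib distribution):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch: {}

exporters:
  # Swapping backends means replacing this stanza, not your application code.
  datadog:
    api:
      key: ${env:DD_API_KEY}

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [datadog]
```

Replacing `datadog` with, say, an `otlphttp` exporter pointed at Grafana Cloud changes this file and nothing else.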
What vendor agents do that the Collector still doesn't
Vendor agents earned their place. They bundle a lot. A Datadog Agent, for example, ships with:
- Auto-instrumentation for popular runtimes that's deeper than OTel's auto-instrumentation in some languages.
- Integrations for hundreds of common services (databases, queues, SaaS platforms) that work out of the box.
- Process-level visibility, container introspection, and host-level diagnostics that go beyond the OTel host receiver.
- Vendor-specific advanced features: APM service maps, watchdog anomaly detection, log pattern analysis.
- A single binary, a single config, a single support contract.
OTel is catching up on most of these, but "catching up" is the operative phrase. For very specific use cases (deep .NET profiling, certain database introspection, AIOps-style anomaly detection), vendor agents still give you something the Collector doesn't.
The lock-in cost
Here's where the comparison gets real. When you instrument with vendor agents only, every one of those points above becomes a migration cost when you want to leave:
- Auto-instrumentation hooks are vendor-specific; moving means re-instrumenting.
- Vendor integrations are configured in the vendor's UI/IaC; moving means rebuilding the integration set.
- Logs forwarded by the vendor agent often have proprietary formats and tagging, so log queries don't port.
- Service maps, dashboards, and alerts are all defined in the vendor's tooling; they don't move with the data.
On a real assessment we costed a Datadog → Grafana Cloud migration for a mid-sized fintech: ~9 months of platform team work, at a cost of roughly twice the annual savings, meaning two years before the migration would have paid for itself. That's the lock-in tax. It's invisible until you try to leave.
Where possible, route telemetry through an OTel Collector rather than vendor-specific agents. The Collector is the abstraction layer that preserves backend portability.
The pragmatic recommendation
Tracefox's default on every engagement:
- OTel Collector as the primary pipeline. Applications instrument against OTel SDKs. Collectors deployed as DaemonSets (Kubernetes) or sidecars (ECS/Fargate). All traces, custom metrics, and structured logs flow through the Collector.
- Vendor agent as a complement, not a replacement. If the vendor offers something materially better in a specific area (deep APM profiling on a critical service; specialised database integration), run their agent alongside the Collector for that specific signal, and isolate the dependency.
- The Collector exports to your chosen backend. One backend by default. Multi-backend export (e.g. send traces to both Tempo and X-Ray during migration) is supported and useful, but adds configuration complexity. Turn it on with intent, not by accident.
The result: you get the best of the vendor's capability where it matters, and you keep the option to leave.
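Applied, the default pipeline is one Collector config carrying all three signals to a single backend. A sketch, with the `otlphttp` endpoint and auth token as placeholders for your chosen backend (the `resourcedetection` processor, from the contrib distribution, adds host and environment attributes):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch: {}
  resourcedetection:
    detectors: [env, system]

exporters:
  # One backend by default; a second exporter is added only with intent,
  # e.g. during a migration window.
  otlphttp:
    endpoint: https://otlp.example-backend.com
    headers:
      Authorization: ${env:BACKEND_TOKEN}

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [resourcedetection, batch]
      exporters: [otlphttp]
    metrics:
      receivers: [otlp]
      processors: [resourcedetection, batch]
      exporters: [otlphttp]
    logs:
      receivers: [otlp]
      processors: [resourcedetection, batch]
      exporters: [otlphttp]
```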
If you're already vendor-agent-only: a migration path
Don't rip and replace. The pattern that works:
- Deploy the OTel Collector alongside the vendor agent. No removal yet. Both run in parallel; the Collector receives nothing initially.
- Pick one signal type, typically traces. Migrate trace generation from vendor SDKs to OTel SDKs. Configure the Collector to receive OTel traces and export them to your existing backend (most vendors accept OTel ingestion now).
- Validate parity, then move metrics. Repeat for custom application metrics: instrument with OTel, route through the Collector, export to the same backend.
- Logs last. Logs are the messiest because of vendor-specific structured-log formats. Often worth keeping the vendor agent for log forwarding for a transitional period.
- Then, and only then, consider backend swap. Once everything flows through the Collector, the backend becomes a configuration change, not a re-instrumentation project.
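The trace step above might look like the following sketch, assuming Datadog as the incumbent backend. The Collector exports to Datadog unchanged; the commented-out second exporter (endpoint illustrative) is what the eventual backend swap, or a dual-running validation period, turns on:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch: {}

exporters:
  datadog:              # existing backend — nothing changes for the team yet
    api:
      key: ${env:DD_API_KEY}
  # otlp/newbackend:    # uncommented only when the backend swap begins
  #   endpoint: tempo.example.internal:4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [datadog]   # later: [datadog, otlp/newbackend]
```

Once parity is validated, dropping the `datadog` exporter is the entire cutover for traces.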
When to skip OTel
Be honest about the cases where this calculus genuinely flips:
- Very small teams where the operational overhead of running a Collector outweighs the lock-in risk.
- Mature single-vendor environments where the migration cost is justified by the vendor's depth in your specific stack (e.g. heavy .NET shops on AppDynamics circa 2020).
- Air-gapped or compliance-restricted environments where the vendor's pipeline meets a specific certification the Collector doesn't yet hold.
These are real cases. They're also rarer than people pitching them think.
What this looks like applied
The full instrumentation checklist (Collector deployment, SDK initialisation, propagation, log injection, recording rules) is in the downloadable Blueprint. The methodology page covers the rest of what's brought in alongside the Collector: Golden Signals, SLOs, alerts, naming.