Observability 101: Logs, Metrics, and Traces Without the Buzzwords
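A useful working definition: observability is “the ability to ask new questions about your system without shipping new code”. You get there with three kinds of telemetry, usually called the three pillars: logs, metrics, and traces.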
Logs — what happened
Structured JSON logs with consistent fields:
{ "ts": "2025-07-02T10:00:00Z", "level": "error",
"msg": "payment_failed", "user_id": 42,
"trace_id": "a1b2c3", "amount": 19.99 }
Tips:
- One event per log line. No multi-line messages.
- Always include the trace ID for correlation.
- Sample noisy info logs in production; never sample errors.
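The tips above can be sketched with Python's standard `logging` module. This is a minimal illustration, not a production logger: the `extra={"fields": {...}}` convention, the `checkout` logger name, and the `make_logger` / `SampleInfoFilter` helpers are assumptions made for this example. It emits one JSON object per line, carries `trace_id` like any other field, and samples records below WARNING while always keeping warnings and errors.

```python
import json
import logging
import random
import sys
from datetime import datetime, timezone
from typing import Optional


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        event = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).strftime(
                "%Y-%m-%dT%H:%M:%SZ"
            ),
            "level": record.levelname.lower(),
            "msg": record.getMessage(),
        }
        # Structured fields arrive via extra={"fields": {...}} (a convention of this sketch).
        event.update(getattr(record, "fields", {}))
        # json.dumps escapes embedded newlines, so every event stays on a single line.
        return json.dumps(event)


class SampleInfoFilter(logging.Filter):
    """Keep roughly `rate` of records below WARNING; always keep WARNING and above."""

    def __init__(self, rate: float, rng: Optional[random.Random] = None) -> None:
        super().__init__()
        self.rate = rate
        self.rng = rng or random.Random()

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno >= logging.WARNING:
            return True  # never sample away warnings or errors
        return self.rng.random() < self.rate


def make_logger(stream, sample_rate: float, seed: Optional[int] = None) -> logging.Logger:
    logger = logging.getLogger("checkout")  # hypothetical service name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    handler.addFilter(SampleInfoFilter(sample_rate, random.Random(seed)))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = make_logger(sys.stdout, sample_rate=0.1)
    log.info("cart_viewed", extra={"fields": {"user_id": 42, "trace_id": "a1b2c3"}})
    log.error(
        "payment_failed",
        extra={"fields": {"user_id": 42, "trace_id": "a1b2c3", "amount": 19.99}},
    )
```

Sampling in a handler filter keeps the decision in one place; real setups often sample in the log shipper or pipeline instead, which works just as well as long as errors bypass it.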
Metrics — how often, how much
Aggregated counters, gauges, histograms. Cheap to store, fast to query:
http_requests_total{method="POST", status="200"} 12453
http_request_duration_seconds_bucket{le="0.1"} 11900
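To make the sample lines above less magic, here is a toy, stdlib-only model of how a counter and a histogram turn into text-format samples. It is not the `prometheus_client` library (in practice you would use an official client library); `render_counter` and `Histogram` are hypothetical names for this sketch. The key detail it shows is that histogram buckets are cumulative: the `le="0.1"` bucket counts every observation less than or equal to 0.1 s, and each larger bucket includes the smaller ones.

```python
import bisect
from collections import Counter


def render_counter(name: str, label_names: tuple, series: Counter) -> list:
    """Render one line per label combination, Prometheus text-format style."""
    lines = []
    for label_values, value in sorted(series.items()):
        labels = ",".join(f'{k}="{v}"' for k, v in zip(label_names, label_values))
        lines.append(f"{name}{{{labels}}} {value}")
    return lines


class Histogram:
    """Toy histogram that renders cumulative `le` buckets plus _sum and _count."""

    def __init__(self, name: str, bounds: list) -> None:
        self.name = name
        self.bounds = sorted(bounds)
        self.counts = [0] * (len(self.bounds) + 1)  # last slot catches values above every bound
        self.total = 0.0
        self.n = 0

    def observe(self, value: float) -> None:
        # bisect_left picks the first bound >= value, matching `le` ("less than or equal").
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.total += value
        self.n += 1

    def render(self) -> list:
        lines, running = [], 0
        for bound, count in zip(self.bounds + [float("inf")], self.counts):
            running += count  # buckets are cumulative: each includes all smaller ones
            le = "+Inf" if bound == float("inf") else repr(bound)
            lines.append(f'{self.name}_bucket{{le="{le}"}} {running}')
        lines.append(f"{self.name}_sum {self.total}")
        lines.append(f"{self.name}_count {self.n}")
        return lines


if __name__ == "__main__":
    requests_total = Counter()
    requests_total[("POST", "200")] += 1
    print("\n".join(render_counter("http_requests_total", ("method", "status"), requests_total)))

    latency = Histogram("http_request_duration_seconds", [0.1, 0.5, 1.0])
    for seconds in [0.05, 0.1, 0.3, 2.0]:
        latency.observe(seconds)
    print("\n".join(latency.render()))
```

Storing only per-bucket counts is what keeps metrics cheap: four observations or four million cost the same handful of series.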
Use RED (Rate, Errors, Duration) for services and USE (Utilization, Saturation, Errors) for resources.
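As a rough illustration of RED, the sketch below computes the three numbers from a window of raw request records. The `Request` record, the rule that only 5xx responses count as errors, and the nearest-rank percentile are all assumptions for this example; a metrics backend would normally derive the same numbers from counters and histogram buckets rather than from raw requests.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    status: int          # HTTP status code
    duration_s: float    # request latency in seconds


def percentile(values: list, p: float) -> float:
    """Nearest-rank percentile for 0 < p <= 100."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def red(requests: list, window_s: float) -> dict:
    """Rate, Errors, Duration for one service over a window of `window_s` seconds."""
    if not requests:
        return {"rate_per_s": 0.0, "error_ratio": 0.0, "p99_s": 0.0}
    errors = sum(1 for r in requests if r.status >= 500)  # assumption: only 5xx count as errors
    return {
        "rate_per_s": len(requests) / window_s,
        "error_ratio": errors / len(requests),
        "p99_s": percentile([r.duration_s for r in requests], 99),
    }
```

Reporting a high percentile instead of the mean is the usual choice for Duration, because a few slow requests barely move an average but are exactly what users notice.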
Traces — the journey of one request
A trace is a tree of spans. Each span has a name, a start time, a duration, attributes, and (except for the root) a parent span:
GET /checkout 250 ms
├── auth.verify 8 ms
├── db.query SELECT carts 40 ms
├── http.POST stripe 180 ms
└── kafka.produce order_created 12 ms
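The tree above can be modelled with a few lines of plain Python. This is a hypothetical data structure for illustration, not the OpenTelemetry API: the `Span` class, its `child` helper, and the start offsets in the usage block are made up. It redraws the tree and computes the root's self time, assuming the children ran one after another without overlapping.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    start_ms: float       # offset from the start of the trace
    duration_ms: float
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def child(self, name: str, start_ms: float, duration_ms: float, **attributes) -> "Span":
        """Create a span whose parent is this span."""
        span = Span(name, start_ms, duration_ms, attributes)
        self.children.append(span)
        return span


def self_time_ms(span: Span) -> float:
    """Time spent in the span itself, assuming its children ran sequentially."""
    return span.duration_ms - sum(c.duration_ms for c in span.children)


def render(span: Span) -> list:
    """Draw the span tree in the same style as the example above."""
    lines = [f"{span.name} {span.duration_ms:g} ms"]
    for i, c in enumerate(span.children):
        last = i == len(span.children) - 1
        sub = render(c)
        lines.append(("└── " if last else "├── ") + sub[0])
        lines.extend(("    " if last else "│   ") + s for s in sub[1:])
    return lines


if __name__ == "__main__":
    root = Span("GET /checkout", 0, 250, {"http.method": "GET"})
    root.child("auth.verify", 1, 8)
    root.child("db.query SELECT carts", 10, 40)
    root.child("http.POST stripe", 52, 180)
    root.child("kafka.produce order_created", 235, 12)
    print("\n".join(render(root)))
    print(f"self time of root: {self_time_ms(root):g} ms")
```

Reading the tree this way makes the bottleneck obvious: the Stripe call accounts for 180 of the 250 ms, and only about 10 ms is spent in the handler itself.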
OpenTelemetry is the de facto standard for instrumentation: instrument once, then export (typically over OTLP) to the backend of your choice, such as Jaeger, Grafana Tempo, Datadog, or Honeycomb.
What it actually costs
| Pillar | Cardinality | Cost driver |
|---|---|---|
| Logs | unbounded | volume × retention |
| Metrics | bounded | unique label combinations |
| Traces | sampled | spans per second |
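The classic way to blow up a metrics bill is adding user_id as a metric label: every distinct user becomes its own time series. Don't. Keep high-cardinality identifiers in logs and trace attributes instead.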
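The arithmetic behind that warning fits in a few lines. The label sizes below are hypothetical; the rule is that one metric can produce up to the product of the distinct values of each label, and a classic Prometheus histogram multiplies that again by its bucket count (one series per bound plus `+Inf`) plus the `_sum` and `_count` series.

```python
from math import prod


def worst_case_series(label_values: dict) -> int:
    """Upper bound on time series for one metric: product of distinct values per label."""
    return prod(label_values.values())


def histogram_series(label_values: dict, num_bounds: int) -> int:
    """A classic histogram exposes one _bucket series per bound plus +Inf,
    and a _sum and _count series, for every label combination."""
    return worst_case_series(label_values) * (num_bounds + 1 + 2)


if __name__ == "__main__":
    labels = {"method": 5, "route": 20, "status": 10}       # hypothetical label sizes
    print(worst_case_series(labels))                          # 1000
    print(histogram_series(labels, num_bounds=10))            # 13000
    print(worst_case_series({**labels, "user_id": 100_000}))  # 100000000
```

These are upper bounds, since only label combinations that actually occur become series, but a user_id label keeps filling in new combinations for as long as new users show up.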
Where to start
- Adopt the OpenTelemetry SDK in one service.
- Add trace ID to every log.
- Build a dashboard with the four golden signals: latency, traffic, errors, saturation.
- Alert on symptoms (user-visible), not causes.