Observability 101: Logs, Metrics, and Traces Without the Buzzwords

DataFmt Team
#observability #opentelemetry #monitoring #devops

Observability is “the ability to ask new questions about your system without shipping new code.” The three pillars that get you there are logs, metrics, and traces.

Logs — what happened

Structured JSON logs with consistent fields:

{ "ts": "2025-07-02T10:00:00Z", "level": "error",
  "msg": "payment_failed", "user_id": 42,
  "trace_id": "a1b2c3", "amount": 19.99 }

Tips:

  • One event per log line. No multi-line messages.
  • Always include the trace ID for correlation.
  • Sample noisy info logs in production; never sample errors.
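The tips above can be sketched with Python's standard logging module. This is a minimal illustration, not a production logger: the names JsonFormatter, InfoSampler, and build_logger, the "fields" key passed through extra, and the 10% sample rate are all assumptions made for the example.

```python
import json
import logging
import random
from datetime import datetime, timezone
from typing import Optional


class JsonFormatter(logging.Formatter):
    """Render every record as one JSON object on a single line."""

    def format(self, record: logging.LogRecord) -> str:
        event = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname.lower(),
            "msg": record.getMessage(),
            # trace_id is always emitted (null if missing) so queries can rely on it.
            "trace_id": getattr(record, "trace_id", None),
        }
        # "fields" is a hypothetical convention for extra structured data.
        event.update(getattr(record, "fields", {}))
        # json.dumps escapes embedded newlines, which keeps one event per line.
        return json.dumps(event)


class InfoSampler(logging.Filter):
    """Keep a fraction of records below WARNING; always keep the rest."""

    def __init__(self, rate: float, rng: Optional[random.Random] = None):
        super().__init__()
        self.rate = rate
        self.rng = rng or random.Random()

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno >= logging.WARNING:
            return True  # warnings and errors are never sampled away
        return self.rng.random() < self.rate


def build_logger(name: str, info_sample_rate: float) -> logging.Logger:
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    handler.addFilter(InfoSampler(info_sample_rate))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_logger("payments", info_sample_rate=0.1)
    log.error(
        "payment_failed",
        extra={"trace_id": "a1b2c3", "fields": {"user_id": 42, "amount": 19.99}},
    )
```

Putting the sampler on the handler rather than in application code means call sites stay unchanged when the sample rate is tuned.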

Metrics — how often, how much

Aggregated counters, gauges, histograms. Cheap to store, fast to query:

http_requests_total{method="POST", status="200"} 12453
http_request_duration_seconds_bucket{le="0.1"} 11900

Use RED (Rate, Errors, Duration) for services and USE (Utilization, Saturation, Errors) for resources.
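To make the counter and histogram lines above concrete, here is a toy in-process version, plus a helper that turns counter deltas into the Rate and Errors parts of RED. It is a sketch for intuition only, not a real metrics client: the Counter and Histogram classes, the red_summary helper, and the bucket bounds are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


class Counter:
    """Monotonic count per unique label combination."""

    def __init__(self) -> None:
        self._values = defaultdict(int)

    def inc(self, **labels: str) -> None:
        self._values[tuple(sorted(labels.items()))] += 1

    def get(self, **labels: str) -> int:
        return self._values[tuple(sorted(labels.items()))]


class Histogram:
    """Fixed buckets; reported cumulatively, like the le="0.1" sample above."""

    def __init__(self, bounds: list) -> None:
        self.bounds = sorted(bounds)
        self.counts = [0] * (len(self.bounds) + 1)  # last slot is the +Inf bucket

    def observe(self, value: float) -> None:
        # bisect_left finds the first bound >= value, i.e. the "le" bucket.
        self.counts[bisect_left(self.bounds, value)] += 1

    def cumulative(self) -> dict:
        running, out = 0, {}
        for bound, count in zip(self.bounds + [float("inf")], self.counts):
            running += count
            out[bound] = running
        return out


def red_summary(requests_delta: int, errors_delta: int, window_s: float) -> tuple:
    """Rate (req/s) and error ratio from counter increases over a window."""
    rate = requests_delta / window_s
    error_ratio = errors_delta / requests_delta if requests_delta else 0.0
    return rate, error_ratio


if __name__ == "__main__":
    latency = Histogram([0.1, 0.5])
    for seconds in (0.05, 0.1, 0.3, 2.0):
        latency.observe(seconds)
    print(latency.cumulative())
    print(red_summary(requests_delta=600, errors_delta=6, window_s=60))
```

Because each bucket counts everything at or below its bound, the Duration part of RED can be read as a ratio of two buckets, for example the share of requests that finished within 0.1 s.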

Traces — the journey of one request

A trace is a tree of spans. Each span has a name, a start time, a duration, and attributes, plus a reference to its parent span:

GET /checkout                       250 ms
├── auth.verify                       8 ms
├── db.query SELECT carts            40 ms
├── http.POST stripe                180 ms
└── kafka.produce order_created      12 ms

OpenTelemetry is the de facto standard for instrumentation, and its data can be exported to a wide range of backends (Jaeger, Tempo, Datadog, Honeycomb).
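The span tree above can be modeled in a few lines to show what a trace lets you ask. This is a plain data-structure sketch, not the OpenTelemetry API: the Span class, the two helper functions, and the start offsets are assumptions, and the self-time calculation assumes child spans run one after another without overlapping.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    start_ms: int      # offset from the start of the trace (hypothetical values)
    duration_ms: int
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def self_time_ms(span: Span) -> int:
    """Time spent in the span itself, assuming children do not overlap."""
    return span.duration_ms - sum(child.duration_ms for child in span.children)


def slowest_path(span: Span) -> list:
    """Follow the longest child at each level, starting from the root."""
    path = [span.name]
    while span.children:
        span = max(span.children, key=lambda child: child.duration_ms)
        path.append(span.name)
    return path


checkout = Span("GET /checkout", 0, 250, children=[
    Span("auth.verify", 2, 8),
    Span("db.query SELECT carts", 10, 40),
    Span("http.POST stripe", 50, 180, attributes={"peer": "stripe"}),
    Span("kafka.produce order_created", 230, 12),
])

if __name__ == "__main__":
    print(self_time_ms(checkout))
    print(slowest_path(checkout))
```

With the durations from the example, the slowest path points at the Stripe call, which is the kind of answer logs and metrics alone rarely give directly.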

What it actually costs

Pillar    Cardinality   Cost driver
Logs      unbounded     volume × retention
Metrics   bounded       unique label combinations
Traces    sampled       spans per second

A common way teams blow up their bill is by adding user_id as a metric label: every distinct value creates a new time series. Don’t.
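The arithmetic behind that warning is a simple product: the worst-case number of time series for one metric is the number of values of each label multiplied together. The helper name and the label counts below are made up for illustration.

```python
from math import prod


def max_series(label_cardinalities: dict) -> int:
    """Upper bound on time series: product of distinct values per label."""
    return prod(label_cardinalities.values())


# Hypothetical label sizes for a single request counter.
bounded = {"method": 5, "status": 10}
with_user_id = {**bounded, "user_id": 100_000}

if __name__ == "__main__":
    print(max_series(bounded))
    print(max_series(with_user_id))
```

Under these assumed numbers, one extra label takes the metric from 50 possible series to 5,000,000. Per-user detail belongs in logs or span attributes, where it is paid for by volume rather than by label combinations.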

Where to start

  1. Adopt the OpenTelemetry SDK in one service.
  2. Add trace ID to every log.
  3. Build a dashboard with the four golden signals: latency, traffic, errors, saturation.
  4. Alert on symptoms (user-visible), not causes.
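As a rough illustration of step 4, the sketch below alerts on a user-visible symptom (the share of failed requests) rather than on a cause such as CPU usage. The ErrorRatioAlert class, the 2% threshold, the window size, and the minimum-traffic guard are all assumptions chosen for the example; in practice this logic usually lives in the alerting system, not in application code.

```python
from collections import deque


class ErrorRatioAlert:
    """Fire when too many of the most recent requests failed."""

    def __init__(self, threshold: float, window: int, min_requests: int) -> None:
        self.threshold = threshold
        self.min_requests = min_requests
        # True means the request failed; the deque drops the oldest entry itself.
        self.outcomes = deque(maxlen=window)

    def record(self, failed: bool) -> None:
        self.outcomes.append(failed)

    def firing(self) -> bool:
        total = len(self.outcomes)
        if total < self.min_requests:
            return False  # too little traffic to judge; avoids noisy pages
        return sum(self.outcomes) / total > self.threshold


if __name__ == "__main__":
    alert = ErrorRatioAlert(threshold=0.02, window=100, min_requests=50)
    for i in range(100):
        alert.record(failed=(i % 25 == 0))  # 4 failures in 100 requests
    print(alert.firing())
```

The minimum-traffic guard is a design choice: without it, a single failed request during a quiet period would count as a 100% error ratio.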
