Logging Best Practices: Structured, Sampled, Searchable

DataFmt Team
#logging #observability #devops #sre

Logs are the cheapest insurance you can buy for future you at 3 AM. Here is how to make them actually useful.

1. Always structured

Plain-text logs are hard to query at scale because every search turns into a regex. Emit one JSON object per event instead:

{
  "ts": "2025-12-03T10:00:00.123Z",
  "level": "warn",
  "msg": "rate_limit_exceeded",
  "service": "api",
  "request_id": "01HXY...",
  "user_id": 42,
  "limit": 100,
  "current": 117
}

Pino (Node), structlog (Python), zap (Go), tracing (Rust) all do this.
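
If you want to see what those libraries do under the hood, here is a minimal, dependency-free sketch. `makeLogger` and its parameters are illustrative assumptions, not the API of Pino or any other library; in production you would reach for one of the libraries above.

```javascript
// Minimal sketch of a JSON-lines logger. `makeLogger` is a hypothetical helper,
// not part of any library.
function makeLogger(base, write = (line) => process.stdout.write(line + "\n")) {
  const emit = (level, msg, fields = {}) => {
    // One JSON object per line: timestamp, level, message, then context.
    const record = { ts: new Date().toISOString(), level, msg, ...base, ...fields };
    write(JSON.stringify(record));
  };
  return {
    info: (msg, fields) => emit("info", msg, fields),
    warn: (msg, fields) => emit("warn", msg, fields),
    error: (msg, fields) => emit("error", msg, fields),
  };
}

// Usage: static context once, event-specific fields per call.
// "req-123" is a placeholder request ID.
const log = makeLogger({ service: "api" });
log.warn("rate_limit_exceeded", { request_id: "req-123", user_id: 42, limit: 100, current: 117 });
```

The stable `msg` plus separate fields is what makes the line searchable: you filter on `msg` and aggregate on `limit` or `user_id` without parsing free text.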

2. Use log levels deliberately

Level   Use for
error   Things that paged someone or lost data
warn    Anomalies that auto-recovered
info    Business events (order_placed, email_sent)
debug   Developer troubleshooting (off in prod)

Resist the temptation to log every line of code. Each info line is shipped, indexed, and stored, and that costs money at scale.
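
To make "off in prod" concrete, here is a small sketch of a level threshold. The numeric ranks and the `LOG_LEVEL` environment variable name are assumptions for illustration; real loggers ship their own equivalent.

```javascript
// Sketch of a level threshold. Ranks and the LOG_LEVEL variable name are
// assumptions, not a standard.
const LEVELS = { debug: 10, info: 20, warn: 30, error: 40 };

function isEnabled(level, threshold = process.env.LOG_LEVEL || "info") {
  const rank = LEVELS[level];
  const min = LEVELS[threshold];
  if (rank === undefined || min === undefined) {
    throw new Error("unknown log level");
  }
  return rank >= min;
}

// With the threshold at "info", the debug line is skipped before it is
// even serialized.
if (isEnabled("debug", "info")) {
  console.log(JSON.stringify({ level: "debug", msg: "cache_lookup" }));
}
```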

3. Correlation IDs everywhere

Generate a request ID at the edge (X-Request-ID or trace ID), propagate it through every downstream call, and log it on every line. Without it, stitching one request's log lines together across services is a guessing game.
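
One way to do this in Node without threading the ID through every function signature is `AsyncLocalStorage`. The sketch below assumes lowercased header names (as Node's HTTP server provides them); `withRequestId`, `log`, and `outboundHeaders` are hypothetical helper names.

```javascript
const { AsyncLocalStorage } = require("node:async_hooks");
const { randomUUID } = require("node:crypto");

// Sketch: carry a request ID through async calls with AsyncLocalStorage.
const requestContext = new AsyncLocalStorage();

// At the edge: reuse the caller's X-Request-ID if present, otherwise mint one.
function withRequestId(headers, handler) {
  const requestId = headers["x-request-id"] || randomUUID();
  return requestContext.run({ requestId }, handler);
}

// Every log line picks up the ID from the current async context.
function log(level, msg, fields = {}) {
  const { requestId } = requestContext.getStore() || {};
  const record = { level, msg, request_id: requestId, ...fields };
  console.log(JSON.stringify(record));
  return record;
}

// Downstream calls forward the same ID so the next service logs it too.
function outboundHeaders() {
  const { requestId } = requestContext.getStore() || {};
  return requestId ? { "x-request-id": requestId } : {};
}

// Usage: the ID survives across awaits without being passed by hand.
withRequestId({ "x-request-id": "abc-123" }, async () => {
  await Promise.resolve();
  log("info", "order_placed", { order_id: 1 });
});
```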

4. Redact PII at the source

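Never log raw passwords, tokens, full credit card numbers, or healthcare data. Configure the logger to redact those fields before anything is written:

const pino = require("pino");

// Values at these paths are replaced with "[Redacted]" in the output.
const logger = pino({
  redact: ["req.headers.authorization", "user.password", "*.creditCard"]
});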

A leaked log file is the cheapest data breach.

5. Sample, but never sample errors

Drop 90% of healthy info logs; keep 100% of warn and error. Most platforms support sampling rules.
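
If your platform does not sample for you, the rule is short enough to write yourself. This is a sketch under assumptions: `shouldKeep` is a hypothetical helper, the 10% keep rate mirrors the "drop 90%" figure above, and the `random` parameter exists only so the function can be tested deterministically.

```javascript
// Sketch of level-aware sampling: warn and error always pass,
// everything else is kept at a fixed rate.
const ALWAYS_KEEP = new Set(["warn", "error"]);
const KEEP_RATE = 0.1; // keep 10%, i.e. drop 90%

function shouldKeep(record, random = Math.random) {
  if (ALWAYS_KEEP.has(record.level)) return true;
  return random() < KEEP_RATE;
}

// Usage: filter a batch before shipping it to the log backend.
const batch = [
  { level: "info", msg: "email_sent" },
  { level: "error", msg: "payment_failed" },
];
const shipped = batch.filter((r) => shouldKeep(r));
```

Sampling line by line can leave you with half of a request's story. A common alternative is to make the keep/drop decision once per request ID so that a sampled request keeps all of its lines.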

6. One log line per event

// Bad
log.info("starting payment");
log.info("called stripe");
log.info("got response");
log.info("done");

// Good
log.info({ event: "payment_charged", duration_ms: 187, amount: 19.99, status: "ok" });

One line, one verb, all the context.

7. Don’t log what metrics can show

  • Request counts → counter metric.
  • Durations → histogram.
  • Cache hits and misses → counters (derive the hit ratio at query time).

Logs are for context; metrics are for trends. Mixing them is expensive.
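
To show the difference in cost, here is a hand-rolled sketch of a counter and a histogram. The class names and bucket bounds are assumptions for illustration; a real service would normally use a metrics client library instead.

```javascript
// Sketch of in-process metrics replacing per-request log lines.
class Counter {
  constructor() { this.value = 0; }
  inc(n = 1) { this.value += n; }
}

class Histogram {
  constructor(bounds) {
    this.bounds = bounds; // upper bounds in ms, ascending
    this.counts = new Array(bounds.length + 1).fill(0); // last slot = overflow
  }
  observe(v) {
    // Find the first bucket whose upper bound covers the value.
    const i = this.bounds.findIndex((b) => v <= b);
    this.counts[i === -1 ? this.bounds.length : i] += 1;
  }
}

const requests = new Counter();
const durationMs = new Histogram([50, 100, 250, 500]);

// Per request: two in-memory updates, no log line written.
function recordRequest(ms) {
  requests.inc();
  durationMs.observe(ms);
}

recordRequest(187);
recordRequest(42);
```

However many requests arrive, the memory and storage footprint stays at one number plus five bucket counts, which is why trends belong here rather than in logs.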

8. Set retention by level

  • error → 90 days, indexed.
  • warn → 30 days, indexed.
  • info → 7 days, hot; 30 days cold (S3 + Athena / Loki).
  • debug → off in prod, on with feature flag.
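
The tiers above can be written down as data. This sketch assumes the 30 cold days for info come after the 7 hot days, and that debug lines are not retained at all; `RETENTION`, `hotDays`, `coldDays`, and `storageTier` are hypothetical names, since every backend has its own config format.

```javascript
// Sketch of the retention tiers as a lookup table.
const RETENTION = {
  error: { hotDays: 90, coldDays: 0 },
  warn: { hotDays: 30, coldDays: 0 },
  info: { hotDays: 7, coldDays: 30 }, // assumption: cold follows hot
};

// Decide where a log line of a given age belongs.
function storageTier(level, ageDays) {
  const policy = RETENTION[level];
  if (!policy) return "delete"; // e.g. debug: not retained in this sketch
  if (ageDays <= policy.hotDays) return "hot";
  if (ageDays <= policy.hotDays + policy.coldDays) return "cold";
  return "delete";
}

// Usage: a 20-day-old info line has left the index but is still queryable.
const tier = storageTier("info", 20);
```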

TL;DR

Structured JSON, correlation IDs, redaction, sensible levels, sampling, and tiered retention. Apply this checklist and both your monthly log bill and your incident MTTR should drop.
