Logging Best Practices: Structured, Sampled, Searchable

Logs are the cheapest insurance against future you at 3 AM. Here is how to make them actually useful.

1. Always structured

Plain-text logs are hard to search and aggregate at scale. Emit one JSON object per event:

{
  "ts": "2025-12-03T10:00:00.123Z",
  "level": "warn",
  "msg": "rate_limit_exceeded",
  "service": "api",
  "request_id": "01HXY...",
  "user_id": 42,
  "limit": 100,
  "current": 117
}

Pino (Node), structlog (Python), zap (Go), and tracing (Rust) all support this.
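
To make the shape of a structured line concrete, here is a minimal sketch that uses only Node built-ins rather than any of the libraries above. The names `SERVICE_NAME`, `formatLine`, and `log` are made up for illustration; in a real service you would most likely reach for one of the listed libraries instead.

```javascript
// Illustrative only: SERVICE_NAME, formatLine, and log are hypothetical names.
const SERVICE_NAME = "api";

function formatLine(level, msg, fields = {}) {
  const core = { ts: new Date().toISOString(), level, msg, service: SERVICE_NAME };
  // Spread core first to fix the key order, then again last so that
  // caller-supplied fields cannot overwrite ts, level, msg, or service.
  return JSON.stringify({ ...core, ...fields, ...core });
}

function log(level, msg, fields) {
  // One JSON object per line keeps the output easy to ship and parse.
  process.stdout.write(formatLine(level, msg, fields) + "\n");
}

log("warn", "rate_limit_exceeded", { user_id: 42, limit: 100, current: 117 });
```

The message stays a stable event name and everything variable goes into fields, which is what makes the line queryable later.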

2. Use log levels deliberately

Level	Use for
`error`	Things that paged someone or lost data
`warn`	Anomalies that auto-recovered
`info`	Business events (`order_placed`, `email_sent`)
`debug`	Developer troubleshooting (off in prod)

Avoid the temptation to log every line of code. Each info line costs money at scale.
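
A level threshold is what keeps `debug` out of production. The sketch below shows one way to express it, assuming an environment variable named `LOG_LEVEL` and numeric weights chosen for this example; real logging libraries ship their own equivalents.

```javascript
// Numeric weights are an assumption for this sketch; only their order matters.
const LEVELS = { debug: 10, info: 20, warn: 30, error: 40 };

// LOG_LEVEL is a hypothetical env var name; default to info when it is unset.
function shouldLog(level, minLevel = process.env.LOG_LEVEL || "info") {
  // An unrecognized threshold falls back to info instead of silencing everything.
  const threshold = LEVELS[minLevel] ?? LEVELS.info;
  // An unrecognized level is treated as below every threshold.
  return (LEVELS[level] ?? 0) >= threshold;
}

if (shouldLog("debug")) {
  console.log(JSON.stringify({ level: "debug", msg: "cache_lookup" }));
}
```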

3. Correlation IDs everywhere

Generate a request ID at the edge (an X-Request-ID header or a trace ID), propagate it through every downstream call, and log it on every line. Without it, stitching together one request's logs across services is a guessing game.
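
One way to carry the ID without threading it through every function signature is Node's `AsyncLocalStorage`. This is a sketch under assumptions: the helper names (`withRequestId`, `logLine`, `outgoingHeaders`) are hypothetical, and the header is read in lowercase because Node lowercases incoming header names.

```javascript
const { AsyncLocalStorage } = require("node:async_hooks");
const { randomUUID } = require("node:crypto");

const requestContext = new AsyncLocalStorage();

// Reuse the caller's X-Request-ID if present, otherwise mint one at the edge.
function withRequestId(headers, fn) {
  const requestId = headers["x-request-id"] || randomUUID();
  return requestContext.run({ requestId }, fn);
}

// Every log line picks up the ID from the current async context.
function logLine(msg, fields = {}) {
  const store = requestContext.getStore();
  return JSON.stringify({ msg, request_id: store ? store.requestId : null, ...fields });
}

// Headers to attach to any downstream call made while handling this request.
function outgoingHeaders() {
  const store = requestContext.getStore();
  return store ? { "x-request-id": store.requestId } : {};
}

withRequestId({ "x-request-id": "req-123" }, () => {
  console.log(logLine("order_placed", { order_id: 7 }));
});
```

Wrapping the request handler once in `withRequestId` means code deeper in the call stack never has to pass the ID along by hand.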

4. Redact PII at the source

Never log raw passwords, tokens, full credit cards or healthcare data. Use a serializer that strips fields:

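const pino = require("pino");

// Values at these paths are replaced with "[Redacted]" before the line is written.
const logger = pino({
  redact: ["req.headers.authorization", "user.password", "*.creditCard"]
});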

A leaked log file is the cheapest data breach.

5. Sample, but never sample errors

Drop 90% of healthy info logs; keep 100% of warn and error. Most logging platforms let you configure sampling rules per level.
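
If your platform does not sample for you, the rule above is small enough to apply in the application. The following is a sketch, not a library API: `SAMPLE_RATES` and `shouldKeep` are hypothetical names, and the random source is injectable only to make the function testable.

```javascript
// Fraction of lines to keep per level. Levels not listed are always kept.
const SAMPLE_RATES = { debug: 0.1, info: 0.1 }; // keep 10%, drop 90%

function shouldKeep(level, random = Math.random) {
  const rate = SAMPLE_RATES[level] ?? 1; // warn and error fall through to 1
  return rate >= 1 || random() < rate;
}

for (const level of ["info", "warn", "error"]) {
  if (shouldKeep(level)) {
    console.log(JSON.stringify({ level, msg: "example_event" }));
  }
}
```

Random sampling per line is the simplest option. A variation worth considering is deciding once per request ID, so that the lines of a sampled request are kept or dropped together.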

6. Log lines should be one event

// Bad
log.info("starting payment");
log.info("called stripe");
log.info("got response");
log.info("done");

// Good: one event carrying its full context
log.info({ event: "payment_charged", duration_ms: 187, amount: 19.99, status: "ok" });

One line, one verb, all the context.

7. Don’t log what metrics can show

Request counts → counter metric.
Durations → histogram.
Cache hits and misses → counters (derive the hit ratio at query time).

Logs are for context; metrics are for trends. Mixing them is expensive.
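
To show the difference in kind, here is a deliberately tiny in-process counter and histogram. These classes are illustrative stand-ins written for this post, not the API of any metrics library; a real service would use its metrics client instead.

```javascript
// Hypothetical minimal metric types, for illustration only.
class Counter {
  constructor() {
    this.value = 0;
  }
  inc(by = 1) {
    this.value += by;
  }
}

class Histogram {
  constructor(upperBounds) {
    this.upperBounds = [...upperBounds].sort((a, b) => a - b);
    // One slot per bucket plus a final overflow slot.
    this.counts = new Array(this.upperBounds.length + 1).fill(0);
    this.sum = 0;
  }
  observe(value) {
    let i = this.upperBounds.findIndex((bound) => value <= bound);
    if (i === -1) i = this.upperBounds.length; // larger than every bound
    this.counts[i] += 1;
    this.sum += value;
  }
}

const requestsTotal = new Counter();
const requestDurationMs = new Histogram([100, 500, 1000]);

// Per request: a few integer updates instead of one stored log line.
requestsTotal.inc();
requestDurationMs.observe(187);
```

The cost of a metric stays constant no matter how many requests arrive, while the cost of a log line grows with traffic.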

8. Set retention by level

error → 90 days, indexed.
warn → 30 days, indexed.
info → 7 days, hot; 30 days cold (S3 + Athena / Loki).
debug → off in prod, on with feature flag.
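
The tiers above can be written down as data so that the policy is reviewable in one place. This sketch assumes that "30 days cold" means info logs expire 30 days after they were written (not 7 + 30), and the names `RETENTION` and `tierFor` are hypothetical; actual enforcement would live in your storage platform's lifecycle rules.

```javascript
// Each level lists its storage stages in order. untilDay is exclusive.
// Assumption: info moves from hot to cold at day 7 and expires at day 30.
const RETENTION = {
  error: [{ tier: "indexed", untilDay: 90 }],
  warn: [{ tier: "indexed", untilDay: 30 }],
  info: [
    { tier: "hot", untilDay: 7 },
    { tier: "cold", untilDay: 30 },
  ],
  debug: [], // not stored in prod
};

function tierFor(level, ageDays) {
  const stages = RETENTION[level] ?? [];
  const stage = stages.find((s) => ageDays < s.untilDay);
  return stage ? stage.tier : "expired";
}

console.log(tierFor("info", 3), tierFor("info", 10), tierFor("error", 60));
```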

TL;DR

Structured JSON, correlation IDs, redaction, sensible levels, sampling, and tiered retention. Apply this checklist and both your monthly log bill and your incident MTTR should drop.