Logging Best Practices: Structured, Sampled, Searchable
Logs are the cheapest insurance against future you at 3 AM. Here is how to make them actually useful.
1. Always structured
Plain-text logs are hard to search at scale because every query turns into a regex. Emit JSON so each field can be indexed and filtered:
{
"ts": "2025-12-03T10:00:00.123Z",
"level": "warn",
"msg": "rate_limit_exceeded",
"service": "api",
"request_id": "01HXY...",
"user_id": 42,
"limit": 100,
"current": 117
}
Pino (Node), structlog (Python), zap (Go), tracing (Rust) all do this.
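As a rough illustration of what these libraries do, here is a minimal sketch that uses only Node's standard library. `formatLine` and `log` are hypothetical helper names, not part of any of the libraries above, and a real logger adds buffering, serializers, and level filtering on top of this.

```javascript
// Hypothetical helper: build one JSON log line from a level, message, and fields.
// `now` is injectable so the timestamp can be fixed in tests.
function formatLine(level, msg, fields = {}, now = new Date()) {
  return JSON.stringify({
    ts: now.toISOString(),
    level,
    msg,
    ...fields,
  });
}

// Hypothetical helper: write the line to stdout, one event per line.
function log(level, msg, fields) {
  process.stdout.write(formatLine(level, msg, fields) + "\n");
}

log("warn", "rate_limit_exceeded", {
  service: "api",
  user_id: 42,
  limit: 100,
  current: 117,
});
```

Because every line is a single JSON object, a log platform can filter on `level` or `user_id` directly instead of parsing free text.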
2. Use log levels deliberately
| Level | Use for |
|---|---|
| error | Things that paged someone or lost data |
| warn | Anomalies that auto-recovered |
| info | Business events (order_placed, email_sent) |
| debug | Developer troubleshooting (off in prod) |
Avoid the temptation to log every line of code. Each info line costs money at scale.
3. Correlation IDs everywhere
Generate a request ID at the edge (X-Request-ID or trace ID), propagate it through every downstream call, and log it on every line. Without it, piecing together a single request across services is a guessing game.
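One way to avoid threading the ID through every function signature in Node is `AsyncLocalStorage` from the standard library. The sketch below assumes that approach. `withRequestId`, `logLine`, and `outgoingHeaders` are hypothetical names chosen for illustration, and the lowercase `x-request-id` key assumes headers have already been normalized.

```javascript
const { AsyncLocalStorage } = require("node:async_hooks");
const { randomUUID } = require("node:crypto");

// Holds per-request context for the duration of each request's async call chain.
const requestContext = new AsyncLocalStorage();

// Hypothetical edge wrapper: reuse an incoming X-Request-ID or mint a new one.
function withRequestId(headers, handler) {
  const requestId = headers["x-request-id"] || randomUUID();
  return requestContext.run({ requestId }, handler);
}

// Hypothetical logger: stamps every line with the current request ID, if any.
function logLine(level, msg, fields = {}) {
  const store = requestContext.getStore();
  return JSON.stringify({ level, msg, request_id: store?.requestId, ...fields });
}

// Hypothetical helper: headers to attach to downstream calls.
function outgoingHeaders() {
  const store = requestContext.getStore();
  return store ? { "x-request-id": store.requestId } : {};
}

// Both lines carry the same request_id, even across the await.
withRequestId({ "x-request-id": "req-123" }, async () => {
  console.log(logLine("info", "order_placed"));
  await Promise.resolve();
  console.log(logLine("info", "email_sent"));
});
```

Passing `outgoingHeaders()` into each downstream HTTP call is what carries the ID to the next service.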
4. Redact PII at the source
Never log raw passwords, tokens, full credit card numbers, or health data. Use a serializer that strips those fields:
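const pino = require("pino");
// Values at these paths are censored before the line is written.
const log = pino({
  redact: ["req.headers.authorization", "user.password", "*.creditCard"]
});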
A leaked log file is the cheapest data breach.
5. Sample, but never sample errors
Drop 90% of healthy info logs; keep 100% of warn and error. Most platforms support sampling rules.
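If the platform does not offer sampling rules, the same policy can be applied in the application. A minimal sketch, assuming a hypothetical `shouldLog` helper and a 10% keep rate for anything below warn:

```javascript
// Levels that are always kept, regardless of the sample rate.
const ALWAYS_KEEP = new Set(["warn", "error"]);

// Hypothetical helper: decide whether a log line should be emitted.
// `random` is injectable so the decision can be tested deterministically.
function shouldLog(level, sampleRate = 0.1, random = Math.random) {
  if (ALWAYS_KEEP.has(level)) return true;
  // Every other level is kept with probability `sampleRate`.
  return random() < sampleRate;
}

if (shouldLog("info")) {
  console.log(JSON.stringify({ level: "info", msg: "order_placed" }));
}
```

Per-line random sampling can keep some lines of a request and drop others. If that matters, deciding once per request (for example from the request ID) and reusing the decision keeps a request's lines together.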
6. Log lines should be one event
// Bad
log.info("starting payment");
log.info("called stripe");
log.info("got response");
log.info("done");
// Good
log.info({ event: "payment_charged", duration_ms: 187, amount: 19.99, status: "ok" });
One line, one verb, all the context.
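One way to get there is to collect context while the work runs and emit once at the end. The sketch below assumes a hypothetical `logOneEvent` wrapper; the payment call itself is stubbed out.

```javascript
// Hypothetical helper: collect context while the work runs, then emit one line.
async function logOneEvent(event, work, emit = console.log) {
  const fields = { event };
  const started = Date.now();
  try {
    // `work` receives the fields object so it can attach context as it goes.
    await work(fields);
    fields.status = "ok";
  } catch (err) {
    fields.status = "error";
    fields.error = err.message;
    throw err;
  } finally {
    // Runs on success and on failure, so exactly one line is emitted either way.
    fields.duration_ms = Date.now() - started;
    emit(JSON.stringify(fields));
  }
}

// Usage: the real charge call would go inside the callback.
logOneEvent("payment_charged", async (fields) => {
  fields.amount = 19.99;
});
```

The failure path still produces a single line, with `status` set to `"error"` and the message attached, so a failed payment is as searchable as a successful one.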
7. Don’t log what metrics can show
- Request counts → counter metric.
- Durations → histogram.
- Cache hits → counter (derive the hit ratio from hits and misses).
Logs are for context; metrics are for trends. Mixing them is expensive.
8. Set retention by level
- error → 90 days, indexed.
- warn → 30 days, indexed.
- info → 7 days hot; 30 days cold (S3 + Athena / Loki).
- debug → off in prod; on with a feature flag.
TL;DR
Structured JSON, correlation IDs, redaction, sensible levels, sampling, and tiered retention. Apply this checklist and both your monthly log bill and your incident MTTR should drop.