Why Investigate a Running Go Program?
"It works" is not a metric.
A system that behaves unpredictably under load is not a stable system.
Logs Are Not Enough
log.Println("handler started")
// ... 500ms pause ...
log.Println("handler finished")
The delay is visible. The cause is not.
Possibilities include:
- Network or database latency
- Lock contention
- Garbage collection pauses
- Goroutine scheduling delays
Logs show control flow, not runtime behavior. They miss causality, frequency, duration, and contention.
Debugging ≠ Profiling ≠ Tracing
Each tool serves different diagnostic needs:
| Tool | Purpose | Common Use Case |
|---|---|---|
| log.Println | Control flow trace | Debugging logic |
| pprof | Performance profile | CPU/memory hotspots |
| runtime/trace | Timeline + state trace | Goroutine scheduling, GC |
| Delve (dlv) | Interactive debugging | Localized issue investigation |
In production scenarios, profiling and tracing are the primary sources of insight.
pprof aggregates samples into a profile, either over a fixed window (e.g. CPU usage sampled over 10s) or as a point-in-time snapshot (e.g. heap or goroutine profiles). runtime/trace records a time-ordered stream of events, including scheduler state transitions. Each is essential for a different class of issue.
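To make the distinction concrete, here is a minimal sketch that captures both kinds of data around the same workload using the standard `runtime/pprof` and `runtime/trace` packages. `busyWork`, `captureCPUProfile`, and `captureTrace` are hypothetical names introduced for this example, and writing to an in-memory buffer is an assumption made to keep it self-contained.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
	"runtime/trace"
)

// busyWork is a hypothetical CPU-bound workload used only for illustration.
func busyWork(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i % 7
	}
	return sum
}

// captureCPUProfile runs work while the CPU profiler is sampling and
// returns the encoded profile.
func captureCPUProfile(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	work()
	pprof.StopCPUProfile() // flushes the profile into buf
	return buf.Bytes(), nil
}

// captureTrace runs work while the execution tracer is recording and
// returns the encoded trace.
func captureTrace(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := trace.Start(&buf); err != nil {
		return nil, err
	}
	work()
	trace.Stop() // flushes the trace into buf
	return buf.Bytes(), nil
}

func main() {
	work := func() { busyWork(50_000_000) }

	prof, err := captureCPUProfile(work)
	if err != nil {
		fmt.Println("cpu profile:", err)
		return
	}
	tr, err := captureTrace(work)
	if err != nil {
		fmt.Println("trace:", err)
		return
	}
	fmt.Printf("CPU profile: %d bytes, execution trace: %d bytes\n", len(prof), len(tr))
}
```

In a real program the bytes would typically go to files rather than buffers, to be opened with `go tool pprof` and `go tool trace` respectively.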
Typical Use Cases for Instrumentation
- CPU consistently saturated
- Latency degradation under load
- Memory consumption growth (suspected leak)
- High goroutine count with little throughput
- Irregular behavior after deployment or scaling
Such issues require observation beyond code-level debugging.
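Two of these symptoms, goroutine count and memory growth, can be observed from inside the process before reaching for a profiler. The sketch below assumes a hypothetical `snapshot` type plus `takeSnapshot` and `spawnBlocked` helpers. It simulates the "many goroutines, little throughput" case with goroutines that only wait on a channel.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// snapshot is a hypothetical container for a few runtime counters.
type snapshot struct {
	Goroutines int    // goroutines that currently exist
	HeapAlloc  uint64 // bytes of allocated heap objects
	NumGC      uint32 // completed GC cycles
}

// takeSnapshot reads the counters from the runtime.
// Note: runtime.ReadMemStats briefly stops the world, so call it sparingly.
func takeSnapshot() snapshot {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return snapshot{
		Goroutines: runtime.NumGoroutine(),
		HeapAlloc:  m.HeapAlloc,
		NumGC:      m.NumGC,
	}
}

// spawnBlocked starts n goroutines that do nothing but wait. The returned
// function releases them and waits until each has signalled completion.
func spawnBlocked(n int) (release func()) {
	stop := make(chan struct{})
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			<-stop // blocked: counted as a goroutine, contributes no work
		}()
	}
	return func() {
		close(stop)
		wg.Wait()
	}
}

func main() {
	before := takeSnapshot()
	release := spawnBlocked(100)
	during := takeSnapshot()
	release()

	fmt.Printf("goroutines: %d -> %d\n", before.Goroutines, during.Goroutines)
	fmt.Printf("heap bytes: %d -> %d (GC cycles: %d)\n",
		before.HeapAlloc, during.HeapAlloc, during.NumGC)
}
```

Counters like these tell you that something is growing. They do not tell you where, which is what the goroutine and heap profiles are for.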
Tool Proficiency Matters
A flamegraph shows samples. It does not explain causality.
Understanding the difference between flat and cumulative time, between allocated and in-use space, and between wall time and CPU time is critical.
Trace analysis requires attention to event causality, system threads, and scheduler delays. Delve is useful only when used with a specific hypothesis.
Instrumentation without interpretation is noise.
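The allocated versus in-use distinction is easy to demonstrate. The sketch below uses `runtime.MemStats` as a stand-in for the heap profile's `alloc_space` and `inuse_space` views: `TotalAlloc` is cumulative, while `HeapAlloc` reflects what is still allocated. `churn`, `measure`, and `sink` are hypothetical names for this illustration.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink is a package-level variable so the buffers are heap-allocated.
var sink []byte

// churn allocates n short-lived buffers of the given size and keeps none.
func churn(n, size int) {
	for i := 0; i < n; i++ {
		sink = make([]byte, size)
	}
	sink = nil // drop the last reference
}

// measure runs work and reports the cumulative bytes allocated during it
// and the change in heap bytes still allocated after a collection.
func measure(work func()) (allocated uint64, inUseDelta int64) {
	var before, after runtime.MemStats

	runtime.GC()
	runtime.ReadMemStats(&before)

	work()

	runtime.GC() // collect the garbage produced by work
	runtime.ReadMemStats(&after)

	allocated = after.TotalAlloc - before.TotalAlloc
	inUseDelta = int64(after.HeapAlloc) - int64(before.HeapAlloc)
	return allocated, inUseDelta
}

func main() {
	allocated, inUseDelta := measure(func() { churn(100, 1<<20) })

	fmt.Printf("allocated during work: %d MiB\n", allocated>>20)
	fmt.Printf("in-use change after GC: %d bytes\n", inUseDelta)
}
```

A workload like this looks alarming in an allocation view and harmless in an in-use view. Which one matters depends on whether the question is GC pressure or a suspected leak.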
Scope of This Series
Topics include:
- Built-in Go tooling: pprof, trace, runtime
- Heap, goroutine, block, mutex, and CPU profiling
- Reading and understanding profile outputs
- Practical flamegraph interpretation
- Controlled use of dlv (Delve) for non-trivial bugs
- Stack inspection and runtime introspection
- External tools: perf, strace, eBPF (for when built-ins are insufficient)
Each article focuses on specific capabilities and real-world usage patterns.
Closing Note
Logs report symptoms. Profiling reveals underlying behavior. Tracing shows time and causality.
Understanding system dynamics under load requires tools beyond logging and intuition.