Why Investigate a Running Go Program

"It works" is not a metric.
A system that behaves unpredictably under load is not a stable system.

Logs Are Not Enough

log.Println("handler started")
// ... 500ms pause ...
log.Println("handler finished")

The delay is visible. The cause is not.

Possibilities include:

Network or database latency
Lock contention
Garbage collection pause
Goroutine scheduling delays

Logs show control flow, not runtime behavior. They miss causality, frequency, duration, and contention.

Debugging ≠ Profiling ≠ Tracing

Each tool serves different diagnostic needs:

Tool	Purpose	Common Use Case
log.Println	Control flow trace	Debugging logic
pprof	Performance profile	CPU/memory hotspots
runtime/trace	Timeline + state trace	Goroutine scheduling, GC
Delve (dlv)	Interactive debugging	Localized issue investigation

In production, profiling and tracing are the primary sources of insight.

While pprof aggregates samples collected over a fixed window (e.g. CPU usage over 10s), runtime/trace records a time-ordered stream of events, including scheduler state transitions. Each suits a different class of problem.
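
As a rough sketch of how the two are collected from inside a program, the following uses runtime/pprof and runtime/trace from the standard library. The function busyWork and the two capture helpers are hypothetical names for this example, and writing to an in-memory buffer is an assumption made to keep it self-contained; a real program would more likely write to a file or expose an HTTP endpoint.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
	"runtime/trace"
)

// busyWork is a stand-in CPU-bound workload used only for illustration.
func busyWork(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i % 7
	}
	return sum
}

// captureCPUProfile runs work while the CPU profiler is sampling and
// returns the encoded profile.
func captureCPUProfile(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	work()
	pprof.StopCPUProfile() // returns after the profile has been written
	return buf.Bytes(), nil
}

// captureTrace runs work while the execution tracer is recording and
// returns the encoded trace.
func captureTrace(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := trace.Start(&buf); err != nil {
		return nil, err
	}
	work()
	trace.Stop() // returns after all trace data has been written
	return buf.Bytes(), nil
}

func main() {
	work := func() { busyWork(50_000_000) }

	prof, err := captureCPUProfile(work)
	if err != nil {
		fmt.Println("cpu profile failed:", err)
		return
	}
	tr, err := captureTrace(work)
	if err != nil {
		fmt.Println("trace failed:", err)
		return
	}
	fmt.Printf("cpu profile: %d bytes, trace: %d bytes\n", len(prof), len(tr))
}
```

Saved to files, the first output can be opened with go tool pprof and the second with go tool trace.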

Typical Use Cases for Instrumentation

CPU consistently saturated
Latency degradation under load
Memory consumption growth (suspected leak)
High goroutine count with little throughput
Irregular behavior after deployment or scaling

Such issues require observation beyond code-level debugging.
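
A cheap first check for two of these symptoms (goroutine growth and memory growth) is to read the runtime's own counters. The sketch below is only an illustration: the snapshot type and takeSnapshot are hypothetical names, and the 100 parked goroutines simulate a pile-up.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// snapshot holds a few runtime counters worth watching over time.
type snapshot struct {
	Goroutines int    // goroutines that currently exist
	HeapAlloc  uint64 // bytes of allocated heap objects
	NumGC      uint32 // completed GC cycles
}

// takeSnapshot reads the counters from the runtime.
// Note that runtime.ReadMemStats briefly stops the world.
func takeSnapshot() snapshot {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return snapshot{
		Goroutines: runtime.NumGoroutine(),
		HeapAlloc:  m.HeapAlloc,
		NumGC:      m.NumGC,
	}
}

func main() {
	before := takeSnapshot()

	// Simulate a pile-up: 100 goroutines parked on a channel.
	release := make(chan struct{})
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-release
		}()
	}
	during := takeSnapshot()

	close(release)
	wg.Wait()

	fmt.Printf("before: %+v\n", before)
	fmt.Printf("during: %+v\n", during)
}
```

Counters like these say that something is growing, not why. Answering the second question is what the goroutine and heap profiles are for.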

Tool Proficiency Matters

A flamegraph shows samples. It does not explain causality.

Understanding the difference between flat and cumulative time, allocated and in-use space, and wall time and CPU time is critical.
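
The allocation versus in-use distinction can be seen without pprof, using the runtime.MemStats fields TotalAlloc (cumulative bytes allocated) and HeapAlloc (bytes currently allocated). This is a minimal sketch: churn and allocVsInUse are hypothetical helpers, and the package-level sink exists only so the allocations reach the heap.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink keeps the buffers reachable long enough to be heap-allocated.
var sink []byte

// churn allocates n buffers of the given size and drops each one,
// producing garbage but no lasting memory use.
func churn(n, size int) {
	for i := 0; i < n; i++ {
		sink = make([]byte, size)
	}
	sink = nil
}

// allocVsInUse runs work and reports how many bytes were allocated in
// total, and how much the live heap changed once the garbage is collected.
func allocVsInUse(work func()) (allocated uint64, inUseDelta int64) {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	work()

	runtime.GC()
	runtime.ReadMemStats(&after)

	allocated = after.TotalAlloc - before.TotalAlloc
	inUseDelta = int64(after.HeapAlloc) - int64(before.HeapAlloc)
	return allocated, inUseDelta
}

func main() {
	allocated, inUse := allocVsInUse(func() { churn(100, 1<<20) })
	fmt.Printf("allocated: %d MiB, in-use change: %d KiB\n", allocated>>20, inUse/1024)
}
```

The workload allocates about 100 MiB yet leaves the live heap nearly unchanged. A heap profile viewed by allocated space would flag churn as heavy, while the same profile viewed by in-use space would barely show it. The first view points at GC pressure, the second at leaks.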

Trace analysis requires attention to event causality, system threads, and scheduler delays. Delve is useful only when you bring a specific hypothesis to test.

Instrumentation without interpretation is noise.

Scope of This Series

Topics include:

Built-in Go tooling: pprof, trace, runtime
Heap, goroutine, block, mutex, and CPU profiling
Reading and understanding profile outputs
Practical flamegraph interpretation
Controlled use of dlv (Delve) for non-trivial bugs
Stack inspection and runtime introspection
External tools: perf, strace, eBPF (for when built-ins are insufficient)

Each article focuses on specific capabilities and real-world usage patterns.

Closing Note

Logs report symptoms. Profiling reveals underlying behavior. Tracing shows time and causality.

Understanding system dynamics under load requires tools beyond logging and intuition.

Next in This Series

Dissecting a Running Golang Program with pprof