YURKOL Ltd - Custom Software and Cloud Architectural Solutions

Why Investigate a Running Go Program?

"It works" is not a metric.
A system that behaves unpredictably under load is not a stable system.

Logs Are Not Enough

log.Println("handler started")
// ... 500ms pause ...
log.Println("handler finished")

The delay is visible. The cause is not.

Possibilities include:

Logs show control flow, not runtime behavior. They miss causality, frequency, duration, and contention.
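Contention is a good example of what log lines cannot show. The sketch below is a minimal illustration, not a production pattern. The handler `handle` and the helper `contentionProfile` are hypothetical. Five goroutines log "started" and "finished" around a shared `sync.Mutex`. The log shows only that time passed. The mutex profile from `runtime/pprof` attributes the waiting to call stacks.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"runtime"
	"runtime/pprof"
	"sync"
	"time"
)

// handle is a hypothetical handler. Its log lines bracket the delay,
// but most of the elapsed time is spent waiting for a shared lock.
func handle(mu *sync.Mutex, id int) {
	log.Printf("handler %d started", id)
	mu.Lock()
	time.Sleep(10 * time.Millisecond) // simulated work inside the critical section
	mu.Unlock()
	log.Printf("handler %d finished", id)
}

// contentionProfile runs several handlers concurrently and returns the
// text form (debug=1) of the mutex profile collected meanwhile.
func contentionProfile() (string, error) {
	// A fraction of 1 records every contention event; real services
	// usually sample at a lower rate to limit overhead.
	prev := runtime.SetMutexProfileFraction(1)
	defer runtime.SetMutexProfileFraction(prev)

	var mu sync.Mutex
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			handle(&mu, id)
		}(i)
	}
	wg.Wait()

	// The mutex profile records where goroutines waited and for how long.
	var buf bytes.Buffer
	if err := pprof.Lookup("mutex").WriteTo(&buf, 1); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := contentionProfile()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(out)
}
```

The same profile is what `go tool pprof` reads when pointed at a service's mutex endpoint. The log output for this run contains no trace of the lock at all.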


Debugging ≠ Profiling ≠ Tracing

Each tool serves different diagnostic needs:

Tool             Purpose                    Common use case
log.Println      Control-flow trace         Debugging logic
pprof            Performance profile        CPU and memory hotspots
runtime/trace    Timeline + state trace     Goroutine scheduling, GC
Delve (dlv)      Interactive debugging      Localized issue investigation

In production, profiling and tracing are the primary sources of insight.

pprof summarizes behavior, either aggregated over an interval (e.g. CPU samples collected over 10s) or captured at a point in time (e.g. a heap snapshot). runtime/trace instead records a time-ordered stream of events, including goroutine and scheduler state transitions. Each is essential for a different class of issue.
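To make the contrast concrete, here is a minimal sketch that captures both artifacts over the same workload using only the standard library. `busyWork` and `captureBoth` are hypothetical names, and the workload size is arbitrary. `pprof.StartCPUProfile` produces an aggregated, gzip-compressed profile. `trace.Start` produces the raw event stream.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"runtime/pprof"
	"runtime/trace"
)

var result int // package-level sink so the workload is not optimized away

// busyWork is a hypothetical CPU-bound workload.
func busyWork(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i % 7
	}
	return sum
}

// captureBoth records a pprof CPU profile and a runtime/trace execution
// trace over the same workload and returns the raw bytes of each.
func captureBoth() ([]byte, []byte, error) {
	var cpuBuf, traceBuf bytes.Buffer
	if err := pprof.StartCPUProfile(&cpuBuf); err != nil {
		return nil, nil, err
	}
	if err := trace.Start(&traceBuf); err != nil {
		pprof.StopCPUProfile()
		return nil, nil, err
	}
	result = busyWork(20_000_000)
	trace.Stop()           // flushes the timestamped event stream
	pprof.StopCPUProfile() // writes the aggregated sample profile
	return cpuBuf.Bytes(), traceBuf.Bytes(), nil
}

func main() {
	cpu, tr, err := captureBoth()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("CPU profile: %d bytes, execution trace: %d bytes\n", len(cpu), len(tr))
}
```

If the buffers are written to files instead, `go tool pprof cpu.pprof` answers "where was CPU time spent". `go tool trace trace.out` answers "what was each goroutine doing, and when".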


Typical Use Cases for Instrumentation

Such issues require observation beyond code-level debugging.


Tool Proficiency Matters

A flamegraph shows samples. It does not explain causality.

Understanding the difference between flat and cumulative time, allocated and in-use space, and wall time and CPU time is critical.
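The allocation vs in-use distinction can be observed without a profiler. This is a rough sketch. It uses `runtime.MemStats` as a stand-in for the heap profile's `alloc_space` and `inuse_space` views. `allocChurn` and `measure` are hypothetical helpers. The code churns through many short-lived buffers. The cumulative allocated bytes grow large, while the live heap after a GC stays small.

```go
package main

import (
	"fmt"
	"runtime"
)

var sink []byte // keeps the most recent buffer live and forces heap allocation

// allocChurn allocates n short-lived 1 KiB buffers, keeping only the last.
func allocChurn(n int) {
	for i := 0; i < n; i++ {
		sink = make([]byte, 1024)
	}
}

// measure returns the bytes allocated during allocChurn (cumulative,
// comparable to alloc_space) and the live heap after a GC
// (comparable to inuse_space).
func measure(n int) (allocated, live uint64) {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)
	allocChurn(n)
	runtime.GC()
	runtime.ReadMemStats(&after)
	return after.TotalAlloc - before.TotalAlloc, after.HeapAlloc
}

func main() {
	allocated, live := measure(100_000)
	fmt.Printf("allocated during churn: %d bytes\n", allocated)
	fmt.Printf("live heap after GC:     %d bytes\n", live)
}
```

A hotspot in `alloc_space` points at GC pressure. A hotspot in `inuse_space` points at memory that is actually retained. The two often implicate different code.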

Trace analysis requires attention to event causality, system threads, and scheduler delays. Delve is useful only when you approach it with a specific hypothesis.

Instrumentation without interpretation is noise.


Scope of This Series

Topics include:

Each article focuses on specific capabilities and real-world usage patterns.


Closing Note

Logs report symptoms. Profiling reveals underlying behavior. Tracing shows time and causality.

Understanding system dynamics under load requires tools beyond logging and intuition.

