
Dissecting Go Programs with pprof: A Practical Guide

Continuation of "Why Investigate a Running Go Program?"

Go includes a powerful profiler built right into the runtime — pprof. It's lightweight, production-safe, and works with minimal effort. But most articles reduce it to a few CLI commands, leaving engineers to guess how to interpret its output.

This guide takes a different approach. It dissects pprof as a real tool: what it does, how it works, and how to read it.

Let's start with the terminology.

What Is a Profile?

A profile is not a log, and it's not a trace. Go's pprof collects samples: stack traces captured periodically (default: every 10ms). Each sample represents "where the program was" at that moment. For the curious: under the hood, the IP and SP (instruction pointer and stack pointer) values are used to capture each stack.

The more samples a function has, the more CPU time it likely consumed.

Each profile contains:

- the samples themselves (stack traces with their measured values),
- the locations, functions, and binary mappings those stacks reference,
- metadata such as the sample type and sampling period.

Profiles are saved as compressed protobufs. They're visualized with go tool pprof (our main CLI tool in this guide) or with tools like GoLand and Graphviz.
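
The HTTP endpoint isn't the only way to obtain one. As a point of reference, here is a minimal sketch of collecting a CPU profile programmatically with the standard runtime/pprof package (the file name and the workload are placeholders):

package main

import (
	"os"
	"runtime/pprof"
	"time"
)

func main() {
	// The profile is written to this file as a compressed protobuf.
	f, err := os.Create("cpu.pb.gz")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Start sampling; the default rate is 100 Hz (one sample per 10ms).
	if err := pprof.StartCPUProfile(f); err != nil {
		panic(err)
	}
	defer pprof.StopCPUProfile()

	// ... the workload to be profiled goes here ...
	time.Sleep(time.Second)
}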


Why Sampling?

Sampling is a tradeoff. It gives a statistical picture without interrupting execution. Precision is not ideal. A fast function may not appear in the profile at all.

This explains many surprises in practice: why a flamegraph looks empty, or why the hottest functions don't show up.
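
To make this concrete, a hypothetical function that completes in well under the 10ms sampling period can fall between two samples and never show up:

func fastButInvisible() int {
	// Finishes in a fraction of the 10ms sampling interval, so many
	// profiling runs will contain no samples for it at all.
	sum := 0
	for i := 0; i < 100_000; i++ {
		sum += i
	}
	return sum
}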


Case Study: Idle Server

We start with a minimal HTTP server:

package main

import (
	"fmt"
	"net/http"

	_ "net/http/pprof" // registers the /debug/pprof handlers on the default mux
)

func HelloHandler(w http.ResponseWriter, r *http.Request) {
	resp := fmt.Sprintf("Hello, %s!", r.RemoteAddr)
	_, err := w.Write([]byte(resp))
	if err != nil {
		w.WriteHeader(http.StatusInternalServerError)
		return // !!! Important - prevents multiple writes
	}
}

func main() {
	http.HandleFunc("/", HelloHandler)
	panic(http.ListenAndServe(":8080", nil))
}

Running go tool pprof http://localhost:8080/debug/pprof/profile?seconds=10 gives:


Total samples = 0ms

Because the server is idle. No incoming requests — no CPU activity — no samples.

Even if a request is made, but the handler does nothing (e.g., just sleeps), the profile will still be empty. pprof only records active execution.
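
A hypothetical variant of the handler makes this visible (assuming the imports of the server above plus "time"):

func SleepyHandler(w http.ResponseWriter, r *http.Request) {
	// The goroutine is parked inside time.Sleep: no CPU is consumed,
	// so the CPU profiler records no samples for this handler.
	time.Sleep(500 * time.Millisecond)
	_, _ = w.Write([]byte("done\n"))
}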


BurnCPU and Real Data

We introduce a simple function that eats CPU:

package main

import (
	"fmt"
	"math"
	"net/http"
	"time"

	_ "net/http/pprof"
)

func BurnCPU() {
	start := time.Now()
	for time.Since(start) < 300*time.Millisecond {
		for i := 0; i < 1_000_000; i++ {
			r := math.Sqrt(float64(i))
			r = r / 0.3657
		}
	}
}

func HelloHandler(w http.ResponseWriter, r *http.Request) {
	resp := fmt.Sprintf("Hello, %s!\n", r.RemoteAddr)
	BurnCPU()
	_, err := w.Write([]byte(resp))
	if err != nil {
		w.WriteHeader(http.StatusInternalServerError)
		return // !!! Important - prevents multiple writes
	}
}

func main() {
	http.HandleFunc("/", HelloHandler)
	panic(http.ListenAndServe(":8080", nil))
}

Calling this in a handler guarantees real sampling. Now the profile shows actual numbers.


Reading the Output: top

Run go tool pprof and send some HTTP requests to the webserver while the profiler is sampling.
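
To keep the handler busy for the whole sampling window, generate some load, for example with a curl loop (assuming the server above on :8080):

while true; do curl -s http://localhost:8080/ > /dev/null; done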

Once the profile is collected, it is saved as a file in the location printed by the tool, and an interactive pprof prompt awaits your commands.

Type the top command; sample output:

~> go tool pprof http://localhost:8080/debug/pprof/profile?seconds=10
Fetching profile over HTTP from http://localhost:8080/debug/pprof/profile?seconds=10
Saved profile in ****.pb.gz
File: ___go_build_profiling
Build ID: 873234d28c5a94d965d9e0475d203fd4c044e42a
Type: cpu
Time: ************
Duration: 10.10s, Total samples = 3.82s (37.82%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 3810ms, 99.74% of 3820ms total
Dropped 17 nodes (cum <= 19.10ms)
      flat  flat%   sum%        cum   cum%
    3760ms 98.43% 98.43%     3810ms 99.74%  main.BurnCPU
      50ms  1.31% 99.74%       50ms  1.31%  runtime.nanotime (inline)
         0     0% 99.74%     3810ms 99.74%  main.HelloHandler
         0     0% 99.74%     3810ms 99.74%  net/http.(*ServeMux).ServeHTTP
         0     0% 99.74%     3820ms   100%  net/http.(*conn).serve
         0     0% 99.74%     3810ms 99.74%  net/http.HandlerFunc.ServeHTTP
         0     0% 99.74%     3810ms 99.74%  net/http.serverHandler.ServeHTTP
         0     0% 99.74%       50ms  1.31%  time.Since
         0     0% 99.74%       50ms  1.31%  time.runtimeNano
(pprof)

Interpretation:

- flat: time spent in the function's own body, excluding its callees,
- cum: cumulative time spent in the function plus everything it calls,
- sum%: running total of the flat% column down the list.

The golden rule: if sum% reaches 75–90%, you've already found the main contributors. No need to analyze deeper.

Now have a look at the main.HelloHandler row: 0 0% 99.74% 3810ms 99.74% main.HelloHandler. Its flat is zero while its cum is very high, which implies that all the heavy work is done by its callees.

In our case that callee is (not surprisingly) BurnCPU: 3760ms 98.43% 98.43% 3810ms 99.74% main.BurnCPU,

which has high values for both flat and cum.

N.B.: If a function has flat = 0 but cum > 0, it's not guilty — it's just a dispatcher. It passes control to the real burner.
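
A hypothetical pair of functions illustrates the pattern; in top, Dispatch would report flat = 0 with a high cum, while doWork carries the flat time:

// Dispatch does no real work itself: flat is ~0, cum is ~the cost of doWork.
func Dispatch() {
	doWork()
}

// doWork is the actual burner: high flat and high cum.
func doWork() {
	sum := 0
	for i := 0; i < 500_000_000; i++ {
		sum += i % 7
	}
	_ = sum
}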


Misconceptions: top Is Not a Call Stack

This view lists functions by CPU consumption. It's not a call stack. Two functions appearing next to each other doesn't mean they called each other.


peek and the Local Context

Use:

(pprof) peek BurnCPU

You get both callers and callees — who invoked the function, and whom it calls. This is the fastest way to get context around a suspicious function.

(pprof) peek main.BurnCPU
Showing nodes accounting for 3.82s, 100% of 3.82s total
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context
----------------------------------------------------------+-------------
                                             3.81s   100% |   main.HelloHandler
     3.76s 98.43% 98.43%      3.81s 99.74%                | main.BurnCPU
                                             0.05s  1.31% |   time.Since
----------------------------------------------------------+-------------

But note: peek's context is not guaranteed to be in actual stack order, especially with goroutines.


Functions That Vanish

If function A calls go B(), and B burns CPU, you'll see B — but not A. Why?

Because A did no work. The go keyword is not a function call — it's a launch. Unless A also does something, it may not appear in the profile at all.

This is critical for understanding flamegraphs: launching something is not the same as executing.
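
A minimal sketch of the situation (function names are illustrative):

// Launcher only launches: the go statement returns immediately, and the
// new goroutine gets its own stack, so Launcher appears in none of
// Burn's samples (unless Launcher also does work of its own).
func Launcher() {
	go Burn()
}

// Burn runs on its own goroutine and owns all the CPU samples.
func Burn() {
	sum := 0
	for i := 0; i < 1_000_000_000; i++ {
		sum += i
	}
	_ = sum
}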


Visualizations: svg and web

To generate graphs:

(pprof) svg
(pprof) web


These require Graphviz to be installed. You get a visual graph with boxes (functions), edges (calls), and box size reflecting CPU weight.
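
If Graphviz is missing, the usual package managers have it, e.g. apt-get install graphviz on Debian/Ubuntu or brew install graphviz on macOS.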

If your app is simple, the graph is linear. If it's concurrent or complex, you'll see branching. This is the callgraph, not the flamegraph.


Flamegraphs vs Callgraphs

A flamegraph arranges functions as stacked layers: the Y-axis is stack depth, and each box's width along the X-axis is proportional to its share of samples (time).

A callgraph is a directed graph: nodes = functions, edges = calls.

Both are useful. Callgraphs are better for control flow. Flamegraphs are better for spotting hot paths.

For an interactive web interface with both views, start a local server:

go tool pprof -http=:8081 profile.pb.gz

This opens http://localhost:8081 with tabs for Graph (callgraph), Flame Graph, Top, and Source views (note the port differs from :8080, which our profiled server already occupies). We leave it to the reader to play with visual representations of the profile data using these commands.


list: Source-Level Breakdown

Use:

(pprof) list BurnCPU

Shows source code line-by-line with performance annotations. Sample output:

ROUTINE ======================== main.BurnCPU in /home/yurii/code/profiling/main.go
     3.76s      3.81s (flat, cum) 99.74% of Total
         .          .     12:func BurnCPU() {
         .          .     13:   start := time.Now()
         .       50ms     14:   for time.Since(start) < 300*time.Millisecond {
     3.76s      3.76s     15:           for i := 0; i < 1_000_000; i++ {
         .          .     16:                   r := math.Sqrt(float64(i))
         .          .     17:                   r = r / 0.3657
         .          .     18:           }
         .          .     19:   }
         .          .     20:}

Each line shows flat time (left column) and cumulative time (right column). Line 15 shows where most CPU time is spent. Great for pinpointing exact hot lines within a function.


weblist: Source-Level Breakdown in the Browser
(pprof) weblist BurnCPU

Shows the same line-by-line annotations as list, rendered as HTML in the browser. If sampling hit a line, it's marked. Great for locating exact hot code, provided the sources are available.

The output includes disassembly too. You can skip that unless you're digging into low-level performance.


What pprof Can't Show

If a goroutine waits on the network or a database, the CPU profile shows nothing. Waiting is not burning.

That's why slow SQL queries or flaky HTTP calls don't appear in CPU profiles. They're not consuming CPU on the Go side.
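
A hypothetical handler dominated by an outbound call makes the point (assuming "io" and "net/http" are imported; the URL is a placeholder):

func ProxyHandler(w http.ResponseWriter, r *http.Request) {
	// Nearly all wall-clock time here is spent blocked on network I/O.
	// A parked goroutine produces no CPU samples, so this handler barely
	// registers in a CPU profile no matter how slow the upstream is.
	resp, err := http.Get("http://example.com/slow")
	if err != nil {
		w.WriteHeader(http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()
	_, _ = io.Copy(w, resp.Body)
}

Those waits show up elsewhere, e.g. in the goroutine profile or an execution trace, not in the CPU profile.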


GoLand Integration

GoLand can open .pprof files and render them as flamegraphs — but only for *_test.go. Profiling live servers isn't supported natively. For production code, use the CLI.
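
If you want a profile GoLand can render, one route is profiling through tests with the standard toolchain flags (the benchmark selector is illustrative):

go test -cpuprofile cpu.pb.gz -bench .

The resulting file opens in GoLand as well as in go tool pprof.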


Summary: Practical Advice

To recap the practical advice from this part:

- An empty profile usually means an idle program, not a broken profiler: no CPU activity, no samples.
- In top, follow sum%: once it reaches 75–90%, you've found the main contributors.
- flat = 0 with a high cum marks a dispatcher; the real burner is among its callees. Use peek and list to find it.
- CPU profiles only show active execution: waiting on I/O is invisible, and work launched with go is attributed to the new goroutine, not the launcher.

This was Part 1, focused on CPU. The next part will explore memory, allocations, GC, and heap profiles in the same spirit.