# Benchmarking
Volt includes a comprehensive benchmark suite that measures the performance of sync primitives, channels, and async task coordination. The benchmarks use median-based statistics for robustness against outliers.
## Running benchmarks

### Quick start
```sh
# Build and run the full benchmark suite
zig build bench

# Machine-readable JSON output (used by the comparison tool)
zig build bench -- --json
```

Both the benchmark binary and the Volt library module it imports are compiled with ReleaseFast optimization regardless of the build mode you specify. The library uses a dedicated `volt_bench_mod` with a hardcoded `.optimize = .ReleaseFast` to ensure the channel and scheduler hot paths are fully optimized.
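In `build.zig`, that wiring can look roughly like the sketch below (written against the Zig 0.13-era build API; the source paths and step description are assumptions, while `volt_bench_mod` is the module named above):

```zig
const std = @import("std");

// Sketch only: paths and step text are assumptions. The point is the
// hardcoded .optimize = .ReleaseFast on both the module and the binary.
pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});

    // Dedicated benchmark module for the Volt library, always ReleaseFast.
    const volt_bench_mod = b.createModule(.{
        .root_source_file = b.path("src/root.zig"),
        .target = target,
        .optimize = .ReleaseFast,
    });

    // Benchmark binary, also ReleaseFast regardless of -Doptimize.
    const bench_exe = b.addExecutable(.{
        .name = "volt_bench",
        .root_source_file = b.path("bench/volt_bench.zig"),
        .target = target,
        .optimize = .ReleaseFast,
    });
    bench_exe.root_module.addImport("volt", volt_bench_mod);

    const run_bench = b.addRunArtifact(bench_exe);
    if (b.args) |args| run_bench.addArgs(args); // forwards `-- --json`

    b.step("bench", "Run the Volt benchmark suite").dependOn(&run_bench.step);
}
```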
### Available build steps

| Command | Description |
|---|---|
| `zig build bench` | Run the Volt benchmark suite |
| `zig build bench -- --json` | JSON output for automated processing |
| `zig build compare` | Side-by-side comparison with Tokio |
## Benchmark structure

The benchmark suite (`bench/volt_bench.zig`) is organized into four tiers that measure progressively more complex operations:
### Tier 1: Sync fast path (1M ops)

Single-thread, no runtime. Measures raw data-structure cost: CAS, atomics, memory barriers.
| Benchmark | Operation |
|---|---|
| Mutex | `tryLock()` / `unlock()` |
| RwLock (read) | `tryReadLock()` / `readUnlock()` |
| RwLock (write) | `tryWriteLock()` / `writeUnlock()` |
| Semaphore | `tryAcquire(1)` / `release(1)` |
| OnceCell get | `get()` on an initialized cell |
| OnceCell set | `set()` on a fresh cell each time |
| Notify | `notifyOne()` + `waitWith()` |
| Barrier | `waitWith()` on `barrier(1)` |
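As a concrete illustration, the mutex row reduces to a tight loop like the sketch below. The `volt` import, default `Mutex` initialization, and `tryLock()` returning `bool` are assumptions; only the method names come from the table, and `SYNC_OPS` is the Tier 1 constant from the Configuration section:

```zig
const std = @import("std");
const volt = @import("volt"); // module name as wired in build.zig

// Sketch of the Tier 1 mutex hot loop; initialization and return types
// are assumptions, the tryLock()/unlock() pair is from the table above.
fn benchMutexFastPath() u64 {
    var mutex: volt.Mutex = .{};
    var acquired: usize = 0;

    const start = std.time.nanoTimestamp();
    for (0..SYNC_OPS) |_| {
        if (mutex.tryLock()) {
            acquired += 1;
            mutex.unlock();
        }
    }
    const elapsed: u64 = @intCast(std.time.nanoTimestamp() - start);

    // Prevent ReleaseFast from deleting the loop as dead code.
    std.mem.doNotOptimizeAway(&acquired);
    return elapsed;
}
```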
### Tier 2: Channel fast path (100K ops)

Single-thread, no runtime. Measures channel buffer operations without scheduling overhead.
| Benchmark | Operation |
|---|---|
| Channel send | `trySend()` to fill buffer |
| Channel recv | `tryRecv()` to drain buffer |
| Channel roundtrip | `trySend()` + `tryRecv()` on cap=1 |
| Oneshot | `send()` + `tryRecv()` |
| Broadcast | 1 sender, 4 receivers |
| Watch | `send()` + `borrow()` |
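The roundtrip row, for instance, ping-pongs one value through a capacity-1 buffer. A sketch, with a hypothetical channel constructor (only `trySend()`/`tryRecv()` come from the table; the type, constructor, and return shapes are assumptions):

```zig
// Sketch of the Tier 2 roundtrip loop. The Channel type and init call
// are hypothetical; trySend()/tryRecv() are the calls named in the table.
fn benchChannelRoundtrip(allocator: std.mem.Allocator) void {
    var ch = volt.Channel(u64).init(allocator, 1); // hypothetical: element type, capacity 1
    defer ch.deinit();

    for (0..CHANNEL_OPS) |i| {
        _ = ch.trySend(i); // fills the single-slot buffer
        const v = ch.tryRecv(); // immediately drains it
        std.mem.doNotOptimizeAway(&v);
    }
}
```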
### Tier 3: Async multi-task (10K ops)

Full async runtime with multiple workers. Measures real-world contention: scheduling, waking, and backpressure.
| Benchmark | Operation |
|---|---|
| MPMC | 4 producers + 4 consumers, buffer=1024 |
| Mutex contended | 4 tasks contending on one mutex |
| RwLock contended | 4 readers + 2 writers |
| Semaphore contended | 8 tasks, 2 permits |
### Tier 4: Task scheduling (1K ops)

Full async runtime. Measures the scheduler's core task lifecycle: spawn overhead, batch throughput, and blocking pool round-trip.
| Benchmark | Operation |
|---|---|
| Spawn + await | Serial `spawn()` + `await()` of a noop task (scheduler round-trip) |
| Spawn batch | Spawn 100 noop tasks + await all (measures throughput, ~ns per task) |
| Blocking spawn | `spawnBlocking()` + wait for a noop (blocking pool overhead) |
## Configuration

All configuration constants are defined at the top of `bench/volt_bench.zig` and must match `bench/rust_bench/src/main.rs` exactly for fair comparison:

```zig
const SYNC_OPS: usize = 1_000_000; // Tier 1
const CHANNEL_OPS: usize = 100_000; // Tier 2
const ASYNC_OPS: usize = 10_000; // Tier 3
const ITERATIONS: usize = 10;
const WARMUP: usize = 5;
const NUM_WORKERS: usize = 4;
```

## Reading benchmark output
### Text output

```text
TIER 1: SYNC FAST PATH (1000000 ops, single-thread, try_*)
------------------------------------------------------------------------
Mutex tryLock/unlock                    6.6 ns/op    151515152 ops/sec
RwLock tryReadLock/readUnlock           6.7 ns/op    149253731 ops/sec
```

- `ns/op`: Nanoseconds per operation (lower is better). Computed as `median_ns / ops_per_iter`.
- `ops/sec`: Operations per second (higher is better). Computed as `1e9 / ns_per_op`.
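In code, the two derived values are direct transcriptions of those formulas:

```zig
// ns/op: median iteration time divided by operations per iteration.
fn nsPerOp(median_ns: u64, ops_per_iter: usize) f64 {
    return @as(f64, @floatFromInt(median_ns)) / @as(f64, @floatFromInt(ops_per_iter));
}

// ops/sec: reciprocal of ns/op, scaled to one second.
fn opsPerSec(ns_per_op: f64) f64 {
    return 1e9 / ns_per_op;
}
```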
### JSON output

The `--json` flag produces structured output with additional fields:

```json
{
  "metadata": {
    "runtime": "Volt",
    "sync_ops": 1000000,
    "iterations": 10,
    "warmup": 5,
    "workers": 4
  },
  "benchmarks": {
    "mutex": {
      "ns_per_op": 6.60,
      "ops_per_sec": 151515152,
      "median_ns": 6600000,
      "min_ns": 6500000,
      "ops_per_iter": 1000000,
      "bytes_per_op": 0.00,
      "allocs_per_op": 0.0000
    }
  }
}
```

- `bytes_per_op`: Heap bytes allocated per operation (tracked by a counting allocator wrapper).
- `allocs_per_op`: Number of heap allocation calls per operation.
- `median_ns`: Median of all measured iterations (the primary statistic).
- `min_ns`: Minimum across all iterations (useful as a lower-bound estimate).
## Statistics methodology

Each benchmark runs `WARMUP` (5) discarded iterations followed by `ITERATIONS` (10) measured iterations. The reported ns/op uses the median of the measured iterations, which is robust against scheduling jitter, background processes, and other outliers.
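A minimal sketch of the median step (the suite's actual `Stats` type may differ in detail):

```zig
const std = @import("std");

// Median of the measured iteration times. With ITERATIONS == 10 this picks
// the upper of the two middle samples (no averaging).
fn median(samples: []u64) u64 {
    std.mem.sort(u64, samples, {}, std.sort.asc(u64));
    return samples[samples.len / 2];
}
```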
## Allocation tracking

The benchmark suite wraps the allocator with a `CountingAllocator` that atomically tracks total bytes and allocation count. The counters are reset before each timed iteration, so the reported bytes/op and allocs/op reflect a single run:

```zig
const CountingAllocator = struct {
    inner: Allocator,
    bytes_allocated: std.atomic.Value(usize),
    alloc_count: std.atomic.Value(usize),
    // ...
};
```

This allows measuring the allocation overhead of each primitive without modifying the primitive code itself.
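Fleshed out, such a wrapper can look like the sketch below, written against the Zig 0.13-era `std.mem.Allocator` vtable (the vtable layout differs in other Zig versions, and Volt's actual implementation may differ):

```zig
const std = @import("std");
const Allocator = std.mem.Allocator;

const CountingAllocator = struct {
    inner: Allocator,
    bytes_allocated: std.atomic.Value(usize) = std.atomic.Value(usize).init(0),
    alloc_count: std.atomic.Value(usize) = std.atomic.Value(usize).init(0),

    fn allocator(self: *CountingAllocator) Allocator {
        return .{
            .ptr = self,
            .vtable = &.{ .alloc = alloc, .resize = resize, .free = free },
        };
    }

    fn alloc(ctx: *anyopaque, len: usize, ptr_align: u8, ret_addr: usize) ?[*]u8 {
        const self: *CountingAllocator = @ptrCast(@alignCast(ctx));
        const result = self.inner.rawAlloc(len, ptr_align, ret_addr);
        if (result != null) {
            // Atomic counters so concurrent tasks can allocate safely.
            _ = self.bytes_allocated.fetchAdd(len, .monotonic);
            _ = self.alloc_count.fetchAdd(1, .monotonic);
        }
        return result;
    }

    fn resize(ctx: *anyopaque, buf: []u8, buf_align: u8, new_len: usize, ret_addr: usize) bool {
        const self: *CountingAllocator = @ptrCast(@alignCast(ctx));
        return self.inner.rawResize(buf, buf_align, new_len, ret_addr);
    }

    fn free(ctx: *anyopaque, buf: []u8, buf_align: u8, ret_addr: usize) void {
        const self: *CountingAllocator = @ptrCast(@alignCast(ctx));
        self.inner.rawFree(buf, buf_align, ret_addr);
    }

    // Called before each timed iteration so per-run numbers are isolated.
    fn reset(self: *CountingAllocator) void {
        self.bytes_allocated.store(0, .monotonic);
        self.alloc_count.store(0, .monotonic);
    }
};
```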
## Writing new benchmarks

To add a new benchmark:

1. Add a function in `bench/volt_bench.zig` following the existing pattern:

   ```zig
   fn benchMyPrimitive() BenchResult {
       var stats = Stats{};

       // Warmup
       for (0..WARMUP) |_| {
           // Run the operation SYNC_OPS times
       }

       // Measured iterations
       for (0..ITERATIONS) |_| {
           alloc_counter.reset();
           const start = std.time.nanoTimestamp();

           for (0..SYNC_OPS) |_| {
               // Your operation here
           }

           stats.add(@intCast(std.time.nanoTimestamp() - start));
       }

       return .{
           .stats = stats,
           .ops_per_iter = SYNC_OPS,
           .total_bytes = alloc_counter.getAllocBytes(),
           .total_allocs = alloc_counter.getAllocCount(),
       };
   }
   ```

2. Call it from `main()` and add it to both the text and JSON output sections.

3. If comparing with Tokio, add a matching benchmark in `bench/rust_bench/src/main.rs` with the same operation count and configuration.
## Best practices

- Use `std.mem.doNotOptimizeAway()` to prevent the compiler from eliding work (see the sketch after this list).
- Always include warmup iterations to stabilize caches and branch predictors.
- For Tier 3 (async) benchmarks, create the runtime once and reuse it across iterations.
- Match the Tokio benchmark exactly: same operation count, same thread count, same buffer sizes.
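The first point deserves a sketch, because the failure mode is silent: without the final call, ReleaseFast is free to conclude the result is unused and delete the measured loop entirely.

```zig
const std = @import("std");

// Without the doNotOptimizeAway call below, a ReleaseFast build may decide
// `sum` is never observed and remove the whole loop, so the benchmark
// would measure nothing.
fn checksumLoop() void {
    var sum: u64 = 0;
    for (0..1_000_000) |i| {
        sum +%= i; // stand-in for the real operation's observable result
    }
    std.mem.doNotOptimizeAway(&sum);
}
```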
## Profiling

### macOS (Instruments)

Build the benchmark binary and profile it with Instruments:

```sh
# Build the benchmark
zig build bench

# Profile with Instruments (Time Profiler)
xcrun xctrace record --template 'Time Profiler' \
  --launch zig-out/bench/volt_bench

# Or use the Instruments GUI
open zig-out/bench/volt_bench  # Then attach with Instruments
```

### Linux (perf)
```sh
# Build the benchmark
zig build bench

# Record perf data
perf record -g zig-out/bench/volt_bench

# Analyze
perf report
```

### Flamegraphs

```sh
perf record -F 99 -g zig-out/bench/volt_bench
perf script | stackcollapse-perf.pl | flamegraph.pl > flamegraph.svg
```

- Profile with `ReleaseFast` (the default for benchmarks) to see realistic hot paths.
- Focus on Tier 3 benchmarks for scheduling-related bottlenecks.
- Compare flamegraphs before and after optimization to verify the change addressed the right hot path.
- The `debug_scheduler` flag in `Scheduler.zig` can be set to `true` for detailed tracing, but this significantly slows execution.