
Benchmarking

Volt includes a comprehensive benchmark suite that measures the performance of sync primitives, channels, and async task coordination. The benchmarks use median-based statistics for robustness against outliers.

# Build and run the full benchmark suite
zig build bench
# Machine-readable JSON output (used by the comparison tool)
zig build bench -- --json

Both the benchmark binary and the Volt library module it imports are compiled with ReleaseFast optimization regardless of the build mode you specify. The library uses a dedicated volt_bench_mod with hardcoded .optimize = .ReleaseFast to ensure the channel and scheduler hot paths are fully optimized.
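In build.zig terms, the setup looks roughly like the following sketch (the actual build script may differ; the source paths here are illustrative):

// Inside pub fn build(b: *std.Build) void, with `target` already resolved.
// The library module pins ReleaseFast regardless of -Doptimize=...
const volt_bench_mod = b.createModule(.{
    .root_source_file = b.path("src/root.zig"), // illustrative path
    .target = target,
    .optimize = .ReleaseFast, // hardcoded; ignores the user-selected mode
});

const bench_exe = b.addExecutable(.{
    .name = "volt_bench",
    .root_module = b.createModule(.{
        .root_source_file = b.path("bench/volt_bench.zig"),
        .target = target,
        .optimize = .ReleaseFast, // the binary itself is also ReleaseFast
        .imports = &.{.{ .name = "volt", .module = volt_bench_mod }},
    }),
});
b.installArtifact(bench_exe);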

Command                      Description
zig build bench              Run the Volt benchmark suite
zig build bench -- --json    JSON output for automated processing
zig build compare            Side-by-side comparison with Tokio

The benchmark suite (bench/volt_bench.zig) is organized into four tiers that measure progressively more complex operations:

Tier 1 (single thread, no runtime). Measures raw data-structure cost: CAS, atomics, memory barriers.

Benchmark        Operation
Mutex            tryLock() / unlock()
RwLock (read)    tryReadLock() / readUnlock()
RwLock (write)   tryWriteLock() / writeUnlock()
Semaphore        tryAcquire(1) / release(1)
OnceCell get     get() on an initialized cell
OnceCell set     set() on a fresh cell each time
Notify           notifyOne() + waitWith()
Barrier          waitWith() on barrier(1)
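Each Tier 1 loop is just the listed call pair run back-to-back. For the Mutex row it looks roughly like this sketch (only tryLock()/unlock() come from the table above; the zero-init of volt.sync.Mutex and the bool return of tryLock() are assumptions):

const volt = @import("volt"); // module name assumed from the build setup

// Illustrative Tier 1 inner loop for the Mutex row.
fn mutexFastPath(ops: usize) void {
    var mutex: volt.sync.Mutex = .{};
    for (0..ops) |_| {
        if (mutex.tryLock()) mutex.unlock();
    }
}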

Tier 2 (single thread, no runtime). Measures channel buffer operations without scheduling overhead.

Benchmark          Operation
Channel send       trySend() to fill buffer
Channel recv       tryRecv() to drain buffer
Channel roundtrip  trySend() + tryRecv() on cap=1
Oneshot            send() + tryRecv()
Broadcast          1 sender, 4 receivers
Watch              send() + borrow()

Tier 3 (full async runtime, multiple workers). Measures real-world contention: scheduling, waking, and backpressure.

Benchmark            Operation
MPMC                 4 producers + 4 consumers, buffer=1024
Mutex contended      4 tasks contending on one mutex
RwLock contended     4 readers + 2 writers
Semaphore contended  8 tasks, 2 permits

Tier 4 (full async runtime). Measures the scheduler's core task lifecycle: spawn overhead, batch throughput, and blocking-pool round-trip.

Benchmark       Operation
Spawn + await   Serial spawn() + await() of a noop task (scheduler round-trip)
Spawn batch     Spawn 100 noop tasks + await all (measures throughput, ~ns per task)
Blocking spawn  spawnBlocking() + wait for a noop (blocking pool overhead)

All configuration constants are defined at the top of bench/volt_bench.zig and must match those in bench/rust_bench/src/main.rs exactly for a fair comparison:

const SYNC_OPS: usize = 1_000_000; // Tier 1
const CHANNEL_OPS: usize = 100_000; // Tier 2
const ASYNC_OPS: usize = 10_000; // Tier 3
const ITERATIONS: usize = 10;
const WARMUP: usize = 5;
const NUM_WORKERS: usize = 4;
The text report looks like this (abridged):

TIER 1: SYNC FAST PATH (1000000 ops, single-thread, try_*)
------------------------------------------------------------------------
Mutex tryLock/unlock 6.6 ns/op 151515152 ops/sec
RwLock tryReadLock/readUnlock 6.7 ns/op 149253731 ops/sec
  • ns/op: Nanoseconds per operation (lower is better). Computed as median_ns / ops_per_iter.
  • ops/sec: Operations per second (higher is better). Computed as 1e9 / ns_per_op.
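Plugging the Mutex row into those formulas as a worked check:

// Worked example with the sample numbers above (Tier 1 Mutex).
const median_ns: f64 = 6_600_000; // median wall time of one iteration
const ops_per_iter: f64 = 1_000_000; // SYNC_OPS
const ns_per_op = median_ns / ops_per_iter; // 6.6 ns/op
const ops_per_sec = 1e9 / ns_per_op; // ~151,515,152 ops/sec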

The --json flag produces structured output with additional fields:

{
  "metadata": {
    "runtime": "Volt",
    "sync_ops": 1000000,
    "iterations": 10,
    "warmup": 5,
    "workers": 4
  },
  "benchmarks": {
    "mutex": {
      "ns_per_op": 6.60,
      "ops_per_sec": 151515152,
      "median_ns": 6600000,
      "min_ns": 6500000,
      "ops_per_iter": 1000000,
      "bytes_per_op": 0.00,
      "allocs_per_op": 0.0000
    }
  }
}
  • bytes_per_op: Heap bytes allocated per operation (tracked by a counting allocator wrapper).
  • allocs_per_op: Number of heap allocation calls per operation.
  • median_ns: Median of all measured iterations (the primary statistic).
  • min_ns: Minimum across all iterations (useful for detecting lower bounds).
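Because the field layout is stable, the JSON can be consumed by standard tooling. For example, to pull one metric (assuming jq is installed and the JSON arrives on stdout):

zig build bench -- --json | jq '.benchmarks.mutex.ns_per_op'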

Each benchmark runs WARMUP (5) discarded iterations followed by ITERATIONS (10) measured iterations. The reported ns/op uses the median of the measured iterations, which is robust against scheduler preemption, background processes, and other transient outliers.
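The statistic itself is simple; a minimal sketch of the median step (the actual Stats implementation in bench/volt_bench.zig may differ):

const std = @import("std");

/// Median of the measured samples; assumes samples.len > 0.
fn medianNs(samples: []u64) u64 {
    std.mem.sort(u64, samples, {}, std.sort.asc(u64));
    return samples[samples.len / 2];
}

test medianNs {
    var samples = [_]u64{ 7, 3, 900, 5, 6 }; // one large outlier
    try std.testing.expectEqual(@as(u64, 6), medianNs(&samples));
}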

The benchmark suite wraps the allocator with a CountingAllocator that atomically tracks total bytes and allocation count. This is reset before each timed iteration, so the reported bytes/op and allocs/op reflect a single run:

const CountingAllocator = struct {
    inner: Allocator,
    bytes_allocated: std.atomic.Value(usize),
    alloc_count: std.atomic.Value(usize),
    // ...
};

This allows measuring the allocation overhead of each primitive without modifying the primitive code itself.
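Filled out against the Zig 0.15 Allocator vtable, such a wrapper might look like the following sketch (the method names reset/getAllocBytes/getAllocCount mirror the usage shown later; the real implementation may differ):

const std = @import("std");
const Allocator = std.mem.Allocator;

// Sketch: forwards every call to the inner allocator and atomically
// records bytes and allocation counts for successful allocations.
const CountingAllocator = struct {
    inner: Allocator,
    bytes_allocated: std.atomic.Value(usize) = .init(0),
    alloc_count: std.atomic.Value(usize) = .init(0),

    fn allocator(self: *CountingAllocator) Allocator {
        return .{ .ptr = self, .vtable = &.{
            .alloc = alloc,
            .resize = resize,
            .remap = remap,
            .free = free,
        } };
    }

    fn reset(self: *CountingAllocator) void {
        self.bytes_allocated.store(0, .monotonic);
        self.alloc_count.store(0, .monotonic);
    }

    fn getAllocBytes(self: *CountingAllocator) usize {
        return self.bytes_allocated.load(.monotonic);
    }

    fn getAllocCount(self: *CountingAllocator) usize {
        return self.alloc_count.load(.monotonic);
    }

    fn alloc(ctx: *anyopaque, len: usize, alignment: std.mem.Alignment, ra: usize) ?[*]u8 {
        const self: *CountingAllocator = @ptrCast(@alignCast(ctx));
        const result = self.inner.rawAlloc(len, alignment, ra);
        if (result != null) {
            _ = self.bytes_allocated.fetchAdd(len, .monotonic);
            _ = self.alloc_count.fetchAdd(1, .monotonic);
        }
        return result;
    }

    fn resize(ctx: *anyopaque, memory: []u8, alignment: std.mem.Alignment, new_len: usize, ra: usize) bool {
        const self: *CountingAllocator = @ptrCast(@alignCast(ctx));
        return self.inner.rawResize(memory, alignment, new_len, ra);
    }

    fn remap(ctx: *anyopaque, memory: []u8, alignment: std.mem.Alignment, new_len: usize, ra: usize) ?[*]u8 {
        const self: *CountingAllocator = @ptrCast(@alignCast(ctx));
        return self.inner.rawRemap(memory, alignment, new_len, ra);
    }

    fn free(ctx: *anyopaque, memory: []u8, alignment: std.mem.Alignment, ra: usize) void {
        const self: *CountingAllocator = @ptrCast(@alignCast(ctx));
        self.inner.rawFree(memory, alignment, ra);
    }
};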

To add a new benchmark:

  1. Add a function in bench/volt_bench.zig following the existing pattern:
fn benchMyPrimitive() BenchResult {
    var stats = Stats{};
    // Warmup
    for (0..WARMUP) |_| {
        // Run the operation SYNC_OPS times
    }
    // Measured iterations
    for (0..ITERATIONS) |_| {
        alloc_counter.reset();
        const start = std.time.nanoTimestamp();
        for (0..SYNC_OPS) |_| {
            // Your operation here
        }
        stats.add(@intCast(std.time.nanoTimestamp() - start));
    }
    return .{
        .stats = stats,
        .ops_per_iter = SYNC_OPS,
        .total_bytes = alloc_counter.getAllocBytes(),
        .total_allocs = alloc_counter.getAllocCount(),
    };
}
  2. Call it from main() and add it to both the text and JSON output sections.

  3. If comparing with Tokio, add a matching benchmark in bench/rust_bench/src/main.rs with the same operation count and configuration.

  • Use std.mem.doNotOptimizeAway() to prevent the compiler from eliding work; see the sketch after this list.
  • Always include warmup iterations to stabilize caches and branch predictors.
  • For Tier 3 (async) benchmarks, create the runtime once and reuse it across iterations.
  • Match the Tokio benchmark exactly: same operation count, same thread count, same buffer sizes.
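A minimal example of the first tip:

const std = @import("std");

// Without the doNotOptimizeAway call, ReleaseFast is free to delete the
// whole loop, because `sum` would otherwise be dead code.
fn benchSum() void {
    var sum: u64 = 0;
    for (0..1_000_000) |i| sum +%= i;
    std.mem.doNotOptimizeAway(sum);
}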

Build the benchmark binary and profile it with your platform's tools.

macOS (Instruments):

# Build the benchmark
zig build bench
# Profile with Instruments (Time Profiler)
xcrun xctrace record --template 'Time Profiler' \
    --launch zig-out/bench/volt_bench
# Or use the Instruments GUI
open zig-out/bench/volt_bench  # Then attach with Instruments

Linux (perf):

# Build the benchmark
zig build bench
# Record perf data
perf record -g zig-out/bench/volt_bench
# Analyze
perf report

To produce a flame graph (using Brendan Gregg's FlameGraph scripts):

perf record -F 99 -g zig-out/bench/volt_bench
perf script | stackcollapse-perf.pl | flamegraph.pl > flamegraph.svg
  • Profile with ReleaseFast (the default for benchmarks) to see realistic hot paths.
  • Focus on Tier 3 benchmarks for scheduling-related bottlenecks.
  • Compare flamegraphs before and after optimization to verify the change addressed the right hot path.
  • The debug_scheduler flag in Scheduler.zig can be set to true for detailed tracing, but this significantly slows execution.