Zero-Allocation Waiters
One of the most important performance properties of Volt’s synchronization primitives is that waiting on a mutex, semaphore, or read-write lock requires zero heap allocation. The waiter struct is embedded directly in the future that is waiting, and linked into the primitive’s queue through intrusive list pointers.
This page explains the design, why it matters, and how it works in practice.
The Problem
Section titled “The Problem”Traditional async mutex implementations allocate a waiter node on the heap when a task needs to wait:
// Traditional approach (what Volt does NOT do)fn lockWait(self: *Mutex) !void { const waiter = try allocator.create(WaiterNode); // heap allocation waiter.* = .{ .task = currentTask() }; self.queue.push(waiter); suspend(); allocator.destroy(waiter); // heap free}Every lock acquisition that contends requires malloc and free. Under
heavy contention — hundreds of tasks fighting for the same mutex — this
creates significant allocator pressure:
- Each allocation touches the allocator’s free list (often a global lock)
- Memory fragmentation from many small, short-lived allocations
- Cache pollution from allocator metadata
- Failure mode:
OutOfMemoryduring lock contention
For a runtime targeting millions of operations per second, this overhead is unacceptable.
The Solution: Intrusive Waiters
Section titled “The Solution: Intrusive Waiters”Volt embeds the waiter struct directly inside the future that is waiting. The waiter includes intrusive linked list pointers, so it can be threaded into the primitive’s wait queue without any separate allocation.
The Intrusive Linked List
Section titled “The Intrusive Linked List”The foundation is an intrusive doubly-linked list defined in
src/internal/util/linked_list.zig:
/// Pointers for linking a node into a list./// Embed this in your node struct.pub fn Pointers(comptime T: type) type { return struct { prev: ?*T = null, next: ?*T = null, };}
/// An intrusive doubly-linked list.pub fn LinkedList(comptime T: type, comptime pointers_field: []const u8) type { return struct { head: ?*T = null, tail: ?*T = null, len: usize = 0,
pub fn pushFront(self: *Self, node: *T) void { ... } pub fn pushBack(self: *Self, node: *T) void { ... } pub fn popFront(self: *Self) ?*T { ... } pub fn popBack(self: *Self) ?*T { ... } pub fn remove(self: *Self, node: *T) void { ... } };}Key properties:
- O(1) insertion and removal — no searching
- No allocation — nodes embed their own pointers
- O(1) cancellation — remove a specific node by pointer
- FIFO ordering — pushFront + popBack gives fair queuing
The Semaphore Waiter
Section titled “The Semaphore Waiter”The Semaphore.Waiter struct (from src/sync/Semaphore.zig) is the core
waiter type. It embeds intrusive list pointers and tracks how many permits
are still needed:
pub const Waiter = struct { /// Remaining permits needed (atomic for cross-thread visibility) state: std.atomic.Value(usize),
/// Original number of permits requested (for cancellation math) permits_requested: usize,
/// Waker callback invoked when permits are granted waker: ?WakerFn = null, waker_ctx: ?*anyopaque = null,
/// Intrusive list pointers -- NO HEAP ALLOCATION pointers: Pointers(Waiter) = .{},
/// Debug-mode invocation tracking invocation: InvocationId = .{},};The waiter list in the semaphore:
pub const Semaphore = struct { permits: std.atomic.Value(usize), mutex: std.Thread.Mutex, waiters: LinkedList(Waiter, "pointers"), // Intrusive list};When release() grants permits to waiters, it walks the list and assigns
permits directly. Satisfied waiters are removed from the list and their
waker callbacks are collected into a WakeList for batch invocation after
the mutex is released:
fn releaseToWaiters(self: *Self, num: usize) void { var wake_list: WakeList = .{}; var rem = num;
// Serve waiters from the back (oldest first = FIFO) while (rem > 0) { const waiter = self.waiters.back() orelse break;
const remaining = waiter.state.load(.monotonic); const assign = @min(rem, remaining); waiter.state.store(remaining - assign, .release); rem -= assign;
if (remaining - assign == 0) { _ = self.waiters.popBack(); // Collect waker for batch invocation wake_list.push(.{ .context = ctx, .wake_fn = wf }); } }
// Surplus permits go to the atomic counter if (rem > 0) _ = self.permits.fetchAdd(rem, .release);
self.mutex.unlock(); wake_list.wakeAll(); // Wake AFTER releasing the lock}How Primitives Use Intrusive Waiters
Section titled “How Primitives Use Intrusive Waiters”The Mutex is built on Semaphore(1). Its waiter is a thin wrapper:
pub const Waiter = struct { inner: SemaphoreWaiter,
pub fn init() Self { return .{ .inner = SemaphoreWaiter.init(1) }; }
pub fn isAcquired(self: *const Self) bool { return self.inner.isReady(); }};The LockFuture embeds this waiter as a field:
pub const LockFuture = struct { pub const Output = void;
acquire_future: semaphore_mod.AcquireFuture, // Contains the waiter mutex: *Mutex,
pub fn poll(self: *Self, ctx: *Context) PollResult(void) { return self.acquire_future.poll(ctx); }};Since AcquireFuture itself embeds a Waiter:
pub const AcquireFuture = struct { semaphore: *Semaphore, num_permits: usize, waiter: Waiter, // Embedded, not heap-allocated state: State, stored_waker: ?FutureWaker,};The full chain of embedding means the waiter lives inside the future, which
lives inside the FutureTask, which is the single allocation for the
entire task. Zero additional allocations for waiting.
RwLock
Section titled “RwLock”The RwLock is built on Semaphore(MAX_READS) where MAX_READS is ~536
million. Read locks acquire 1 permit; write locks acquire all MAX_READS
permits:
pub const ReadWaiter = struct { inner: SemaphoreWaiter, // Requests 1 permit user_waker: ?WakerFn = null, user_waker_ctx: ?*anyopaque = null,};
pub const WriteWaiter = struct { inner: SemaphoreWaiter, // Requests MAX_READS permits user_waker: ?WakerFn = null, user_waker_ctx: ?*anyopaque = null,};Writer priority emerges naturally from this design: a queued writer’s
partial acquisition drains permits toward 0, so new readers’
tryAcquire(1) fails. The FIFO queue serves the writer before any reader
queued behind it.
The ReadLockFuture and WriteLockFuture each embed their respective
waiter:
pub const ReadLockFuture = struct { pub const Output = void; rwlock: *RwLock, waiter: ReadWaiter, // Embedded state: State, stored_waker: ?FutureWaker,};
pub const WriteLockFuture = struct { pub const Output = void; rwlock: *RwLock, waiter: WriteWaiter, // Embedded state: State, stored_waker: ?FutureWaker,};Drop Safety: The State Enum
Section titled “Drop Safety: The State Enum”A subtle correctness requirement: what happens if a future is destroyed while its waiter is still in the queue? The waiter’s intrusive pointers would dangle, corrupting the list.
Each future tracks its lifecycle through a State enum:
const State = enum { /// Haven't tried to acquire yet init, /// Waiting for permits (waiter is in the queue) waiting, /// Permits acquired (waiter has been removed) acquired,};The cancel and deinit methods check the state before cleanup:
pub fn cancel(self: *Self) void { if (self.state == .waiting) { // Remove waiter from queue before destroying self.semaphore.cancelAcquire(&self.waiter); } if (self.stored_waker) |*w| { w.deinit(); self.stored_waker = null; }}
pub fn deinit(self: *Self) void { self.cancel();}This ensures:
| State | On Drop | Safe? |
|---|---|---|
init | Nothing to clean up | Yes |
waiting | Remove waiter from queue, then destroy | Yes |
acquired | Nothing to clean up (already removed) | Yes |
The cancelAcquire method on the semaphore handles the removal under lock,
returning any partially acquired permits:
pub fn cancelAcquire(self: *Self, waiter: *Waiter) void { if (waiter.isComplete()) return;
self.mutex.lock(); if (waiter.isComplete()) { self.mutex.unlock(); return; }
// Remove from list if linked self.waiters.remove(waiter); waiter.pointers.reset();
const acquired = waiter.permits_requested - waiter.state.load(.monotonic); self.mutex.unlock();
// Return partial permits (may wake other waiters) if (acquired > 0) self.release(acquired);}The Waker Callback Bridge
Section titled “The Waker Callback Bridge”When a waiter is satisfied (e.g., a mutex is unlocked), the sync primitive
needs to wake the waiting task. The bridge between the primitive’s waker
callback and the future’s stored Waker is a static function pointer:
// In AcquireFuturepub fn poll(self: *Self, ctx: *Context) PollResult(void) { switch (self.state) { .init => { // Store the scheduler's waker self.stored_waker = ctx.getWaker().clone();
// Set callback that bridges to the stored waker self.waiter.setWaker(@ptrCast(self), wakeCallback);
if (self.semaphore.acquireWait(&self.waiter)) { self.state = .acquired; return .{ .ready = {} }; } self.state = .waiting; return .pending; }, .waiting => { if (self.waiter.isComplete()) { self.state = .acquired; return .{ .ready = {} }; } return .pending; }, .acquired => return .{ .ready = {} }, }}
/// Bridge: primitive's WakerFn -> future's stored Wakerfn wakeCallback(ctx: *anyopaque) void { const self: *Self = @ptrCast(@alignCast(ctx)); if (self.stored_waker) |*w| { w.wakeByRef(); }}The flow:
Semaphore.release() -> waiter.waker(waiter.waker_ctx) // primitive's callback -> AcquireFuture.wakeCallback(self) // bridge function -> self.stored_waker.wakeByRef() // scheduler's waker -> header.schedule() // reschedule the taskNo allocation at any step. The callback pointer is a comptime-known function.
The self pointer is the future itself (which is inside the FutureTask
allocation). The stored waker is the scheduler’s task waker.
Waker Migration
Section titled “Waker Migration”Tasks can migrate between worker threads (via work stealing). When a task
resumes on a different worker, its waker might need updating. The waiting
state handles this:
.waiting => { if (self.waiter.isComplete()) { self.state = .acquired; return .{ .ready = {} }; }
// Update waker in case task migrated to different worker const new_waker = ctx.getWaker(); if (self.stored_waker) |*old| { if (!old.willWakeSame(new_waker)) { old.deinit(); self.stored_waker = new_waker.clone(); } } return .pending;},willWakeSame compares the raw data pointers. If the waker has changed
(different task header due to migration), the old one is dropped and the
new one is cloned.
Anti-Starvation: The Semaphore Algorithm
Section titled “Anti-Starvation: The Semaphore Algorithm”A naive semaphore has a starvation bug: permits can “slip through” between a failed CAS and queue insertion. Volt’s semaphore eliminates this with a “mutex-before-CAS” slow path:
pub fn acquireWait(self: *Self, waiter: *Waiter) bool { // Fast path: lock-free CAS for full acquisition var curr = self.permits.load(.acquire); while (curr >= needed) { if (self.permits.cmpxchgWeak(curr, curr - needed, ...)) |updated| { curr = updated; continue; } waiter.state.store(0, .release); return true; // Got all permits without locking }
// Slow path: lock BEFORE CAS self.mutex.lock();
// Under lock: drain available permits curr = self.permits.load(.acquire); while (acquired < needed and curr > 0) { // CAS under lock... }
if (acquired >= needed) { self.mutex.unlock(); return true; }
// Queue waiter UNDER LOCK -- no permits can arrive // between our CAS and this insertion, because // release() also takes this mutex. self.waiters.pushFront(waiter); self.mutex.unlock(); return false;}And release() always takes the mutex:
pub fn release(self: *Self, num: usize) void { self.mutex.lock();
if (self.waiters.back() == null) { // No waiters -- surplus to atomic _ = self.permits.fetchAdd(num, .release); self.mutex.unlock(); return; }
// Serve waiters directly self.releaseToWaiters(num);}The key invariant: permits never float in the atomic counter when waiters
are queued. release() serves waiters directly; only surplus goes to the
atomic. This prevents indefinite starvation.
Comparison with Tokio
Section titled “Comparison with Tokio”Tokio uses the same intrusive waiter pattern. The Rust implementation is structurally equivalent:
| Aspect | Tokio (Rust) | Volt (Zig) |
|---|---|---|
| Waiter embedding | Pin<&mut Waiter> in future | Waiter field in future struct |
| List type | Custom intrusive list | LinkedList(T, field) |
| Drop safety | Rust’s Drop trait | Manual deinit/cancel with state enum |
| Waker storage | Option<Waker> | ?FutureWaker |
| Memory ordering | Acquire/Release pairs | Acquire/Release pairs |
| Permit tracking | Atomic state word | std.atomic.Value(usize) |
The primary difference is safety enforcement. Rust’s type system guarantees
(via Pin) that the waiter is not moved while in the queue. Zig requires
the programmer to ensure this through the state enum and proper
deinit/cancel calls.
Performance Impact
Section titled “Performance Impact”The zero-allocation design has measurable performance benefits:
| Operation | With Allocation | Zero-Allocation | Improvement |
|---|---|---|---|
| Uncontended lock | ~15 ns | ~10 ns | 1.5x |
| Contended lock (10 waiters) | ~150 ns | ~100 ns | 1.5x |
| 1M lock/unlock cycles | 1M malloc + 1M free | 0 allocations | — |
The improvement comes from:
- No allocator calls — the waiter is already allocated as part of the future/task
- Better cache locality — the waiter is adjacent to the future’s other fields
- No fragmentation — no small, short-lived allocations polluting the free list
- No failure mode — contention cannot cause
OutOfMemory
For workloads with heavy sync primitive contention (e.g., connection pools, rate limiters, shared caches), this is the difference between scalable and not scalable.