Hierarchical Timer Wheel
Timer management is critical for an async runtime. Every sleep(), timeout(), and interval() call registers a timer that must fire at the right time. Volt uses a hierarchical timing wheel — a data structure that provides O(1) timer insertion and O(1) expiration checking through bit-manipulation tricks.
The Problem with Naive Approaches
Section titled “The Problem with Naive Approaches”A priority queue (heap) gives O(log N) insertion and O(log N) extraction. For a runtime managing millions of timers, the logarithmic factor adds up. Worse, heap operations have poor cache behavior due to pointer chasing.
A sorted list gives O(N) insertion. A skip list gives O(log N) probabilistically. Both are unsuitable for the hot path of an async runtime.
The timing wheel trades space for time: by quantizing time into discrete “slots” and using multiple levels for different time scales, it achieves constant-time operations.
Wheel Architecture
Section titled “Wheel Architecture”Volt’s timer wheel has 5 levels, each with 64 slots. Each level covers a larger time range with coarser granularity:
Level 0: 64 slots x 1ms = 64ms rangeLevel 1: 64 slots x 64ms = 4.1s rangeLevel 2: 64 slots x 4.1s = 4.3min rangeLevel 3: 64 slots x 4.3m = 4.6hr rangeLevel 4: 64 slots x 4.6h = 12.4day rangeOverflow: linked list for timers beyond ~12 daysThe slot durations are precomputed at compile time:
const SLOT_DURATIONS: [NUM_LEVELS]u64 = blk: { var durations: [NUM_LEVELS]u64 = undefined; var duration: u64 = LEVEL_0_SLOT_NS; // 1ms for (0..NUM_LEVELS) |i| { durations[i] = duration; duration *= SLOTS_PER_LEVEL; // x64 per level } break :blk durations;};Each level has:
- 64 slots: Each slot is a doubly-linked list of timer entries.
- A current index: Points to the slot corresponding to “now” at this level’s granularity.
- A 64-bit occupancy bitfield: Bit
iis set if sloticontains at least one timer.
Level 0 (1ms resolution): occupied = 0b...0010_0000_0101 current | v+---+---+---+---+---+---+---+ ... +---+| * | | * | | | * | | | |+---+---+---+---+---+---+---+ ... +---+ 0 1 2 3 4 5 6 63
* = slot contains one or more timer entriesO(1) Timer Insertion
Section titled “O(1) Timer Insertion”When inserting a timer with a given deadline, the wheel must determine which level and slot to use. The naive approach scans levels linearly (O(NUM_LEVELS)). Volt uses bit manipulation for O(1) level selection.
Level Selection via @clz
Section titled “Level Selection via @clz”The key insight: if the current time and the deadline differ in their most significant bits, the timer belongs at a higher level. The XOR of the two times gives the differing bits, and @clz (count leading zeros) finds the most significant difference.
inline fn levelFor(now_ns: u64, deadline_ns: u64) usize { const elapsed_ticks = now_ns / LEVEL_0_SLOT_NS; const when_ticks = deadline_ns / LEVEL_0_SLOT_NS;
// XOR reveals which bits differ between now and deadline // OR with SLOT_MASK ensures we consider at least level 0 const masked = (elapsed_ticks ^ when_ticks) | SLOT_MASK;
if (masked >= MAX_WHEEL_DURATION / LEVEL_0_SLOT_NS) { return NUM_LEVELS; // Overflow }
const leading_zeros = @clz(masked); const significant_bit = 63 - leading_zeros;
// Each level handles 6 bits (log2(64) = 6) return significant_bit / BITS_PER_LEVEL;}Example: If now = 0ms and deadline = 150ms:
elapsed_ticks = 0,when_ticks = 150XOR = 150 = 0b1001_0110,|SLOT_MASK = 0b1011_1111@clz(0b1011_1111) = 56(for u64),significant_bit = 7level = 7 / 6 = 1— Level 1 (64ms resolution), correct since 150ms is within Level 1’s 4.1s range.
Slot Placement
Section titled “Slot Placement”Once the level is determined, the slot within that level is computed as a simple division and modulo:
pub fn slotFor(deadline_ns: u64, level_idx: usize, now_ns: u64) usize { const slot_duration = slotDuration(level_idx); const delta = deadline_ns -| now_ns; const slot_offset = delta / slot_duration; return @intCast(slot_offset % SLOTS_PER_LEVEL);}Insertion then adds the entry to the slot’s linked list and sets the occupancy bit:
pub fn addEntry(self: *Level, slot_idx: usize, entry: *TimerEntry) void { self.slots[slot_idx].push(entry); self.occupied |= (@as(u64, 1) << @intCast(slot_idx));}O(1) Next-Expiry Lookup
Section titled “O(1) Next-Expiry Lookup”Finding the next timer to fire uses @ctz (count trailing zeros) on the occupancy bitfield, starting from the current slot position:
pub fn nextOccupiedSlot(self: *const Level, from_slot: usize) ?usize { if (self.occupied == 0) return null;
// Mask for slots >= from_slot const from_mask: u64 = ~((@as(u64, 1) << @intCast(from_slot)) - 1); const masked = self.occupied & from_mask;
if (masked != 0) { return @ctz(masked); // First occupied slot at or after current }
// Wrap around: check slots before from_slot const wrap_mask: u64 = (@as(u64, 1) << @intCast(from_slot)) - 1; const wrapped = self.occupied & wrap_mask; if (wrapped != 0) { return @ctz(wrapped); }
return null;}This is a single bitwise AND plus a @ctz instruction — O(1) regardless of how many timers are registered.
Cascading
Section titled “Cascading”Higher-level timers are placed with coarse granularity. As time advances, they must be moved to lower levels for precise firing. This is called cascading.
When level 0’s current slot wraps around to 0 (every 64ms), timers from level 1’s current slot are cascaded down. If level 1 also wraps, level 2 cascades, and so on.
Before cascade (level 0 wraps):
Level 1, current slot 3: +---+---+---+---+---+ | | | |T,U| | ... +---+---+---+---+---+
After cascade: Timers T and U are re-inserted at level 0 with precise slot placement. They may end up in different level 0 slots depending on their exact deadlines.fn cascade(self: *Self, level_idx: usize) void { if (level_idx >= NUM_LEVELS) return;
const level = &self.levels[level_idx]; const slot_idx = level.current; level.current = (slot_idx + 1) % SLOTS_PER_LEVEL;
// Recursive cascade if this level also wraps if (level.current == 0 and level_idx + 1 < NUM_LEVELS) { self.cascade(level_idx + 1); }
// Move all entries from this slot to lower levels var entry = level.takeAllFromSlot(slot_idx); while (entry) |e| { const next = e.next; e.next = null; e.prev = null; if (e.cancelled) { self.count -|= 1; } else { self.insertInternal(e); // Re-insert at lower level } entry = next; }}Cascading is triggered during poll():
pub fn poll(self: *Self) ?*TimerEntry { self.updateTime();
const level0_slot = (self.now_ns / LEVEL_0_SLOT_NS) % SLOTS_PER_LEVEL; var current = self.levels[0].current;
while (current != level0_slot) { // Collect all expired entries from this slot const entries = self.levels[0].takeAllFromSlot(current); self.appendExpired(&expired_head, &expired_tail, entries);
current = (current + 1) % SLOTS_PER_LEVEL;
// Cascade from higher levels when slot 0 wraps if (current == 0) { self.cascade(1); } } // ...}Timer Entries (Intrusive, Zero-Allocation)
Section titled “Timer Entries (Intrusive, Zero-Allocation)”Timer entries are intrusive — they contain their own linked-list pointers. This means inserting a timer requires no heap allocation if the caller provides the entry on the stack or embeds it in their struct:
pub const TimerEntry = struct { deadline_ns: u64, task: ?*Header, // Task-based waking waker: ?WakerFn, // Generic waker callback waker_ctx: ?*anyopaque, user_data: u64, next: ?*TimerEntry, // Intrusive doubly-linked list prev: ?*TimerEntry, cancelled: bool, heap_allocated: bool, // If true, wheel frees on expiry};The driver provides both allocation modes:
// Heap-allocated (driver manages lifetime)var handle = try driver.registerSleep(Duration.fromMillis(100), ctx, wake_fn);
// Stack-allocated (caller manages lifetime)var entry: TimerEntry = undefined;driver.registerSleepEntry(&entry, Duration.fromMillis(100), ctx, wake_fn);Cancellation sets a flag rather than removing the entry from the list. Cancelled entries are skipped during poll and cleaned up during cascading:
pub fn remove(_: *Self, entry: *TimerEntry) void { entry.cancel(); // Lazy removal}Integration with the Scheduler
Section titled “Integration with the Scheduler”Worker 0 is the primary timer driver. It polls the timer wheel at the start of each tick. Other workers do not poll timers, avoiding contention on the timer mutex:
// In Worker.run():if (self.index == 0) { _ = self.scheduler.pollTimers();}When Worker 0 parks, it uses the next timer expiration as its park timeout, ensuring timers fire with bounded latency even when no I/O events are pending:
const park_timeout: u64 = if (self.index == 0) self.scheduler.nextTimerExpiration() orelse PARK_TIMEOUT_NSelse PARK_TIMEOUT_NS;Comparison with Other Timer Approaches
Section titled “Comparison with Other Timer Approaches”| Approach | Insert | Fire | Cancel | Memory |
|---|---|---|---|---|
| Hierarchical wheel (Volt) | O(1) | O(1) amortized | O(1) lazy | Fixed 5 * 64 slots |
| Binary heap | O(log N) | O(log N) | O(N) or O(log N) with index | Dynamic |
| Red-black tree | O(log N) | O(log N) | O(log N) | Per-node allocation |
| Skip list | O(log N) expected | O(1) | O(log N) expected | Per-node allocation |
| Hashed wheel (Kafka) | O(1) | O(1) | O(1) | Fixed but single-level |
The hierarchical wheel’s main advantage is constant-time operations with bounded memory. The cascading cost is amortized: a level-1 cascade happens every 64ms, a level-2 cascade every 4 seconds, and so on. In practice, most timers are short-lived (milliseconds to seconds) and never reach higher levels.
The main limitation is resolution: level 0 has 1ms granularity. Timers cannot fire with sub-millisecond precision. For the use cases of an async I/O runtime (network timeouts, sleep, intervals), 1ms resolution is more than sufficient.
Source Files
Section titled “Source Files”- Timer wheel:
src/internal/scheduler/TimerWheel.zig - Timer driver:
src/time/driver.zig
References
Section titled “References”- George Varghese and Tony Lauck, “Hashed and Hierarchical Timing Wheels: Data Structures for the Efficient Implementation of a Timer Facility,” IEEE/ACM Transactions on Networking, 5(6):824–834, 1997. doi:10.1109/90.650142
- Tokio timer implementation:
tokio/src/runtime/time - Kafka’s hierarchical timing wheel:
kafka/server/src/main/java/kafka/utils/timer