Chase-Lev Deque
Overview
Section titled “Overview”The Chase-Lev deque is a concurrent data structure designed specifically for work stealing. It was introduced in “Dynamic Circular Work-Stealing Deque” by David Chase and Yossi Lev (2005).
Properties
Section titled “Properties”| Property | Value |
|---|---|
| Push (owner) | Wait-free O(1) |
| Pop (owner) | Wait-free O(1) |
| Steal (thieves) | Lock-free O(1) |
| Memory | O(n) where n is max concurrent items |
Design
Section titled “Design”Structure
Section titled “Structure” ┌─────────────────────────────────────┐ │ Circular Buffer │ │ ┌───┬───┬───┬───┬───┬───┬───┬───┐ │ │ │ 0 │ 1 │ 2 │ 3 │ 4 │ 5 │ 6 │ 7 │ │ │ └───┴───┴───┴───┴───┴───┴───┴───┘ │ │ ▲ ▲ │ │ top bottom │ │ (thieves) (owner) │ └─────────────────────────────────────┘
Elements stored: indices [top, bottom)Owner operates at bottomThieves operate at topKey Insight: Asymmetric Access
Section titled “Key Insight: Asymmetric Access”Owner (single thread):- Push: Write to bottom, increment bottom- Pop: Decrement bottom, read from bottom
Thieves (multiple threads):- Steal: Read from top, CAS increment top
The asymmetry allows owner operations to be wait-free!Implementation
Section titled “Implementation”Data Structure
Section titled “Data Structure”pub fn Deque(comptime T: type) type { return struct { const Self = @This();
/// Circular buffer (power of 2 size for fast modulo) buffer: []T, mask: usize, // buffer.len - 1
/// Atomic indices bottom: std.atomic.Value(isize), // Owner's end top: std.atomic.Value(isize), // Thieves' end
pub fn init(allocator: Allocator, capacity: usize) !Self { // Round up to power of 2 const size = std.math.ceilPowerOfTwo(usize, capacity); return Self{ .buffer = try allocator.alloc(T, size), .mask = size - 1, .bottom = std.atomic.Value(isize).init(0), .top = std.atomic.Value(isize).init(0), }; } };}Push (Owner Only)
Section titled “Push (Owner Only)”Wait-free: always succeeds in constant time.
pub fn push(self: *Self, item: T) void { const b = self.bottom.load(.monotonic);
// Write item to buffer self.buffer[@intCast(b) & self.mask] = item;
// Memory fence: ensure item is visible before bottom update @fence(.release);
// Make item available to stealers self.bottom.store(b + 1, .monotonic);}Why it’s wait-free:
- No loops
- No CAS operations
- Just a write and atomic store
Pop (Owner Only)
Section titled “Pop (Owner Only)”Wait-free in common case, one CAS in edge case.
pub fn pop(self: *Self) ?T { // Decrement bottom first (tentatively claim item) var b = self.bottom.load(.monotonic) - 1; self.bottom.store(b, .seq_cst);
// Load top (may race with stealers) var t = self.top.load(.seq_cst);
if (t <= b) { // Deque is non-empty const item = self.buffer[@intCast(b) & self.mask];
if (t == b) { // This is the LAST item - race with stealers if (self.top.cmpxchgStrong(t, t + 1, .seq_cst, .relaxed) != null) { // A stealer got it first self.bottom.store(b + 1, .monotonic); return null; } self.bottom.store(b + 1, .monotonic); } return item; } else { // Deque was empty self.bottom.store(b + 1, .monotonic); return null; }}The tricky case:
Before: top=5, bottom=6 (one item at index 5)
Owner pop: bottom=5Stealer: Reads top=5, bottom=5 (sees one item)
Race! Both want item at index 5.Solution: CAS on top to break tie.Steal (Thieves)
Section titled “Steal (Thieves)”Lock-free: may fail but makes progress system-wide.
Returns a struct with result enum (success/empty/retry) and optional item:
pub const StealResult = enum { empty, success, retry };
pub fn steal(self: *Self) struct { result: StealResult, item: ?T } { // Load top first with acquire const t = self.top.load(.acquire);
// Load bottom with acquire to see owner's operations const b = self.bottom.load(.acquire);
if (t >= b) { // Deque is empty return .{ .result = .empty, .item = null }; }
// Speculatively read the item const idx = @as(usize, @intCast(t)) & self.mask; const item = self.buffer[idx];
// Try to claim by incrementing top with CAS if (self.top.cmpxchgWeak(t, t + 1, .seq_cst, .monotonic)) |_| { // Lost the race - another thief got it or owner popped return .{ .result = .retry, .item = null }; }
// Success! return .{ .result = .success, .item = item };}Why CAS?
Multiple thieves trying to steal:
Thief A: Read top=3, item=XThief B: Read top=3, item=X
Without CAS, both would take item X!
With CAS:Thief A: CAS(3→4) succeeds, gets XThief B: CAS(3→4) fails (top is now 4), retriesMemory Ordering
Section titled “Memory Ordering”Why seq_cst on Critical Operations?
Section titled “Why seq_cst on Critical Operations?”// Popself.bottom.store(b, .seq_cst); // Must order before top readvar t = self.top.load(.seq_cst);
// Stealvar t = self.top.load(.acquire);@fence(.seq_cst); // Must order before bottom readconst b = self.bottom.load(.acquire);The seq_cst ensures:
- Pop’s bottom decrement is visible before checking top
- Steal’s top read is ordered with bottom read
Without this, we could have:
- Pop thinks deque has 2 items (stale top)
- Steal thinks deque has 2 items (stale bottom)
- Both try to take “different” items that are actually the same!
Comparison with Alternatives
Section titled “Comparison with Alternatives”| Deque Type | Push | Pop | Steal | Notes |
|---|---|---|---|---|
| Chase-Lev | Wait-free | Wait-free* | Lock-free | Best for work stealing |
| Mutex-based | O(1) + lock | O(1) + lock | O(1) + lock | Simple but slow |
| Michael-Scott | Lock-free | Lock-free | Lock-free | For queues, not deques |
*Wait-free except for single-item case requiring CAS
Optimization: No ABA Problem
Section titled “Optimization: No ABA Problem”Unlike many lock-free structures, Chase-Lev doesn’t suffer from ABA:
ABA Problem (in other structures):1. Thread A reads value A at location X2. Thread B changes X: A → B → A3. Thread A's CAS succeeds (sees A) but data is wrong!
Chase-Lev avoids this:- Indices only increase monotonically- No value reuse in the same position- Circular buffer means index wraps, not valuePerformance
Section titled “Performance”Benchmark (single producer, 4 stealers):
Push: ~3 ns (wait-free)Pop: ~3 ns (wait-free common case)Steal: ~10-50 ns (CAS contention varies)
Throughput: ~100M ops/sec per dequeReferences
Section titled “References”- Chase, D., & Lev, Y. (2005). “Dynamic Circular Work-Stealing Deque”
- Le, N. M., et al. (2013). “Correct and Efficient Work-Stealing for Weak Memory Models”
- Rayon source:
rayon-core/src/deque