Basic Concepts
Understanding Blitz’s execution model and core concepts.
Work Stealing
Blitz uses a work-stealing scheduler, the same approach used by Rayon, Intel TBB, and other high-performance parallel runtimes.
```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Worker 0   │     │  Worker 1   │     │  Worker 2   │
│ ┌─────────┐ │     │ ┌─────────┐ │     │ ┌─────────┐ │
│ │  Deque  │ │     │ │         │ │     │ │         │ │
│ │ ┌─────┐ │ │     │ │         │ │     │ │         │ │
│ │ │Job A│◄┼─┼─────┼─┼─STEAL───┼─┼─────┼─┼─STEAL   │ │
│ │ ├─────┤ │ │     │ │         │ │     │ │         │ │
│ │ │Job B│ │ │     │ │         │ │     │ │         │ │
│ │ └──▲──┘ │ │     │ └─────────┘ │     │ └─────────┘ │
│ └────┼────┘ │     │             │     │             │
│   push/pop  │     │             │     │             │
└─────────────┘     └─────────────┘     └─────────────┘
```
How It Works
- Each worker has a deque (double-ended queue)
- Workers push/pop from the bottom (LIFO - keeps cache hot)
- Idle workers steal from the top (FIFO - takes oldest work)
- No central queue - work is distributed automatically
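A hedged sketch of how this plays out in practice: each recursive fork pushes a stealable subtask onto the current worker's deque while the local half keeps running. Only `blitz.join` comes from this page; `sumRange` and the 1024 cutoff are illustrative assumptions, not Blitz API.

```zig
// Sketch: recursive splitting keeps every worker's deque supplied with
// stealable subtasks. `sumRange` is a hypothetical helper.
fn sumRange(data: []const f64) f64 {
    // Small ranges run sequentially; fork/join overhead would dominate.
    if (data.len <= 1024) {
        var total: f64 = 0;
        for (data) |x| total += x;
        return total;
    }
    const mid = data.len / 2;
    // Fork: the right half becomes a stealable task on this worker's
    // deque; the left half runs locally (cache-hot, LIFO pop).
    const result = blitz.join(.{
        .left = .{ sumRange, data[0..mid] },
        .right = .{ sumRange, data[mid..] },
    });
    return result.left + result.right;
}
```

Idle workers steal the oldest (largest) pending halves from the top of the deque, which is why recursive splitting balances load without a central queue.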
Why Work Stealing?
| Approach | Pros | Cons |
|---|---|---|
| Central queue | Simple | Contention bottleneck |
| Static partitioning | No overhead | Load imbalance |
| Work stealing | Dynamic balance, low contention | Slightly complex |
Fork-Join Model
Blitz follows the fork-join execution model:
```
 fork(B)              join()
    │                    │
    ▼                    ▼
┌──────────┐       ┌──────────┐
│  Task B  │       │ Wait for │
│ (stolen) │       │ B result │
└──────────┘       └──────────┘
      │                  │
      └──────────────────┘
               │
         Both complete
```
- Fork: Create a subtask that may run in parallel
- Execute: Do local work while subtask runs
- Join: Wait for subtask and combine results
```zig
// Fork-join example
const result = blitz.join(.{
    .left = .{ computeLeft, leftData },
    .right = .{ computeRight, rightData },
});
const total = result.left + result.right;
```
Grain Size
Grain size controls the minimum chunk size before parallelization:
```zig
// Default: automatic grain size
blitz.parallelFor(n, ctx_type, ctx, bodyFn);

// Custom grain size (1000 elements per chunk)
blitz.parallelForWithGrain(n, ctx_type, ctx, bodyFn, 1000);
```
Choosing Grain Size
| Grain Size | Effect |
|---|---|
| Too small | Overhead dominates, slower than sequential |
| Too large | Poor load balancing, some cores idle |
| Just right | Amortizes overhead, good balance |
Rule of thumb: Start with defaults. Only tune if profiling shows issues.
Sequential Threshold
Blitz automatically avoids parallelization for small data:
```zig
// Simple size check against default grain size (65536)
if (data.len >= blitz.DEFAULT_GRAIN_SIZE) {
    // Parallel path
} else {
    // Sequential path (less overhead)
}
```
The threshold depends on:
- Operation type: Memory-bound ops need more data
- Worker count: More workers = higher threshold
- Data size: Must amortize fork/join overhead
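These factors can be folded into a wrapper that checks the threshold before forking. A minimal sketch, assuming only the `blitz.parallelFor` and `blitz.DEFAULT_GRAIN_SIZE` names shown above; `Ctx` and `scaleAll` are hypothetical:

```zig
// Sketch: fall back to a plain loop below the threshold.
const Ctx = struct { data: []f64, scale: f64 };

fn scaleAll(data: []f64, scale: f64) void {
    if (data.len >= blitz.DEFAULT_GRAIN_SIZE) {
        // Enough work to amortize fork/join overhead.
        blitz.parallelFor(data.len, Ctx, .{ .data = data, .scale = scale }, struct {
            fn body(c: Ctx, start: usize, end: usize) void {
                for (c.data[start..end]) |*x| x.* *= c.scale;
            }
        }.body);
    } else {
        // Too small: a sequential loop avoids scheduling overhead entirely.
        for (data) |*x| x.* *= scale;
    }
}
```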
Context Pattern
Blitz uses a context struct to pass data to parallel bodies:
```zig
// Define what data the parallel body needs
const Context = struct {
    input: []const f64,
    output: []f64,
    scale: f64,
};

// Create context instance
const ctx = Context{
    .input = input_data,
    .output = output_data,
    .scale = 2.5,
};

// Pass to parallel operation
blitz.parallelFor(input_data.len, Context, ctx, struct {
    fn body(c: Context, start: usize, end: usize) void {
        for (c.input[start..end], c.output[start..end]) |in, *out| {
            out.* = in * c.scale;
        }
    }
}.body);
```
Why Context?
- No closures in Zig - Can’t capture variables
- Explicit data flow - Clear what’s shared
- Comptime optimization - Struct access is fast
Thread Pool Lifecycle
```zig
// 1. Initialize (spawns worker threads)
try blitz.init();

// 2. Use parallel operations (any number of times)
blitz.parallelFor(...);
blitz.parallelReduce(...);
const sum = blitz.iter(data).sum();

// 3. Cleanup (joins worker threads)
blitz.deinit();
```
Important:
- `init()` returns `error.AlreadyInitialized` if called twice
- Always pair with `deinit()` using `defer`
- Worker threads are reused across all operations
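Putting the lifecycle together, a minimal sketch of a complete program; the `@import("blitz")` module name and the doubling body are assumptions for illustration:

```zig
const blitz = @import("blitz"); // assumed module name

pub fn main() !void {
    try blitz.init();
    defer blitz.deinit(); // guaranteed cleanup, even on early return

    var data = [_]f64{ 1, 2, 3, 4 };
    const Ctx = struct { data: []f64 };
    blitz.parallelFor(data.len, Ctx, .{ .data = &data }, struct {
        fn body(c: Ctx, start: usize, end: usize) void {
            for (c.data[start..end]) |*x| x.* *= 2;
        }
    }.body);
}
```

`defer` runs `deinit()` on every exit path, so the worker threads are always joined exactly once.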