v1.0.0-zig0.15.2

Introduction

Blitz is a high-performance, lock-free work-stealing parallel runtime for Zig, inspired by Rust’s Rayon library. It provides fork-join parallelism, parallel iterators, and efficient parallel sorting.

  • Zero-overhead abstractions - Pay only for what you use
  • Automatic load balancing - Work-stealing ensures all cores stay busy
  • Composable iterators - Chain operations like Rayon’s parallel iterators
  • No external dependencies - Pure Zig, works anywhere Zig works
const blitz = @import("blitz");
const std = @import("std");

pub fn main() !void {
    var data = [_]i64{ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

    // Parallel sum - automatically parallelized
    const sum = blitz.iter(i64, &data).sum();

    // Parallel search with early exit
    const found = blitz.iter(i64, &data).findAny(struct {
        fn pred(x: i64) bool { return x > 5; }
    }.pred);

    // Parallel transform in-place
    blitz.iterMut(i64, &data).mapInPlace(struct {
        fn double(x: i64) i64 { return x * 2; }
    }.double);

    // Fork-join for divide-and-conquer
    const result = blitz.join(.{
        .left = .{ computeLeft, left_data },
        .right = .{ computeRight, right_data },
    });

    std.debug.print("sum={} found={?} result={any}\n", .{ sum, found, result });
}

Blitz achieves significant speedups over Rust’s Rayon on equivalent benchmarks:

Operation                      Blitz     Rayon     Speedup
join() fork-join (depth 20)    0.54 ms   0.71 ms   1.31x
iter().sum() (100M i64)        3.1 ms    8.2 ms    2.6x
sortAsc() (10M i64)            89 ms     119 ms    1.34x

Benchmarks on Apple M2 Pro, 10 cores

Parallel Iterators

Rayon-style composable iterators: sum(), min(), max(), findAny(), any(), all(), reduce(), and more. Automatic parallelization with early-exit support.

Fork-Join

Efficient divide-and-conquer with join(). Supports heterogeneous return types and up to 8 parallel tasks. Perfect for recursive algorithms.

Work Stealing

Lock-free Chase-Lev deque with Rayon’s sleep/wake protocol. Optimal load balancing with minimal contention and smart thread sleeping.

Parallel Sorting

Pattern-defeating quicksort (PDQSort) with automatic parallelization. 10x faster than std.mem.sort on large arrays.

The iterator API provides the most ergonomic way to parallelize data processing:

var data = [_]i64{ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

// Aggregations
const sum = blitz.iter(i64, &data).sum(); // 55
const min = blitz.iter(i64, &data).min(); // ?i64 = 1
const max = blitz.iter(i64, &data).max(); // ?i64 = 10

// Search with early exit
const found = blitz.iter(i64, &data).findAny(isNegative);   // Fast, returns any match
const first = blitz.iter(i64, &data).findFirst(isNegative); // Deterministic, lowest index

// Predicates (short-circuit)
const hasNeg = blitz.iter(i64, &data).any(isNegative); // Stops on first match
const allPos = blitz.iter(i64, &data).all(isPositive); // Stops on first failure

// Mutation (requires a mutable slice)
blitz.iterMut(i64, &data).mapInPlace(double); // Transform in-place
blitz.iterMut(i64, &data).fill(0);            // Parallel memset

// Custom reduction
const product = blitz.iter(i64, &data).reduce(1, multiply);

For divide-and-conquer algorithms and independent parallel tasks:

// Two parallel tasks with different return types
const result = blitz.join(.{
    .count = .{ countItems, items }, // Returns usize
    .total = .{ sumValues, values }, // Returns i64
});
// Access: result.count, result.total

// Recursive parallel fibonacci
fn parallelFib(n: u64) u64 {
    if (n < 20) return fibSequential(n); // Sequential threshold: forking is not worth it below this
    const r = blitz.join(.{
        .a = .{ parallelFib, n - 1 },
        .b = .{ parallelFib, n - 2 },
    });
    return r.a + r.b;
}

High-performance parallel PDQSort:

var numbers = [_]i64{ 5, 2, 8, 1, 9, 3, 7, 4, 6 };

blitz.sortAsc(i64, &numbers);          // Ascending
blitz.sortDesc(i64, &numbers);         // Descending
blitz.sort(i64, &numbers, lessThanFn); // Custom comparator

// Sort structs by key
blitz.sortByKey(Person, u32, &people, struct {
    fn key(p: Person) u32 { return p.age; }
}.key);

For fine-grained control over parallelism:

// Parallel for with context
blitz.parallelFor(n, Context, ctx, bodyFn);
blitz.parallelForWithGrain(n, Context, ctx, bodyFn, grain_size);
// Parallel map-reduce
const result = blitz.parallelReduce(T, n, identity, Context, ctx, mapFn, combineFn);
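As a sketch of how the pieces fit together, here is what a parallel sum of squares might look like with `parallelReduce`, assuming the signature shown above. The `Ctx` struct and the exact parameter shapes of `mapFn`/`combineFn` are illustrative assumptions, not documented API:

```zig
const std = @import("std");
const blitz = @import("blitz");

// Hypothetical context type: carries the slice being reduced.
const Ctx = struct { data: []const i64 };

// Map: index -> squared element (assumed callback shape).
fn mapFn(ctx: Ctx, i: usize) i64 {
    const x = ctx.data[i];
    return x * x;
}

// Combine: associative merge of two partial results.
fn combineFn(a: i64, b: i64) i64 {
    return a + b;
}

pub fn main() void {
    const data = [_]i64{ 1, 2, 3, 4 };
    const ctx = Ctx{ .data = &data };
    // Identity element 0 for addition; n = data.len index range.
    const total = blitz.parallelReduce(i64, data.len, 0, Ctx, ctx, mapFn, combineFn);
    std.debug.print("sum of squares = {}\n", .{total});
}
```

The combine function must be associative, since the runtime is free to split the index range and merge partial results in any order.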
┌─────────────────────────────────────────────────────────────┐
│                          User Code                          │
├─────────────────────────────────────────────────────────────┤
│  iter().sum()  │  join(.{...})  │  sortAsc()  │ parallelFor │
├─────────────────────────────────────────────────────────────┤
│                    Work-Stealing Runtime                    │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐         │
│  │Worker 0 │  │Worker 1 │  │Worker 2 │  │Worker N │         │
│  │┌───────┐│  │┌───────┐│  │┌───────┐│  │┌───────┐│         │
│  ││ Deque ││  ││ Deque ││  ││ Deque ││  ││ Deque ││         │
│  │└───────┘│  │└───────┘│  │└───────┘│  │└───────┘│         │
│  └────┬────┘  └────┬────┘  └────┬────┘  └────┬────┘         │
│       │            │            │            │              │
│       └────────────┴─────┬──────┴────────────┘              │
│                          │                                  │
│              ┌───────────┴───────────┐                      │
│              │  Sleep/Wake Manager   │                      │
│              │    (JEC Protocol)     │                      │
│              └───────────────────────┘                      │
└─────────────────────────────────────────────────────────────┘
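Each worker's deque above is a Chase-Lev work-stealing deque: the owning worker pushes and pops at the bottom, while idle workers steal from the top. The following is a minimal, hypothetical sketch of that structure (fixed power-of-two capacity, no resizing or overflow handling), not Blitz's actual implementation:

```zig
const std = @import("std");
const Atomic = std.atomic.Value;

/// Sketch of a Chase-Lev deque. Owner: push/pop at the bottom (LIFO,
/// cache-friendly). Thieves: steal from the top (FIFO, oldest work first).
pub fn Deque(comptime T: type, comptime capacity: usize) type {
    return struct {
        const Self = @This();
        const mask: isize = @intCast(capacity - 1); // capacity must be a power of two

        buffer: [capacity]T = undefined,
        top: Atomic(isize) = Atomic(isize).init(0),
        bottom: Atomic(isize) = Atomic(isize).init(0),

        /// Owner only: add a task. No CAS needed in the common case.
        pub fn push(self: *Self, item: T) void {
            const b = self.bottom.load(.monotonic);
            self.buffer[@intCast(b & mask)] = item;
            self.bottom.store(b + 1, .release); // publish the slot to thieves
        }

        /// Owner only: take the most recently pushed task.
        pub fn pop(self: *Self) ?T {
            const b = self.bottom.load(.monotonic) - 1;
            self.bottom.store(b, .seq_cst);
            const t = self.top.load(.seq_cst);
            if (t > b) { // deque was empty; restore bottom
                self.bottom.store(b + 1, .monotonic);
                return null;
            }
            const item = self.buffer[@intCast(b & mask)];
            if (t == b) {
                // Last element: race against thieves with a CAS on top.
                const won = self.top.cmpxchgStrong(t, t + 1, .seq_cst, .monotonic) == null;
                self.bottom.store(b + 1, .monotonic);
                return if (won) item else null;
            }
            return item;
        }

        /// Any thread: steal the oldest task from the top.
        pub fn steal(self: *Self) ?T {
            const t = self.top.load(.seq_cst);
            const b = self.bottom.load(.seq_cst);
            if (t >= b) return null; // empty
            const item = self.buffer[@intCast(t & mask)];
            // CAS claims the slot; failure means another thief (or the owner) won.
            if (self.top.cmpxchgStrong(t, t + 1, .seq_cst, .monotonic) == null)
                return item;
            return null;
        }
    };
}
```

Because the owner and thieves operate on opposite ends, they only contend when the deque holds a single task, which is what keeps steady-state overhead low.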
Use Case                                   Recommendation
Data processing (sum, filter, transform)   blitz.iter() / blitz.iterMut()
Recursive divide-and-conquer               blitz.join()
Sorting large arrays                       blitz.sortAsc() / blitz.sort()
Fine-grained parallel loops                blitz.parallelFor()
Map-reduce patterns                        blitz.parallelReduce()
When not to use Blitz:

  • Small data (<1000 elements) - Overhead exceeds the benefit
  • I/O-bound workloads - Blitz is optimized for CPU-bound work
  • Shared mutable state - Use atomic operations or avoid parallelism
Requirements:

  • Zig 0.15.0 or later
  • POSIX (Linux, macOS) or Windows
  • No external dependencies