v1.0.0-zig0.15.2

Comparing with Rayon

Build the Rayon benchmark with Cargo:

cd benchmarks/rayon
cargo build --release

The build system handles this automatically:

# Build and run the Blitz benchmark
zig build bench
# Run the comparative benchmark (Blitz vs Rayon side-by-side)
zig build compare

The bench step compiles benchmarks/rayon_compare.zig with ReleaseFast optimization and links the blitz module. The compare step builds benchmarks/compare.zig which orchestrates running both benchmarks.

Run the comparison in one step:

# Runs both Blitz and Rayon benchmarks and displays results
zig build compare

Or run each side manually:

# Rayon
cd benchmarks/rayon
./target/release/rayon_bench

# Blitz
zig build bench

The fork-join benchmark measures the cost of creating and joining parallel tasks:

// Rayon
fn fork_join_bench(depth: u32) -> u64 {
    if depth == 0 {
        return 1;
    }
    let (a, b) = rayon::join(
        || fork_join_bench(depth - 1),
        || fork_join_bench(depth - 1),
    );
    a + b
}

// Blitz
fn forkJoinBench(n: u64) u64 {
    if (n < 20) return fibSequential(n);
    const r = blitz.join(.{
        .a = .{ forkJoinBench, n - 1 },
        .b = .{ forkJoinBench, n - 2 },
    });
    return r.a + r.b;
}

Expected results:

  • Both should achieve ~1 ns/fork at scale
  • Blitz may be slightly faster due to lock-free wake
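Note that the two snippets above are not identical workloads: the Rayon version recurses on a fixed depth, while the Blitz version computes Fibonacci with a sequential cutoff. For a like-for-like run, the Rayon side can mirror the cutoff. A sketch below, using std::thread::scope as a dependency-free stand-in for rayon::join (in the real benchmark, swap the scope for rayon::join; the function names are illustrative):

```rust
// Sequential baseline, mirroring the Zig fibSequential.
fn fib_sequential(n: u64) -> u64 {
    if n < 2 { n } else { fib_sequential(n - 1) + fib_sequential(n - 2) }
}

// Mirror of the Blitz benchmark: fork only above the cutoff of 20.
// std::thread::scope stands in for rayon::join here; a real OS thread
// per fork is far more expensive, but the computation's shape is the same.
fn fork_join_bench(n: u64) -> u64 {
    if n < 20 {
        return fib_sequential(n);
    }
    std::thread::scope(|s| {
        let a = s.spawn(|| fork_join_bench(n - 1));
        let b = fork_join_bench(n - 2);
        a.join().unwrap() + b
    })
}
```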
The parallel sum benchmark measures throughput on a memory-bound reduction:

// Rayon
let sum: i64 = data.par_iter().sum();

// Blitz
const sum = blitz.iter(i64, data).sum();

Expected results:

  • Near-identical for large data (memory-bound)
  • Blitz may be faster for medium data (lower overhead)
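Under the hood, both libraries split the slice into pieces and sum them on worker threads. A dependency-free sketch of that pattern (a fixed chunk split rather than either library's work-stealing scheduler; `parallel_sum` is an illustrative name, not an API from either library):

```rust
// Chunked parallel sum: split the slice, sum each chunk on its own
// thread, then add the per-chunk partial sums.
fn parallel_sum(data: &[i64], workers: usize) -> i64 {
    let chunk = ((data.len() + workers.max(1) - 1) / workers.max(1)).max(1);
    std::thread::scope(|s| {
        data.chunks(chunk)
            .map(|c| s.spawn(move || c.iter().sum::<i64>()))
            .collect::<Vec<_>>() // spawn all workers before joining any
            .into_iter()
            .map(|h| h.join().unwrap())
            .sum()
    })
}
```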
The parallel sort benchmark sorts the array in place:

// Rayon
data.par_sort();

// Blitz
blitz.sortAsc(i64, data);

Expected results:

  • Both use parallel quicksort variants
  • Performance depends on data patterns
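The "parallel quicksort variant" both sides use boils down to: partition, then sort the two halves concurrently. A sketch of that shape (std::thread::scope standing in for the libraries' task pools; `par_quicksort` and the cutoff value are illustrative):

```rust
// Parallel quicksort sketch: partition sequentially, recurse on the
// two halves in parallel, and fall back to a sequential sort below a
// cutoff where forking costs more than it saves.
fn par_quicksort(data: &mut [i64]) {
    const CUTOFF: usize = 1024;
    if data.len() <= CUTOFF {
        data.sort_unstable();
        return;
    }
    // Lomuto partition around the last element.
    let last = data.len() - 1;
    let pivot = data[last];
    let mut i = 0;
    for j in 0..last {
        if data[j] <= pivot {
            data.swap(i, j);
            i += 1;
        }
    }
    data.swap(i, last);
    let (lo, hi) = data.split_at_mut(i);
    std::thread::scope(|s| {
        s.spawn(move || par_quicksort(lo)); // left half on another thread
        par_quicksort(&mut hi[1..]);        // right half here, skipping the pivot
    });
}
```

The cutoff is the same trade-off as the fork-join benchmark's `n < 20` check: below it, sequential work beats the cost of a fork.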
The iterator-chain benchmark composes a filter and a map:

// Rayon
let result: Vec<_> = data.par_iter()
    .filter(|x| **x > 0)
    .map(|x| x * 2)
    .collect();

// Blitz - composable iterators
const result = blitz.iter(i64, data)
    .filter(isPositive)
    .map(double);
+----------------------+----------------------+
| Operation            | Blitz vs Rayon       |
+----------------------+----------------------+
| Fork-join overhead   | Blitz 10-20% faster  |
| Parallel sum (large) | Within 5%            |
| Parallel sort        | Within 10%           |
| Iterator chains      | Rayon may be faster* |
+----------------------+----------------------+

*Rayon’s lazy iterators can fuse operations
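The fusion the footnote refers to: Rust iterator adapters are lazy, so a filter/map/sum chain compiles to a single loop with no intermediate collection. A minimal sequential illustration (`fused_chain` is an illustrative name):

```rust
// filter, map and sum fuse into one pass over `data`:
// no intermediate Vec is materialized between the stages.
fn fused_chain(data: &[i64]) -> i64 {
    data.iter()
        .filter(|&&x| x > 0) // keep positives
        .map(|&x| x * 2)     // double them
        .sum()               // reduce in the same pass
}
```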

Where Blitz may have an edge:

  1. Lock-free wake: lower latency for task spawning
  2. Simple operations: less abstraction overhead
  3. Comptime specialization: zero-cost generics

Where Rayon may have an edge:

  1. Iterator fusion: chains of operations can be optimized
  2. Adaptive stealing: more sophisticated heuristics
  3. Mature optimizations: years of tuning

For a fair comparison, run both on the same machine under identical conditions.

// Rayon
rayon::ThreadPoolBuilder::new()
    .num_threads(10)
    .build_global()
    .unwrap();

// Blitz: 9 background workers + the calling thread = 10 threads
try blitz.initWithConfig(.{ .background_worker_count = 9 });
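Rather than hard-coding the thread count, the Rust side can query the machine's logical core count with std::thread::available_parallelism (stable std) and feed the same number, minus one, to Blitz. A sketch (`matched_thread_count` is an illustrative name):

```rust
// Query the logical core count; fall back to 1 if it is unavailable
// (e.g. in some restricted container environments).
fn matched_thread_count() -> usize {
    std::thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1)
}
```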

Generate identical test data for both:

// Rayon
let data: Vec<i64> = (0..n).map(|i| i as i64).collect();

// Blitz (assuming an allocator is in scope)
const data = try allocator.alloc(i64, n);
for (data, 0..) |*v, i| v.* = @intCast(i);
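If randomized inputs are wanted, each language's default RNG produces a different stream, so the runs would not see the same data. A fixed-seed LCG is trivial to replicate bit-for-bit in both Rust and Zig; a sketch using Knuth's MMIX constants (`lcg_data` is an illustrative name):

```rust
// Deterministic test data both benchmarks can reproduce exactly:
// Knuth's MMIX LCG, keeping the better-distributed high bits.
fn lcg_data(n: usize, seed: u64) -> Vec<i64> {
    let mut state = seed;
    (0..n)
        .map(|_| {
            state = state
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            (state >> 33) as i64
        })
        .collect()
}
```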
Use comparable optimization levels:

# Rayon
cargo build --release  # opt-level 3, comparable to -O3

# Blitz
zig build bench        # uses ReleaseFast (similar to -O3)
Warm up both runtimes before timing, so the first measured run does not pay thread-pool startup and page-fault costs:

// Rayon warmup
let _ = data.par_iter().sum::<i64>();

// Blitz warmup
_ = blitz.iter(i64, data).sum();
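A minimal harness that folds the warmup in, using std::time::Instant on the Rust side (the Zig side can do the same with std.time.Timer; `time_runs` is an illustrative name):

```rust
use std::time::Instant;

// One warmup pass, then time each measured run in milliseconds.
fn time_runs<F: FnMut()>(mut bench: F, runs: usize) -> Vec<f64> {
    bench(); // warmup: spin up the pool, fault in the data
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            bench();
            start.elapsed().as_secs_f64() * 1e3
        })
        .collect()
}
```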
Hardware: Apple M1 Pro, 10 cores
Data: 10M i64 elements

+--------------------+-----------+-----------+-------------+
| Benchmark          | Blitz     | Rayon     | Comparison  |
+--------------------+-----------+-----------+-------------+
| Fork-join (2M)     | 0.54 ns   | 0.66 ns   | Blitz +22%  |
| Parallel sum       | 1.1 ms    | 1.2 ms    | Blitz +9%   |
| Parallel sort      | 134 ms    | 145 ms    | Blitz +8%   |
| Parallel fib(45)   | 411 ms    | 414 ms    | Equal       |
| Find first         | 3.3 ms    | 3.1 ms    | Rayon +6%   |
+--------------------+-----------+-----------+-------------+

When sharing benchmark results:

  1. Include hardware specs (CPU, cores, memory)
  2. Include software versions (Zig version, Rust version)
  3. Show multiple runs (min, median, max)
  4. Describe data patterns (random, sorted, etc.)
  5. Note thread count used for comparison
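Point 3 above (min, median, max over multiple runs) can be computed like so; a sketch assuming per-run timings in milliseconds (`summarize` is an illustrative name):

```rust
// Report min / median / max instead of a single number: the median
// resists scheduler noise, while the min shows the best case.
fn summarize(mut runs: Vec<f64>) -> (f64, f64, f64) {
    assert!(!runs.is_empty());
    runs.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let median = runs[runs.len() / 2];
    (runs[0], median, *runs.last().unwrap())
}
```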