v1.0.0-zig0.15.2

Comparing with Rayon

Build the Rayon benchmark with Cargo:

cd benchmarks/rayon
cargo build --release

The build system handles this automatically:

# Build and run the Blitz benchmark
zig build bench
# Run the comparative benchmark (Blitz vs Rayon side-by-side)
zig build compare

The bench step compiles benchmarks/rayon_compare.zig with ReleaseFast optimization and links the blitz module. The compare step builds benchmarks/compare.zig which orchestrates running both benchmarks.

Run the comparison in one step:

# Runs both Blitz and Rayon benchmarks and displays results
zig build compare

Or run each side manually:

# Rayon
cd benchmarks/rayon
./target/release/rayon_bench

# Blitz
zig build bench

The fork-join benchmark measures the cost of creating and joining parallel tasks:

// Rayon
fn fork_join_bench(depth: u32) -> u64 {
    if depth == 0 {
        return 1;
    }
    let (a, b) = rayon::join(
        || fork_join_bench(depth - 1),
        || fork_join_bench(depth - 1),
    );
    a + b
}

// Blitz
fn forkJoinBench(n: u64) u64 {
    if (n < 20) return fibSequential(n);
    const r = blitz.join(.{
        .a = .{ forkJoinBench, n - 1 },
        .b = .{ forkJoinBench, n - 2 },
    });
    return r.a + r.b;
}

Expected results:

  • Both should achieve ~1 ns/fork at scale
  • Blitz may be slightly faster due to lock-free wake
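Note that the two snippets above are not identical workloads: the Rayon version recurses on a fixed depth, while the Blitz version computes Fibonacci with a sequential cutoff. For a like-for-like run, the Rayon side can mirror the cutoff. A sketch below, using std::thread::scope as a dependency-free stand-in for rayon::join (in the real benchmark, swap the scope for rayon::join; the function names are illustrative):

```rust
// Sequential baseline, mirroring the Zig fibSequential.
fn fib_sequential(n: u64) -> u64 {
    if n < 2 { n } else { fib_sequential(n - 1) + fib_sequential(n - 2) }
}

// Mirror of the Blitz benchmark: fork only above the cutoff of 20.
// std::thread::scope stands in for rayon::join here; a real OS thread
// per fork is far more expensive, but the computation's shape is the same.
fn fork_join_bench(n: u64) -> u64 {
    if n < 20 {
        return fib_sequential(n);
    }
    std::thread::scope(|s| {
        let a = s.spawn(|| fork_join_bench(n - 1));
        let b = fork_join_bench(n - 2);
        a.join().unwrap() + b
    })
}
```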
The parallel sum benchmark measures throughput on a memory-bound reduction:

// Rayon
let sum: i64 = data.par_iter().sum();

// Blitz
const sum = blitz.iter(i64, data).sum();

Expected results:

  • Near-identical for large data (memory-bound)
  • Blitz may be faster for medium data (lower overhead)
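Under the hood, both libraries split the slice into pieces and sum them on worker threads. A dependency-free sketch of that pattern (a fixed chunk split rather than either library's work-stealing scheduler; `parallel_sum` is an illustrative name, not an API from either library):

```rust
// Chunked parallel sum: split the slice, sum each chunk on its own
// thread, then add the per-chunk partial sums.
fn parallel_sum(data: &[i64], workers: usize) -> i64 {
    let chunk = ((data.len() + workers.max(1) - 1) / workers.max(1)).max(1);
    std::thread::scope(|s| {
        data.chunks(chunk)
            .map(|c| s.spawn(move || c.iter().sum::<i64>()))
            .collect::<Vec<_>>() // spawn all workers before joining any
            .into_iter()
            .map(|h| h.join().unwrap())
            .sum()
    })
}
```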
The parallel sort benchmark sorts the array in place:

// Rayon
data.par_sort();

// Blitz
blitz.sortAsc(i64, data);

Expected results:

  • Both use parallel quicksort variants
  • Performance depends on data patterns
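The "parallel quicksort variant" both sides use boils down to: partition, then sort the two halves concurrently. A sketch of that shape (std::thread::scope standing in for the libraries' task pools; `par_quicksort` and the cutoff value are illustrative):

```rust
// Parallel quicksort sketch: partition sequentially, recurse on the
// two halves in parallel, and fall back to a sequential sort below a
// cutoff where forking costs more than it saves.
fn par_quicksort(data: &mut [i64]) {
    const CUTOFF: usize = 1024;
    if data.len() <= CUTOFF {
        data.sort_unstable();
        return;
    }
    // Lomuto partition around the last element.
    let last = data.len() - 1;
    let pivot = data[last];
    let mut i = 0;
    for j in 0..last {
        if data[j] <= pivot {
            data.swap(i, j);
            i += 1;
        }
    }
    data.swap(i, last);
    let (lo, hi) = data.split_at_mut(i);
    std::thread::scope(|s| {
        s.spawn(move || par_quicksort(lo)); // left half on another thread
        par_quicksort(&mut hi[1..]);        // right half here, skipping the pivot
    });
}
```

The cutoff is the same trade-off as the fork-join benchmark's `n < 20` check: below it, sequential work beats the cost of a fork.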
The iterator-chain benchmark composes a filter and a map:

// Rayon
let result: Vec<_> = data.par_iter()
    .filter(|x| **x > 0)
    .map(|x| x * 2)
    .collect();

// Blitz - composable iterators
const result = blitz.iter(i64, data)
    .filter(isPositive)
    .map(double);
+----------------------+----------------------+
| Operation            | Blitz vs Rayon       |
+----------------------+----------------------+
| Fork-join overhead   | Blitz 10-20% faster  |
| Parallel sum (large) | Within 5%            |
| Parallel sort        | Within 10%           |
| Iterator chains      | Rayon may be faster* |
+----------------------+----------------------+

*Rayon’s lazy iterators can fuse operations
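The fusion the footnote refers to: Rust iterator adapters are lazy, so a filter/map/sum chain compiles to a single loop with no intermediate collection. A minimal sequential illustration (`fused_chain` is an illustrative name):

```rust
// filter, map and sum fuse into one pass over `data`:
// no intermediate Vec is materialized between the stages.
fn fused_chain(data: &[i64]) -> i64 {
    data.iter()
        .filter(|&&x| x > 0) // keep positives
        .map(|&x| x * 2)     // double them
        .sum()               // reduce in the same pass
}
```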

Where Blitz may have an edge:

  1. Lock-free wake: lower latency for task spawning
  2. Simple operations: less abstraction overhead
  3. Comptime specialization: zero-cost generics

Where Rayon may have an edge:

  1. Iterator fusion: chains of operations can be optimized
  2. Adaptive stealing: more sophisticated heuristics
  3. Mature optimizations: years of tuning

For a fair comparison, run both on the same machine under identical conditions.

// Rayon
rayon::ThreadPoolBuilder::new()
    .num_threads(10)
    .build_global()
    .unwrap();

// Blitz: 9 background workers + the calling thread = 10 threads
try blitz.initWithConfig(.{ .background_worker_count = 9 });
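Rather than hard-coding the thread count, the Rust side can query the machine's logical core count with std::thread::available_parallelism (stable std) and feed the same number, minus one, to Blitz. A sketch (`matched_thread_count` is an illustrative name):

```rust
// Query the logical core count; fall back to 1 if it is unavailable
// (e.g. in some restricted container environments).
fn matched_thread_count() -> usize {
    std::thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1)
}
```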

Generate identical test data for both:

// Rayon
let data: Vec<i64> = (0..n).map(|i| i as i64).collect();

// Blitz (assuming an allocator is in scope)
const data = try allocator.alloc(i64, n);
for (data, 0..) |*v, i| v.* = @intCast(i);
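If randomized inputs are wanted, each language's default RNG produces a different stream, so the runs would not see the same data. A fixed-seed LCG is trivial to replicate bit-for-bit in both Rust and Zig; a sketch using Knuth's MMIX constants (`lcg_data` is an illustrative name):

```rust
// Deterministic test data both benchmarks can reproduce exactly:
// Knuth's MMIX LCG, keeping the better-distributed high bits.
fn lcg_data(n: usize, seed: u64) -> Vec<i64> {
    let mut state = seed;
    (0..n)
        .map(|_| {
            state = state
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            (state >> 33) as i64
        })
        .collect()
}
```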
Use comparable optimization levels:

# Rayon
cargo build --release  # opt-level 3, comparable to -O3

# Blitz
zig build bench        # uses ReleaseFast (similar to -O3)
Warm up both runtimes before timing, so the first measured run does not pay thread-pool startup and page-fault costs:

// Rayon warmup
let _ = data.par_iter().sum::<i64>();

// Blitz warmup
_ = blitz.iter(i64, data).sum();
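A minimal harness that folds the warmup in, using std::time::Instant on the Rust side (the Zig side can do the same with std.time.Timer; `time_runs` is an illustrative name):

```rust
use std::time::Instant;

// One warmup pass, then time each measured run in milliseconds.
fn time_runs<F: FnMut()>(mut bench: F, runs: usize) -> Vec<f64> {
    bench(); // warmup: spin up the pool, fault in the data
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            bench();
            start.elapsed().as_secs_f64() * 1e3
        })
        .collect()
}
```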
Hardware: Apple M1 Pro, 10 cores
Data: 10M i64 elements

+--------------------+-----------+-----------+-------------+
| Benchmark          | Blitz     | Rayon     | Comparison  |
+--------------------+-----------+-----------+-------------+
| Fork-join (2M)     | 0.54 ns   | 0.66 ns   | Blitz +22%  |
| Parallel sum       | 1.1 ms    | 1.2 ms    | Blitz +9%   |
| Parallel sort      | 134 ms    | 145 ms    | Blitz +8%   |
| Parallel fib(45)   | 411 ms    | 414 ms    | Equal       |
| Find first         | 3.3 ms    | 3.1 ms    | Rayon +6%   |
+--------------------+-----------+-----------+-------------+

When sharing benchmark results:

  1. Include hardware specs (CPU, cores, memory)
  2. Include software versions (Zig version, Rust version)
  3. Show multiple runs (min, median, max)
  4. Describe data patterns (random, sorted, etc.)
  5. Note thread count used for comparison
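Point 3 above (min, median, max over multiple runs) can be computed like so; a sketch assuming per-run timings in milliseconds (`summarize` is an illustrative name):

```rust
// Report min / median / max instead of a single number: the median
// resists scheduler noise, while the min shows the best case.
fn summarize(mut runs: Vec<f64>) -> (f64, f64, f64) {
    assert!(!runs.is_empty());
    runs.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let median = runs[runs.len() / 2];
    (runs[0], median, *runs.last().unwrap())
}
```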