Comparing with Rayon
Build Rayon Benchmark

    cd benchmarks/rayon
    cargo build --release

Build Blitz Benchmark
The build system handles this automatically:

    # Build and run the Blitz benchmark
    zig build bench

    # Run the comparative benchmark (Blitz vs Rayon side-by-side)
    zig build compare

The bench step compiles benchmarks/rayon_compare.zig with ReleaseFast optimization and links the blitz module. The compare step builds benchmarks/compare.zig, which orchestrates running both benchmarks.
Running Comparison
Automated

    # Runs both Blitz and Rayon benchmarks and displays results
    zig build compare

Manual

    # Rayon
    cd benchmarks/rayon
    ./target/release/rayon_bench

    # Blitz
    zig build bench

Benchmark Categories
1. Fork-Join Overhead
Measures the cost of creating and joining parallel tasks:

    // Rayon
    fn fork_join_bench(depth: u32) -> u64 {
        if depth == 0 {
            return 1;
        }
        let (a, b) = rayon::join(
            || fork_join_bench(depth - 1),
            || fork_join_bench(depth - 1),
        );
        a + b
    }

    // Blitz
    fn forkJoinBench(n: u64) u64 {
        if (n < 20) return fibSequential(n);
        const r = blitz.join(.{
            .a = .{ forkJoinBench, n - 1 },
            .b = .{ forkJoinBench, n - 2 },
        });
        return r.a + r.b;
    }

Expected results:
- Both should achieve ~1 ns/fork at scale
- Blitz may be slightly faster due to lock-free wake
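
A figure such as "~1 ns/fork" is presumably obtained by dividing total elapsed time by the number of joins performed. The sketch below is a sequential stand-in for the Rayon-style binary recursion above (no thread pool is involved) and assumes that metric; `count_joins` and `ns_per_fork` are illustrative names, not part of either benchmark.

```rust
use std::time::Duration;

/// Sequential stand-in for the binary fork-join recursion:
/// returns (leaves reached, joins performed) for a tree of `depth`.
fn count_joins(depth: u32) -> (u64, u64) {
    if depth == 0 {
        return (1, 0); // a leaf performs no join
    }
    let (leaves_a, joins_a) = count_joins(depth - 1);
    let (leaves_b, joins_b) = count_joins(depth - 1);
    // One join at this level plus the joins in both subtrees.
    (leaves_a + leaves_b, joins_a + joins_b + 1)
}

/// Average cost per join in nanoseconds (assumed metric).
fn ns_per_fork(elapsed: Duration, joins: u64) -> f64 {
    elapsed.as_nanos() as f64 / joins as f64
}
```

A tree of depth d performs 2^d - 1 joins, so depth 21 corresponds to 2,097,151 joins. Dividing the measured wall-clock time by that count gives the per-fork figure.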
2. Parallel Sum

    // Rayon
    let sum: i64 = data.par_iter().sum();

    // Blitz
    const sum = blitz.iter(i64, data).sum();

Expected results:
- Near-identical for large data (memory-bound)
- Blitz may be faster for medium data (lower overhead)
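
One way to check the "memory-bound" expectation is to convert a measured time into effective memory bandwidth and compare it with what the machine can sustain. This is a minimal sketch that assumes every element is read from memory exactly once; `effective_bandwidth_gb_s` is a hypothetical helper, not part of the benchmark suite.

```rust
use std::mem::size_of;
use std::time::Duration;

/// Effective read bandwidth in GB/s (1 GB = 1e9 bytes), assuming every
/// element of type `T` is read from memory exactly once.
fn effective_bandwidth_gb_s<T>(elements: usize, elapsed: Duration) -> f64 {
    let bytes = (elements * size_of::<T>()) as f64;
    bytes / elapsed.as_secs_f64() / 1e9
}
```

For 10M i64 values (80 MB) summed in 1.1 ms, this works out to roughly 73 GB/s. When that number approaches the memory system's limit, scheduler overhead has little room to matter, which is consistent with both libraries landing close together on large inputs.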
3. Parallel Sort

    // Rayon
    data.par_sort();

    // Blitz
    blitz.sortAsc(i64, data);

Expected results:
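- Blitz uses a parallel quicksort variant; Rayon's par_sort() is a stable parallel merge sort, while par_sort_unstable() is its quicksort-based counterpart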
- Performance depends on data patterns
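
Because sort performance depends on the input, it helps to run each sort against several named patterns. The generator below is a sketch under assumptions: the pattern set, the `Pattern` enum, and the small xorshift generator are illustrative choices, not what the Blitz or Rayon benchmarks actually use.

```rust
/// Input patterns commonly used when benchmarking sorts.
#[derive(Clone, Copy, Debug)]
enum Pattern {
    Sorted,
    Reversed,
    Random,
    FewUnique,
}

/// Small deterministic xorshift64 generator so runs are reproducible.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Generates `n` values following `pattern`; `seed` must be non-zero.
fn generate(pattern: Pattern, n: usize, seed: u64) -> Vec<i64> {
    let mut rng = XorShift64(seed);
    match pattern {
        Pattern::Sorted => (0..n as i64).collect(),
        Pattern::Reversed => (0..n as i64).rev().collect(),
        Pattern::Random => (0..n).map(|_| rng.next_u64() as i64).collect(),
        // Only eight distinct values, to stress handling of duplicates.
        Pattern::FewUnique => (0..n).map(|_| (rng.next_u64() % 8) as i64).collect(),
    }
}
```

Using a fixed seed means the same data can be regenerated for every run. To feed identical input to the Zig side, the same generator would need to be ported or the data written to a file.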
4. Parallel Iterators

    // Rayon
    let result: Vec<_> = data.par_iter()
        .filter(|x| **x > 0)
        .map(|x| x * 2)
        .collect();

    // Blitz - composable iterators
    const result = blitz.iter(i64, data)
        .filter(isPositive)
        .map(double);

Results Interpretation
Expected Performance Ranges

| Operation | Blitz vs Rayon |
|---|---|
| Fork-join overhead | Blitz 10-20% faster |
| Parallel sum (large) | Within 5% |
| Parallel sort | Within 10% |
| Iterator chains | Rayon may be faster* |
*Rayon’s lazy iterators can fuse operations
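
Percentages like those in the table can be derived from two timings. The sketch below assumes the formula (slower / faster - 1) × 100, rounded to a whole percent, with small differences reported as "Equal". The guide does not state the formula the benchmark actually uses, so treat `compare` and its `tolerance_pct` parameter as hypothetical.

```rust
/// Describes which of two timings is faster and by how much.
/// Assumed formula: (slower / faster - 1) * 100, rounded to a whole percent;
/// differences below `tolerance_pct` are reported as "Equal".
fn compare(blitz: f64, rayon: f64, tolerance_pct: f64) -> String {
    let (winner, faster, slower) = if blitz <= rayon {
        ("Blitz", blitz, rayon)
    } else {
        ("Rayon", rayon, blitz)
    };
    let pct = (slower / faster - 1.0) * 100.0;
    if pct < tolerance_pct {
        "Equal".to_string()
    } else {
        format!("{} +{:.0}%", winner, pct)
    }
}
```

With a 1% tolerance, timings of 0.54 ns and 0.66 ns come out as "Blitz +22%", and 411 ms against 414 ms comes out as "Equal". Both timings must be in the same unit.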
When Blitz Wins
- Lock-free wake: Lower latency for task spawning
- Simple operations: Less abstraction overhead
- Comptime specialization: Zero-cost generics
When Rayon Wins
- Iterator fusion: Chains of operations can be optimized
- Adaptive stealing: More sophisticated heuristics
- Mature optimizations: Years of tuning
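
The iterator fusion point in the list above can be illustrated with Rust's standard sequential iterators. This is only an analogy for the idea, not Rayon's implementation: the unfused version materializes an intermediate vector between stages, while the fused version pushes each element through both stages in a single pass.

```rust
/// Unfused: each stage materializes an intermediate Vec.
fn two_pass(data: &[i64]) -> Vec<i64> {
    let positives: Vec<i64> = data.iter().copied().filter(|x| *x > 0).collect();
    positives.iter().map(|x| x * 2).collect()
}

/// Fused: lazy adapters run filter and map in one pass, no intermediate Vec.
fn fused(data: &[i64]) -> Vec<i64> {
    data.iter().filter(|x| **x > 0).map(|x| x * 2).collect()
}
```

Both functions return the same result. The fused form avoids one allocation and one extra traversal, which is the kind of saving that can favor a lazy iterator design on longer chains.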
Fair Comparison Guidelines
1. Same Hardware
Run both benchmarks on the same machine under the same conditions.
2. Same Thread Count

    // Rayon
    rayon::ThreadPoolBuilder::new()
        .num_threads(10)
        .build_global()
        .unwrap();

    // Blitz
    // 9 background workers; together with the calling thread this is
    // presumably meant to match Rayon's 10 threads.
    try blitz.initWithConfig(.{ .background_worker_count = 9 });

3. Same Data
Generate identical test data for both:

    // Rayon
    let data: Vec<i64> = (0..n).map(|i| i as i64).collect();

    // Blitz
    for (data, 0..) |*v, i| v.* = @intCast(i);

4. Same Optimization Level

    # Rayon
    cargo build --release   # -O3

    # Blitz
    zig build bench         # Uses ReleaseFast (similar to -O3)

5. Warmup Both

    // Rayon warmup
    let _ = data.par_iter().sum::<i64>();

    // Blitz warmup
    _ = blitz.iter(i64, data).sum();

Sample Results
Hardware: Apple M1 Pro, 10 cores
Data: 10M i64 elements

| Benchmark | Blitz | Rayon | Comparison |
|---|---|---|---|
| Fork-join (2M) | 0.54 ns | 0.66 ns | Blitz +22% |
| Parallel sum | 1.1 ms | 1.2 ms | Blitz +9% |
| Parallel sort | 134 ms | 145 ms | Blitz +8% |
| Parallel fib(45) | 411 ms | 414 ms | Equal |
| Find first | 3.3 ms | 3.1 ms | Rayon +6% |

Reporting Results
When sharing benchmark results:
- Include hardware specs (CPU, cores, memory)
- Include software versions (Zig version, Rust version)
- Show multiple runs (min, median, max)
- Describe data patterns (random, sorted, etc.)
- Note thread count used for comparison
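
The "multiple runs" item above can be collected with a small timing harness. This is a sketch, not the harness the Blitz benchmarks use: `time_runs` and `summarize` are hypothetical helpers that discard warmup iterations and then report the minimum, median, and maximum of the timed runs.

```rust
use std::time::{Duration, Instant};

/// Times `f` for `runs` iterations after `warmup` untimed iterations.
fn time_runs<F: FnMut()>(mut f: F, warmup: usize, runs: usize) -> Vec<Duration> {
    for _ in 0..warmup {
        f();
    }
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect()
}

/// Returns (min, median, max); for an even count the upper median is used.
/// Panics if `samples` is empty.
fn summarize(samples: &[Duration]) -> (Duration, Duration, Duration) {
    let mut sorted = samples.to_vec();
    sorted.sort();
    (sorted[0], sorted[sorted.len() / 2], sorted[sorted.len() - 1])
}
```

A call such as `summarize(&time_runs(|| workload(), 3, 10))` would then yield the three figures to report, where `workload` stands for whichever benchmark body is being measured. Passing results through `std::hint::black_box` inside the closure helps keep the compiler from optimizing the measured work away.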