Parallel Sorting
High-performance parallel PDQSort (Pattern-Defeating Quicksort).
Basic Usage
Section titled “Basic Usage”const blitz = @import("blitz");
var data = [_]i64{ 5, 2, 8, 1, 9, 3, 7, 4, 6 };
// Sort ascending with custom comparatorblitz.sort(i64, &data, struct { fn lessThan(a: i64, b: i64) bool { return a < b; }}.lessThan);// data is now [1, 2, 3, 4, 5, 6, 7, 8, 9]Sort Variants
Section titled “Sort Variants”Basic Sort (Custom Comparator)
Section titled “Basic Sort (Custom Comparator)”blitz.sort(T, slice, lessThanFn);Ascending Sort
Section titled “Ascending Sort”blitz.sortAsc(i64, &data);// Uses natural < orderingDescending Sort
Section titled “Descending Sort”blitz.sortDesc(i64, &data);// Uses natural > orderingSort by Key
Section titled “Sort by Key”Sort using a key extraction function:
const Person = struct { name: []const u8, age: u32, score: i32,};
var people: []Person = ...;
// Sort by ageblitz.sortByKey(Person, u32, people, struct { fn key(p: Person) u32 { return p.age; }}.key);Sort by Cached Key
Section titled “Sort by Cached Key”For expensive key functions, compute keys once in parallel:
// Two-phase: parallel key computation, then sorttry blitz.sortByCachedKey( Person, u32, allocator, people, struct { fn expensiveKey(p: Person) u32 { // Expensive computation done once per element return computeComplexScore(p); } }.expensiveKey,);When to use cached key:
- Key function is expensive (> 100ns)
- Sorting large arrays (> 10K elements)
- Key computation can be parallelized
Custom Comparator Examples
Section titled “Custom Comparator Examples”Sort Strings by Length
Section titled “Sort Strings by Length”blitz.sort([]const u8, strings, struct { fn byLength(a: []const u8, b: []const u8) bool { return a.len < b.len; }}.byLength);Sort by Absolute Value
Section titled “Sort by Absolute Value”blitz.sort(i64, &data, struct { fn byAbs(a: i64, b: i64) bool { return @abs(a) < @abs(b); }}.byAbs);Sort Structs by Multiple Fields
Section titled “Sort Structs by Multiple Fields”blitz.sort(Person, people, struct { fn compare(a: Person, b: Person) bool { // Primary: by age if (a.age != b.age) return a.age < b.age; // Secondary: by score (descending) return a.score > b.score; }}.compare);Performance
Section titled “Performance”Blitz PDQSort achieves excellent performance through:
- Parallel recursive sorting - Fork-join on large partitions
- Pattern-defeating - Handles sorted/reverse/equal inputs efficiently
- Hybrid approach - Insertion sort for small arrays, heapsort fallback
- Block partitioning - Cache-efficient element movement
Sorting 10M random i64:
std.mem.sort: 4,644 msBlitz PDQSort: 430 ms (10.8x faster)
Different patterns (10M elements):Already sorted: 36 msReverse sorted: 69 msAll equal: 36 msRandom: 430 msAlgorithm Details
Section titled “Algorithm Details”PDQSort (Pattern-Defeating Quicksort) combines:
| Technique | When Used | Benefit |
|---|---|---|
| Quicksort | Large arrays | O(n log n) average |
| Insertion sort | < 24 elements | Low overhead |
| Heapsort | Bad pivot patterns | O(n log n) guaranteed |
| Block partition | Large partitions | Cache efficiency |
| Pattern breaking | Detected patterns | Prevents O(n²) |
Parallelization Strategy
Section titled “Parallelization Strategy”Array: [........................1M elements........................] ├──────────────────────┼──────────────────────┤ │ Left Half │ Right Half │ │ (Worker 0) │ (Worker 1) │ │ │ │ │ │ │ ┌────┴────┐ │ ┌────┴────┐ │ │ Left Right │ Left Right │ │ (W0) (W2) │ (W1) (W3) │ └──────────────────────┴──────────────────────┘ Recursive fork-join- Fork at each partition (if large enough)
- Sequential threshold: 8,192 elements
- Work stealing balances uneven partitions
Stability
Section titled “Stability”Important: Blitz sort is not stable. Equal elements may be reordered.
For stable sorting needs:
// Use standard library stable sortstd.mem.sort(T, data, {}, lessThanFn);Memory Usage
Section titled “Memory Usage”- In-place: Minimal extra memory (~log n stack depth)
- sortByCachedKey: Allocates O(n) for key array
When to Use Parallel Sort
Section titled “When to Use Parallel Sort”| Data Size | Recommendation |
|---|---|
| < 1,000 | Sequential (overhead too high) |
| 1K - 10K | May parallelize (depends on comparator cost) |
| > 10K | Parallel (significant speedup) |
if (data.len > 10_000) { blitz.sort(T, data, lessThanFn);} else { std.mem.sort(T, data, {}, lessThanFn);}Complete Example
Section titled “Complete Example”const std = @import("std");const blitz = @import("blitz");
pub fn main() !void { try blitz.init(); defer blitz.deinit();
var gpa = std.heap.GeneralPurposeAllocator(.{}){}; const allocator = gpa.allocator();
// Generate random data var data = try allocator.alloc(i64, 1_000_000); defer allocator.free(data);
var rng = std.Random.DefaultPrng.init(12345); for (data) |*v| { v.* = rng.random().int(i64); }
// Sort const start = std.time.nanoTimestamp(); blitz.sortAsc(i64, data); const elapsed = std.time.nanoTimestamp() - start;
std.debug.print("Sorted 1M elements in {d:.2} ms\n", .{ @as(f64, @floatFromInt(elapsed)) / 1_000_000.0, });
// Verify for (data[1..], data[0 .. data.len - 1]) |curr, prev| { std.debug.assert(curr >= prev); }}