v1.0.0-zig0.15.2

Core API Reference

This document provides comprehensive documentation for all Blitz APIs, including function signatures, parameters, return types, examples, limitations, and best practices.

Blitz uses a global thread pool that must be initialized before use.

Initialize the thread pool with default settings (CPU count - 1 workers).

try blitz.init();
defer blitz.deinit();

Errors: Returns error.AlreadyInitialized if called twice, or thread creation errors.

initWithConfig(config: ThreadPoolConfig) !void

Initialize with custom configuration.

try blitz.initWithConfig(.{
    .background_worker_count = 8,
});
defer blitz.deinit();

Parameters:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| background_worker_count | ?usize | null (auto) | Number of worker threads. null = CPU count - 1 |

Shut down the thread pool and release resources.

blitz.deinit();

Notes:

  • Waits for all pending work to complete
  • Safe to call multiple times
  • Must be called before program exit to avoid resource leaks

Check if the thread pool is initialized.

if (!blitz.isInitialized()) {
    try blitz.init();
}

Get the number of worker threads.

const workers = blitz.numWorkers();
std.debug.print("Using {} workers\n", .{workers});

The iterator API is the recommended way to use Blitz. It provides composable, chainable operations that automatically parallelize when beneficial.

Create an immutable parallel iterator over a slice.

const data = [_]i64{ 1, 2, 3, 4, 5 };
const it = blitz.iter(i64, &data);
const sum = it.sum(); // 15

iterMut(T, slice) ParallelMutSliceIterator(T)

Create a mutable parallel iterator for in-place operations.

var data = [_]i64{ 1, 2, 3, 4, 5 };
blitz.iterMut(i64, &data).mapInPlace(double);
// data is now [2, 4, 6, 8, 10]

Create a parallel iterator over an index range.

// Sum of 0 + 1 + 2 + ... + 99
const sum = blitz.range(0, 100).sum(i64, struct {
    fn identity(i: usize) i64 { return @intCast(i); }
}.identity);

// Execute a function for each index
blitz.range(0, 1000).forEach(processIndex);

Compute the sum of all elements in parallel.

const data = [_]i64{ 1, 2, 3, 4, 5 };
const total = blitz.iter(i64, &data).sum(); // 15

Find the minimum element in parallel.

const data = [_]i64{ 5, 2, 8, 1, 9 };
if (blitz.iter(i64, &data).min()) |m| {
    std.debug.print("Min: {}\n", .{m}); // Min: 1
}

Find the maximum element in parallel.

const data = [_]i64{ 5, 2, 8, 1, 9 };
if (blitz.iter(i64, &data).max()) |m| {
    std.debug.print("Max: {}\n", .{m}); // Max: 9
}

Perform a custom parallel reduction.

const data = [_]i64{ 1, 2, 3, 4, 5 };
// Product of all elements
const product = blitz.iter(i64, &data).reduce(1, struct {
    fn mul(a: i64, b: i64) i64 { return a * b; }
}.mul); // 120

Requirements:

  • combine must be associative: combine(a, combine(b, c)) == combine(combine(a, b), c)
  • identity must be the identity element: combine(identity, x) == x

All search methods support early exit - they stop processing as soon as the result is determined.

Find any element matching the predicate. This is the fastest search method, but which matching element it returns is non-deterministic.

const data = [_]i64{ 1, 2, 3, 4, 5, 6 };
const even = blitz.iter(i64, &data).findAny(struct {
    fn isEven(x: i64) bool { return @mod(x, 2) == 0; }
}.isEven);

Find the leftmost element matching the predicate.

if (blitz.iter(i64, &data).findFirst(isEven)) |result| {
    std.debug.print("First even: {} at index {}\n", .{ result.value, result.index });
}

Check if any element matches the predicate. Supports early exit.

const has_even = blitz.iter(i64, &data).any(isEven);

Check if all elements match the predicate. Supports early exit.

const all_positive = blitz.iter(i64, &data).all(isPositive);

These methods require iterMut (mutable iterator).

Transform each element in place.

var data = [_]i64{ 1, 2, 3, 4, 5 };
blitz.iterMut(i64, &data).mapInPlace(struct {
    fn double(x: i64) i64 { return x * 2; }
}.double);
// data is now [2, 4, 6, 8, 10]

Set all elements to a value.

var data = [_]i64{ 1, 2, 3, 4, 5 };
blitz.iterMut(i64, &data).fill(0);
// data is now [0, 0, 0, 0, 0]

For divide-and-conquer algorithms and parallel task execution.

Execute multiple tasks in parallel and collect results.

const result = blitz.join(.{
    .user = .{ fetchUserById, user_id },
    .posts = .{ fetchPostsByUser, user_id },
});
// result.user, result.posts

Parameters: Anonymous struct where each field is either:

  • A function pointer (no arguments)
  • A tuple .{ function, arg1, arg2, ... } (up to 10 arguments)

Returns: A struct with the same field names, each containing that task’s return value.


All sort operations are in-place and use parallel PDQSort (Pattern-Defeating Quicksort).

Sort a slice in ascending order.

var data = [_]i64{ 5, 2, 8, 1, 9, 3 };
blitz.sortAsc(i64, &data);
// data is now [1, 2, 3, 5, 8, 9]

Sort a slice in descending order.
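A minimal sketch, assuming sortDesc mirrors the sortAsc signature shown above:

```zig
var data = [_]i64{ 5, 2, 8, 1, 9, 3 };
blitz.sortDesc(i64, &data);
// data is now [9, 8, 5, 3, 2, 1]
```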

Sort with a custom comparator.

var data = [_]i64{ 5, -2, 8, -1, 9 };
// Sort by absolute value
blitz.sort(i64, &data, struct {
    fn byAbs(a: i64, b: i64) bool {
        return @abs(a) < @abs(b);
    }
}.byAbs);

Sort by extracting a comparable key from each element.

const Person = struct { name: []const u8, age: u32 };
var people: []Person = ...;
blitz.sortByKey(Person, u32, people, struct {
    fn getAge(p: Person) u32 { return p.age; }
}.getAge);

sortByCachedKey(T, K, allocator, data, keyFn) !void

Two-phase sort: compute keys in parallel, then sort by cached keys.
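A sketch of the call shape, following the signature above; the Person type and getAge key function are illustrative:

```zig
const Person = struct { name: []const u8, age: u32 };
// `people` is a []Person obtained elsewhere. Keys are computed once in
// parallel and cached in an allocated buffer, so getAge runs O(n) times
// instead of O(n log n) as it would with sortByKey.
try blitz.sortByCachedKey(Person, u32, allocator, people, struct {
    fn getAge(p: Person) u32 { return p.age; }
}.getAge);
```

Prefer this over sortByKey when the key function is expensive relative to a comparison.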


For fine-grained control over parallelism.

Execute a function in parallel over a range.

const Context = struct {
    data: []f64,
    multiplier: f64,
};
blitz.parallelFor(data.len, Context, .{
    .data = data,
    .multiplier = 2.0,
}, struct {
    fn body(ctx: Context, start: usize, end: usize) void {
        for (ctx.data[start..end]) |*x| {
            x.* *= ctx.multiplier;
        }
    }
}.body);

parallelForWithGrain(n, Context, ctx, body, grain) void

Execute a function in parallel over a range with explicit grain size control.

Use this when you need fine-grained control over parallelization granularity. For most cases, prefer parallelFor() which auto-tunes based on data size.

Parameters:

  • n: Range size [0, n)
  • Context: Type holding shared data
  • ctx: Context instance
  • body: fn(Context, start: usize, end: usize) void
  • grain: Minimum elements per chunk

// Process with small grain for expensive operations
const Context = struct {
    input: []const f64,
    output: []f64,
};
blitz.parallelForWithGrain(
    data.len,
    Context,
    .{ .input = input, .output = output },
    struct {
        fn body(ctx: Context, start: usize, end: usize) void {
            for (ctx.input[start..end], ctx.output[start..end]) |in, *out| {
                out.* = expensiveComputation(in);
            }
        }
    }.body,
    100, // Small grain = more parallelism for expensive ops
);

// Large grain for cheap operations (reduces overhead)
blitz.parallelForWithGrain(n, void, {}, struct {
    fn body(_: void, start: usize, end: usize) void {
        for (start..end) |i| {
            cheapOperation(i);
        }
    }
}.body, 10000); // Large grain = less overhead for cheap ops

Grain size guidelines:

| Operation Cost | Recommended Grain |
| --- | --- |
| Expensive (>1μs/element) | 64-256 |
| Medium (~100ns/element) | 256-1024 |
| Cheap (<10ns/element) | 4096-10000 |

parallelReduce(T, n, identity, Context, ctx, map, combine) T

Parallel map-reduce with full control over mapping and combining.

Maps each index to a value using map(ctx, index), then combines all values using the associative combine(a, b) function in a divide-and-conquer pattern.

Parameters:

  • T: Result type
  • n: Range size [0, n)
  • identity: Identity value for combine (e.g., 0 for sum, 1 for product)
  • Context: Type holding shared data
  • ctx: Context instance
  • map: fn(Context, usize) T - Extract/compute value at index
  • combine: fn(T, T) T - Associative binary operation

Requirements:

  • combine must be associative: combine(a, combine(b, c)) == combine(combine(a, b), c)
  • identity must satisfy: combine(identity, x) == x

// Sum of squares
const data = [_]i64{ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
const sum_of_squares = blitz.parallelReduce(
    i64,         // Result type
    data.len,    // Range size
    0,           // Identity for addition
    []const i64, // Context type
    &data,       // Context value
    struct {
        fn map(d: []const i64, i: usize) i64 {
            return d[i] * d[i]; // Square each element
        }
    }.map,
    struct {
        fn combine(a: i64, b: i64) i64 {
            return a + b; // Sum the squares
        }
    }.combine,
);
// sum_of_squares = 385

// Dot product of two vectors
const DotCtx = struct { a: []const f64, b: []const f64 };
const dot_product = blitz.parallelReduce(
    f64,
    a.len,
    0.0,
    DotCtx,
    .{ .a = a, .b = b },
    struct {
        fn map(ctx: DotCtx, i: usize) f64 {
            return ctx.a[i] * ctx.b[i];
        }
    }.map,
    struct {
        fn combine(x: f64, y: f64) f64 { return x + y; }
    }.combine,
);

// Find maximum value
const max_val = blitz.parallelReduce(
    i64,
    data.len,
    std.math.minInt(i64), // Identity for max
    []const i64,
    &data,
    struct {
        fn map(d: []const i64, i: usize) i64 { return d[i]; }
    }.map,
    struct {
        fn combine(a: i64, b: i64) i64 { return @max(a, b); }
    }.combine,
);

// Count elements matching a predicate
const count = blitz.parallelReduce(
    usize,
    data.len,
    0,
    []const i64,
    &data,
    struct {
        fn map(d: []const i64, i: usize) usize {
            return if (d[i] > 5) 1 else 0;
        }
    }.map,
    struct {
        fn combine(a: usize, b: usize) usize { return a + b; }
    }.combine,
);

parallelReduceWithGrain(T, n, identity, Context, ctx, map, combine, grain) T

Same as parallelReduce but with explicit grain size control.

// Use smaller grain for expensive map operations
// Use a smaller grain for expensive map operations
const result = blitz.parallelReduceWithGrain(
    f64, n, 0.0, Context, ctx, expensiveMap, add,
    256, // Smaller grain for expensive operations
);

For parallel materialization patterns (inspired by Polars).

parallelCollect(T, U, input, output, Context, ctx, mapFn) void

Parallel map that collects results into an output slice.

var input: [1000]i32 = undefined;
var output: [1000]i64 = undefined;
blitz.parallelCollect(i32, i64, &input, &output, void, {}, struct {
    fn map(_: void, x: i32) i64 {
        return @as(i64, x) * 2;
    }
}.map);

Requirements: output.len must equal input.len.

parallelMapInPlace(T, data, Context, ctx, mapFn) void

Transform elements in-place in parallel.

blitz.parallelMapInPlace(f64, data, void, {}, struct {
    fn transform(_: void, x: f64) f64 {
        return @sqrt(x);
    }
}.transform);

Flatten nested slices into a single output slice in parallel.

const slices = [_][]const u32{ &.{1, 2, 3}, &.{4, 5}, &.{6, 7, 8, 9} };
var output: [9]u32 = undefined;
blitz.parallelFlatten(u32, &slices, &output);
// output = [1, 2, 3, 4, 5, 6, 7, 8, 9]

parallelScatter(T, values, indices, output) void

Scatter values to output using index mapping.

const values = [_]u32{ 100, 200, 300 };
const indices = [_]usize{ 5, 0, 3 };
var output: [10]u32 = undefined;
blitz.parallelScatter(u32, &values, &indices, &output);
// output[0]=200, output[3]=300, output[5]=100

All tasks complete before errors propagate. See Error Handling for details.

Execute error-returning tasks in parallel with error safety.

const result = try blitz.tryJoin(.{
    .user = fetchUser,   // returns !User
    .posts = fetchPosts, // returns ![]Post
});
// result.user, result.posts

tryForEach(n, E, Context, ctx, bodyFn) E!void

Parallel iteration with error handling.

try blitz.tryForEach(data.len, ParseError, Context, ctx, struct {
    fn body(c: Context, start: usize, end: usize) ParseError!void {
        for (start..end) |i| {
            try processItem(c.data[i]);
        }
    }
}.body);

tryReduce(T, E, n, identity, Context, ctx, mapFn, combineFn) E!T

Parallel reduction with error handling.

const total = try blitz.tryReduce(
    i64, ParseError, data.len, 0, Context, ctx,
    struct { fn map(c: Context, i: usize) ParseError!i64 { return try parse(c.data[i]); } }.map,
    struct { fn combine(a: i64, b: i64) i64 { return a + b; } }.combine,
);

For dynamic task spawning. See Scope & Broadcast for details.

Execute a scope function. Tasks spawned within it run in parallel and are guaranteed to have completed when the scope exits.

blitz.scope(struct {
    fn run(s: *blitz.Scope) void {
        s.spawn(task1);
        s.spawn(task2);
        s.spawn(task3);
    }
}.run);

Execute a function on all worker threads.

blitz.broadcast(struct {
    fn run(worker_index: usize) void {
        // Runs once on each worker thread
        initThreadLocal(worker_index);
    }
}.run);

Get pool statistics for debugging.

const stats = blitz.getStats();
std.debug.print("Executed: {}, Stolen: {}\n", .{ stats.executed, stats.stolen });

The grain size controls the minimum work unit.

Set the global grain size.

blitz.setGrainSize(1024);

Get the current grain size.

Get the default grain size (65536).

You can also use the constant blitz.DEFAULT_GRAIN_SIZE (65536).
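A small sketch, assuming getGrainSize() and defaultGrainSize() both return the grain size as an integer:

```zig
const current = blitz.getGrainSize();
if (current != blitz.defaultGrainSize()) {
    // Restore the default (65536) after a temporary override
    blitz.setGrainSize(blitz.DEFAULT_GRAIN_SIZE);
}
```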

Guidelines:

| Operation Cost | Recommended Grain |
| --- | --- |
| Trivial (add, compare) | 10000-65536 |
| Light (simple math) | 1024-10000 |
| Medium (string ops) | 256-1024 |
| Heavy (I/O, allocation) | 64-256 |

| Component | Thread Safety |
| --- | --- |
| iter(), iterMut() | Create from any thread |
| Iterator methods | Execute on worker threads |
| join() | Safe from any thread |
| Input slices | Must not be modified during operation |
| Output of iterMut | Safe after operation completes |

Important: Do not modify input data while a parallel operation is running.
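For example, rather than mutating a slice that a parallel operation may still be reading, write results to a separate output buffer with parallelCollect (documented above) and touch the input only after the call returns:

```zig
var input = [_]i64{ 1, 2, 3, 4 };
var output: [4]i64 = undefined;
// `input` is only read while this call runs; results go to `output`.
blitz.parallelCollect(i64, i64, &input, &output, void, {}, struct {
    fn map(_: void, x: i64) i64 { return x + 1; }
}.map);
// Safe to modify `input` again here: the operation has completed.
```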