Parallel Computing
Chuks is designed for real-world backend workloads where parallel computation is critical — from processing large datasets to running CPU-intensive algorithms across multiple cores. This guide covers how parallel computing works in Chuks and how it compares to other languages.
How Chuks Achieves Parallelism
Chuks uses AOT (Ahead-of-Time) compilation to produce highly optimized native binaries.
The spawn keyword enables true parallel execution across CPU cores using a lightweight M:N scheduling model. Tasks are efficiently distributed across available cores with no global interpreter lock, no manual thread pools, and no runtime isolation boundaries.
This allows Chuks programs to scale linearly with core count while keeping concurrency simple and expressive.
```
// This spawns a real goroutine on a separate CPU core
var task: Task<int> = spawn heavyComputation(data)
```
The runtime automatically balances these tasks across OS threads and CPU cores using a work-stealing scheduler, delivering parallel performance comparable to native Go.
Parallel Patterns
Fan-Out / Fan-In
Split work across multiple workers, then collect results:
```
function isPrime(n: int): bool {
    if (n < 2) { return false }
    if (n < 4) { return true }
    if (n % 2 == 0 || n % 3 == 0) { return false }
    var i = 5
    while (i * i <= n) {
        if (n % i == 0 || n % (i + 2) == 0) { return false }
        i = i + 6
    }
    return true
}

function countPrimesRange(start: int, end: int): int {
    var count = 0
    var i = start
    if (i % 2 == 0) { i = i + 1 }
    while (i < end) {
        if (isPrime(i)) { count = count + 1 }
        i = i + 2
    }
    if (start <= 2 && end > 2) { count = count + 1 }
    return count
}

var N = 5000000
var chunkSize: int = N / 4

// Fan-out: spawn 4 parallel workers
var t1: Task<int> = spawn countPrimesRange(0, chunkSize)
var t2: Task<int> = spawn countPrimesRange(chunkSize, chunkSize * 2)
var t3: Task<int> = spawn countPrimesRange(chunkSize * 2, chunkSize * 3)
var t4: Task<int> = spawn countPrimesRange(chunkSize * 3, N)

// Fan-in: collect results
var r1: int = await t1
var r2: int = await t2
var r3: int = await t3
var r4: int = await t4
var total: int = r1 + r2 + r3 + r4
println("Total primes: " + string(total))
```
Sequential vs Parallel
Compare single-threaded and multi-threaded execution:
```
var N = 5000000
var chunkSize: int = N / 4

// Sequential — single core
var seqStart: int = time_now()
var seqCount: int = countPrimesRange(0, N)
var seqMs: int = time_now() - seqStart

// Parallel — 4 cores
var parStart: int = time_now()
var t1: Task<int> = spawn countPrimesRange(0, chunkSize)
var t2: Task<int> = spawn countPrimesRange(chunkSize, chunkSize * 2)
var t3: Task<int> = spawn countPrimesRange(chunkSize * 2, chunkSize * 3)
var t4: Task<int> = spawn countPrimesRange(chunkSize * 3, N)
var r1: int = await t1
var r2: int = await t2
var r3: int = await t3
var r4: int = await t4
var parCount: int = r1 + r2 + r3 + r4
var parMs: int = time_now() - parStart

println("Sequential: " + string(seqMs) + "ms")
println("Parallel: " + string(parMs) + "ms")
```
Cross-Language Benchmark
We benchmarked the same parallel computing task across 6 languages to see how Chuks performs in real-world CPU-bound parallel workloads.
The Task
Count all prime numbers up to 5,000,000 using 4 parallel workers. Each language uses its native parallelism mechanism:
| Language | Parallelism Mechanism |
|---|---|
| Chuks | spawn → Go goroutines (AOT) |
| Go | go func() + sync.WaitGroup |
| Java | ExecutorService + thread pool |
| Bun | worker_threads |
| Node.js | worker_threads |
| Python | multiprocessing.Pool |
All implementations use the same algorithm (trial division with 6k±1 optimization) and the same work distribution (4 equal chunks).
Results
Tested on Apple M4 Max (16-core), March 2026.
Parallel Execution (4 workers)
| Language | Parallel Time | Relative to Chuks |
|---|---|---|
| Go | 39ms | 0.97x |
| Chuks (AOT) | 40ms | 1.0x (baseline) |
| Java | 49ms | 1.23x |
| Bun | 51ms | 1.28x |
| Node.js | 71ms | 1.78x |
| Python | 2,661ms | 66.5x |
Sequential Execution (single-threaded baseline)
| Language | Sequential Time | Relative to Chuks |
|---|---|---|
| Chuks (AOT) | 111ms | 1.0x (baseline) |
| Go | 117ms | 1.05x |
| Bun | 127ms | 1.14x |
| Java | 129ms | 1.16x |
| Node.js | 182ms | 1.64x |
| Python | 7,933ms | 71.5x |
Parallel Speedup (sequential ÷ parallel)
| Language | Speedup |
|---|---|
| Go | ~3.0x |
| Chuks (AOT) | ~2.8x |
| Python | ~3.0x |
| Java | ~2.6x |
| Node.js | ~2.6x |
| Bun | ~2.5x |
Analysis
Chuks matches Go’s performance — this is expected since Chuks AOT-compiles directly to Go. The goroutine-based parallelism gives both languages near-optimal CPU utilization with minimal scheduling overhead. In sequential mode, Chuks is actually slightly faster than Go at 111ms vs 117ms.
Java performs well but has ~23% overhead compared to Chuks/Go due to JVM startup and thread pool management costs.
Bun and Node.js both use worker_threads for parallelism. Bun is faster than Node thanks to JavaScriptCore’s optimizing JIT compiler, but both have higher overhead than goroutines due to isolate/worker creation costs.
Python is ~67x slower in absolute terms due to the CPython interpreter overhead. However, it achieves a solid ~3x parallel speedup thanks to multiprocessing bypassing the GIL entirely (each worker is a separate OS process).
Chuks VM (not shown above) runs the same benchmark in 3,191ms parallel — faster than Python but obviously much slower than AOT. For CPU-intensive parallel work, always use chuks build for AOT compilation.
What Makes Chuks Fast
- Lightweight parallel tasks: Chuks uses extremely lightweight scheduled tasks that start with small stack footprints, allowing thousands of parallel operations without the memory overhead of traditional OS threads.
- Work-stealing scheduling model: The runtime automatically balances tasks across available CPU cores using a work-stealing scheduler. No manual thread pools, affinity tuning, or executor management required.
- Ahead-of-Time native compilation: In AOT mode, Chuks compiles to optimized native binaries with no interpreter overhead. Concurrency constructs such as spawn translate directly to native parallel execution primitives, with zero wrapper layers.
- True multi-core parallelism: There is no Global Interpreter Lock. CPU-bound tasks execute simultaneously across cores, without isolate boundaries or process-level duplication.
When to Use Parallel Computing
| Scenario | Approach |
|---|---|
| Process large dataset in chunks | spawn workers per chunk |
| CPU-intensive algorithm | spawn to utilize multiple cores |
| Independent computations | spawn each, await all |
| I/O-bound work (HTTP, DB, files) | async/await instead |
| Mixed I/O + CPU | async for I/O, spawn for CPU |
For more details on the concurrency model, see the Concurrency guide.