Parallel Computing
Chuks is designed for real-world backend workloads where parallel computation is critical — from processing large datasets to running CPU-intensive algorithms across multiple cores. This guide covers how parallel computing works in Chuks and how it compares to other languages.
How Chuks Achieves Parallelism
Chuks uses AOT (Ahead-of-Time) compilation to produce highly optimized native binaries.
The spawn keyword enables true parallel execution across CPU cores using a lightweight M:N scheduling model. Tasks are efficiently distributed across available cores with no global interpreter lock, no manual thread pools, and no runtime isolation boundaries.
This allows Chuks programs to scale linearly with core count while keeping concurrency simple and expressive.
```
// This spawns a real goroutine on a separate CPU core
var task: Task<int> = spawn heavyComputation(data)
```
The runtime automatically balances these tasks across OS threads and CPU cores using a work-stealing scheduler, delivering parallel performance comparable to native Go.
Parallel Patterns
Fan-Out / Fan-In
Split work across multiple workers, then collect results:
```
function isPrime(n: int): bool {
    if (n < 2) { return false }
    if (n < 4) { return true }
    if (n % 2 == 0 || n % 3 == 0) { return false }
    var i = 5
    while (i * i <= n) {
        if (n % i == 0 || n % (i + 2) == 0) { return false }
        i = i + 6
    }
    return true
}

function countPrimesRange(start: int, end: int): int {
    var count = 0
    var i = start
    if (i % 2 == 0) { i = i + 1 }
    while (i < end) {
        if (isPrime(i)) { count = count + 1 }
        i = i + 2
    }
    if (start <= 2 && end > 2) { count = count + 1 }
    return count
}

var N = 5000000
var chunkSize: int = N / 4

// Fan-out: spawn 4 parallel workers
var t1: Task<int> = spawn countPrimesRange(0, chunkSize)
var t2: Task<int> = spawn countPrimesRange(chunkSize, chunkSize * 2)
var t3: Task<int> = spawn countPrimesRange(chunkSize * 2, chunkSize * 3)
var t4: Task<int> = spawn countPrimesRange(chunkSize * 3, N)

// Fan-in: collect results
var r1: int = await t1
var r2: int = await t2
var r3: int = await t3
var r4: int = await t4
var total: int = r1 + r2 + r3 + r4
println("Total primes: " + string(total))
```
Sequential vs Parallel
Compare single-threaded and multi-threaded execution:
```
var N = 5000000
var chunkSize: int = N / 4

// Sequential — single core
var seqStart: int = time_now()
var seqCount: int = countPrimesRange(0, N)
var seqMs: int = time_now() - seqStart

// Parallel — 4 cores
var parStart: int = time_now()
var t1: Task<int> = spawn countPrimesRange(0, chunkSize)
var t2: Task<int> = spawn countPrimesRange(chunkSize, chunkSize * 2)
var t3: Task<int> = spawn countPrimesRange(chunkSize * 2, chunkSize * 3)
var t4: Task<int> = spawn countPrimesRange(chunkSize * 3, N)
var r1: int = await t1
var r2: int = await t2
var r3: int = await t3
var r4: int = await t4
var parCount: int = r1 + r2 + r3 + r4
var parMs: int = time_now() - parStart

println("Sequential: " + string(seqMs) + "ms")
println("Parallel: " + string(parMs) + "ms")
```
Cross-Language Benchmark
We benchmarked the same parallel computing task across 6 languages to see how Chuks performs in real-world CPU-bound parallel workloads.
The Task
Count all prime numbers up to 5,000,000 using 4 parallel workers. Each language uses its native parallelism mechanism:
| Language | Parallelism Mechanism |
|---|---|
| Chuks | spawn → Go goroutines (AOT) |
| Go | go func() + sync.WaitGroup |
| Java | ExecutorService + thread pool |
| Bun | worker_threads |
| Node.js | worker_threads |
| Python | multiprocessing.Pool |
All implementations use the same algorithm (trial division with 6k±1 optimization) and the same work distribution (4 equal chunks).
Results
Tested on Apple M4 Max (16-core), March 2026.
Parallel Execution (4 workers)
| Language | Parallel Time | Relative to Chuks |
|---|---|---|
| Go | 39ms | 0.97x |
| Chuks (AOT) | 40ms | 1.0x (baseline) |
| Java | 49ms | 1.23x |
| Bun | 51ms | 1.28x |
| Node.js | 71ms | 1.78x |
| Python | 2,661ms | 66.5x |
Sequential Execution (single-threaded baseline)
| Language | Sequential Time | Relative to Chuks |
|---|---|---|
| Chuks (AOT) | 111ms | 1.0x (baseline) |
| Go | 117ms | 1.05x |
| Bun | 127ms | 1.14x |
| Java | 129ms | 1.16x |
| Node.js | 182ms | 1.64x |
| Python | 7,933ms | 71.5x |
Parallel Speedup (sequential ÷ parallel)
| Language | Speedup |
|---|---|
| Go | ~3.0x |
| Chuks (AOT) | ~2.8x |
| Python | ~3.0x |
| Java | ~2.6x |
| Node.js | ~2.6x |
| Bun | ~2.5x |
Analysis
Chuks matches Go’s performance — this is expected since Chuks AOT-compiles directly to Go. The goroutine-based parallelism gives both languages near-optimal CPU utilization with minimal scheduling overhead. In sequential mode, Chuks is actually slightly faster than Go at 111ms vs 117ms.
Java performs well but has ~23% overhead compared to Chuks/Go due to JVM startup and thread pool management costs.
Bun and Node.js both use worker_threads for parallelism. Bun is faster than Node thanks to JavaScriptCore’s optimizing JIT compiler, but both have higher overhead than goroutines due to isolate/worker creation costs.
Python is ~67x slower in absolute terms due to the CPython interpreter overhead. However, it achieves a solid ~3x parallel speedup thanks to multiprocessing bypassing the GIL entirely (each worker is a separate OS process).
Chuks VM (not shown above) runs the same benchmark in 3,191ms parallel — faster than Python but obviously much slower than AOT. For CPU-intensive parallel work, always use chuks build for AOT compilation.
What Makes Chuks Fast
- Lightweight parallel tasks: Chuks uses extremely lightweight scheduled tasks that start with small stack footprints, allowing thousands of parallel operations without the memory overhead of traditional OS threads.
- Work-stealing scheduling model: The runtime automatically balances tasks across available CPU cores using a work-stealing scheduler. No manual thread pools, affinity tuning, or executor management required.
- Ahead-of-Time native compilation: In AOT mode, Chuks compiles to optimized native binaries with no interpreter overhead. Concurrency constructs such as spawn translate directly to native parallel execution primitives, with zero wrapper layers.
- True multi-core parallelism: There is no Global Interpreter Lock. CPU-bound tasks execute simultaneously across cores, without isolate boundaries or process-level duplication.
When to Use Parallel Computing
| Scenario | Approach |
|---|---|
| Process large dataset in chunks | spawn workers per chunk |
| CPU-intensive algorithm | spawn to utilize multiple cores |
| Independent computations | spawn each, await all |
| I/O-bound work (HTTP, DB, files) | async/await instead |
| Mixed I/O + CPU | async for I/O, spawn for CPU |
For more details on the concurrency model, see the Concurrency guide.