When Parallel Isn’t Really Parallel
2026-01-13
I recently watched an Nvidia video that illustrates something fundamental about parallelism—something that concerns me about how we’re applying these concepts across different problem domains.
The video shows nozzles working in parallel, all fed from a shared tank of compressible gas (or incompressible fluid). (The video actually shows several compressed-air tanks, but the key point stands: the nozzles draw on shared resources, not one tank per nozzle.)
Our “tank” was designed by a committee of giants—giants standing on the shoulders of other giants, like Ptolemy and Rube Goldberg. Modern CPUs look a lot like that machine in the video: nominal parallelism, tightly coupled through shared resources: power, memory, caches, buses. The cores may execute in parallel, but they’re all drinking from the same tank.
This works. No one disputes that.
The Distance Problem
But consider a different problem domain: the internet. Or distributed systems in general. Here, the units of computational power aren’t sitting millimeters apart on the same die. They’re separated by vast distances—different machines, different data centers, different continents.
To stretch the tank analogy further: imagine each nozzle fed by its own tank, and each of those tanks located in a different city around the globe. The coordination problem changes entirely. The latency characteristics change. The failure modes change. The architecture that works for pixel shaders simply doesn’t map.
Yet the common belief seems to be that we must use that same bank-of-shaders solution to solve these fundamentally different problems. We keep trying to force the internet-scale problem into the pixel-shader-scale solution.
A Question of Scale
I don’t think this approach can ultimately work. Not because there’s anything wrong with SIMD or SIMT—they’re brilliant for what they’re designed for. But because we’re applying a solution optimized for one problem domain to a problem that operates under entirely different constraints.
The question isn’t whether parallel processing is good. The question is whether the kind of parallelism that works when everything shares one tank scales to scenarios where the tanks themselves are distributed across the planet.
I suspect it doesn’t. And I suspect we need different architectures—architectures designed from first principles for distributed, asynchronous, loosely-coupled systems rather than retrofitted from GPU-style parallelism.
The nozzles may look parallel in both cases. But the tanks tell a different story.
A Simpler Alternative
Here’s a heretical question: wouldn’t it be simpler to replace all that silicon real estate—the caches, the cache coherence circuitry, all the complexity we’ve built to make sharing work—with standalone, simple CPUs? The kind of simple we had with the MC6809, the 6502, the PDP-11. No shared memory to coordinate. No cache coherence protocols. Just independent processors, each with its own memory, communicating when they need to.
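One way to make the “independent processors, communicating when they need to” idea concrete is with separate OS processes, each owning its memory outright and coupled only through explicit message queues. This is a minimal sketch, not a proposal for real hardware; all names (`worker`, `run_demo`) are illustrative, and Python’s `multiprocessing` merely stands in for genuinely standalone machines.

```python
# Each worker is a separate process with private memory. There is no shared
# state, no locks, no cache coherence to reason about: the only coupling is
# the message queue.
from multiprocessing import Process, Queue

def worker(inbox, outbox):
    # This process owns its state outright; nothing else can touch it.
    total = 0
    while True:
        msg = inbox.get()      # block until a message arrives
        if msg is None:        # sentinel: report result and shut down
            outbox.put(total)
            return
        total += msg           # private state, no coordination needed

def run_demo(values):
    inbox, outbox = Queue(), Queue()
    p = Process(target=worker, args=(inbox, outbox))
    p.start()
    for v in values:
        inbox.put(v)           # communicate only when necessary
    inbox.put(None)
    result = outbox.get()
    p.join()
    return result

if __name__ == "__main__":
    print(run_demo([1, 2, 3, 4, 5]))  # prints 15
```

Notice what is absent: no mutexes, no atomic operations, no memory barriers. The complexity budget goes entirely into the protocol between processes, which is visible in the code rather than buried in silicon.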
Would that actually be more complex? Or have we convinced ourselves that the complexity we’ve built is somehow simpler than the alternative?
We don’t need to hit a big red reset button to explore this. We can use what we’ve got right now to create programs for these kinds of systems. We’ve already drifted away from the single-CPU problem domains of the 1960s anyway. Modern CPUs are quietly backing into something like standalone processors—each core running with its own private L1 cache, coordinating when necessary but mostly independent.
Our current notion of threads chops up CPU time across many applications. Each app gets a tiny slice of time. At best, we’re simulating multiple distributed processors. But we can easily simulate programs that get to own 100% of a CPU—using closures, queue classes, fan-out supported by garbage collection. The tools are already here.
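The closures-plus-queues idea above can be sketched in a few lines. Each “component” is a closure that behaves as if it owned a whole CPU: it holds private state in its enclosing scope (reclaimed by the garbage collector when the queues are dropped), reads from an input queue, and fans results out to every downstream queue. A trivial round-robin scheduler stands in for real concurrency; the names (`make_component`, `run`) are illustrative, not from any particular library.

```python
# Components as closures wired together by queues. Each component steps only
# when it has input; the scheduler just round-robins until all work drains.
from collections import deque

def make_component(transform, inbox, outboxes):
    def step():
        if inbox:
            value = transform(inbox.popleft())
            for out in outboxes:   # fan-out: deliver to every consumer
                out.append(value)
            return True            # did some work
        return False               # idle this round
    return step

def run(components):
    # Give every component a turn per pass; stop when a full pass is idle.
    while any([step() for step in components]):
        pass

q_in, q_mid, q_out1, q_out2 = deque(), deque(), deque(), deque()
doubler = make_component(lambda x: x * 2, q_in, [q_mid])
inc     = make_component(lambda x: x + 1, q_mid, [q_out1, q_out2])

q_in.extend([1, 2, 3])
run([doubler, inc])
print(list(q_out1))  # [3, 5, 7]
print(list(q_out2))  # [3, 5, 7]
```

Each closure never sees another component’s state; swapping the in-memory deques for sockets would turn the same wiring diagram into a genuinely distributed program.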
The question is whether we’re willing to think differently about how we use them.
See Also
Email: ptcomputingsimplicity@gmail.com
Substack: paultarvydas.substack.com
Videos: https://www.youtube.com/@programmingsimplicity2980
Discord: https://discord.gg/65YZUh6Jpq
Leanpub: [WIP] https://leanpub.com/u/paul-tarvydas
Twitter: @paul_tarvydas
Bluesky: @paultarvydas.bsky.social
Mastodon: @paultarvydas
(earlier) Blog: guitarvydas.github.io
References: https://guitarvydas.github.io/2024/01/06/References.html

