A CPU that runs entirely on GPU

(github.com)

101 points | by cypres 6 hours ago

17 comments

jagged-chisel 4 minutes ago
“A CPU that runs entirely on the GPU”
I imagine a carefully crafted set of programming primitives used to build up the abstraction of a CPU…
“Every ALU operation is a trained neural network.”
Oh… oh. Fun. Just not the type of “interesting” I was hoping for.
bmc7505 4 hours ago
As foretold six years ago. [1]
[1]: https://breandan.net/2020/06/30/graph-computation#roadmap
[-]
- toolslive 45 minutes ago
  https://en.wikipedia.org/wiki/Xeon_Phi#Knights_Landing ?
bob1029 1 hour ago
A fun experiment but I wonder how many out there seriously think we could ever completely rid ourselves of the CPU. It seems to be a rising sentiment.
The cost of communicating information through space is dealt with in fundamentally different ways here. On the CPU it is addressed directly. The actual latency is minimized as much as possible, usually by predicting the future in various ways and keeping the spatial extent of each device (core complex) as small as possible. The GPU hides latency with massive parallelism. That's why we can put them across relatively slow networks and still see excellent performance.
Latency hiding cannot deal well with workloads that are branchy and serialized because you can only have one logical thread throughout. The CPU dominates this area because it doesn't cheat. It directly targets the objective. Making efficient, accurate control flow decisions tends to be more valuable than being able to process data in large volumes. It just happens that there are a few exceptions to this rule that are incredibly popular.
[-]
- fc417fc802 11 minutes ago
  > I wonder how many out there seriously think we could ever completely rid ourselves of the CPU.
  How do you class systems like the PS5 that have an APU plugged into GDDR instead of regular RAM? The primary remaining issue is the limited memory capacity.
  I wonder if we might see a system with GPU class HBM on the package in lieu of VRAM coupled with regular RAM on the board for the CPU portion?
- volemo 57 minutes ago
  I don't see us getting rid of the CPU; I see the CPU and GPU eventually being consolidated into one system of heterogeneous computing units.
  [-]
  - jagged-chisel 0 minutes ago
    Agreed. Much like “RISC is gonna replace everything” - it didn’t. Because the CPU makers incorporated lessons from RISC into their designs.
    I can see the same happening to the CPU. It will just take on the appropriate functionality to keep all the compute in the same chip.
    It’s gonna take a while because Nvidia et al like their moats.
nomercy400 1 hour ago
I was taught years ago that MUL and ADD can be implemented in one or a few cycles. They can be the same complexity. What am I missing here?
Also, is it possible to use the GPU's ADD/MUL implementation? It is what a GPU does best.
[-]
- volemo 46 minutes ago
  To multiply two arbitrary numbers in a single cycle, you need to include dedicated hardware in your ALU; without it, you have to combine several additions and logical shifts.
  As to why not use the ADD/MUL capabilities of the GPU itself, I guess it wasn’t in the spirit of the challenge. ;)
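  A minimal sketch of the "several additions and logical shifts" fallback, assuming unsigned operands that wrap at a fixed width. The name `shift_add_mul` and the `bits` parameter are made up for illustration and are not taken from the project.

```python
def shift_add_mul(a: int, b: int, bits: int = 32) -> int:
    """Multiply two unsigned integers using only shifts, ANDs, and adds.

    The result wraps modulo 2**bits, like a fixed-width register would.
    """
    mask = (1 << bits) - 1
    a &= mask
    b &= mask
    product = 0
    for _ in range(bits):          # one iteration per multiplier bit
        if b & 1:                  # lowest multiplier bit set: add the shifted multiplicand
            product = (product + a) & mask
        a = (a << 1) & mask        # multiplicand moves one bit position up
        b >>= 1                    # consume one multiplier bit
    return product


if __name__ == "__main__":
    print(shift_add_mul(6, 7))
```

  Without a hardware multiplier, each loop iteration costs at least one cycle, so a 32-bit multiply takes on the order of 32 sequential steps rather than one.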
deep1283 3 hours ago
This is a fun idea. What surprised me is the inversion where MUL ends up faster than ADD because the neural LUT removes sequential dependency while the adder still needs prefix stages.
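To illustrate what "prefix stages" means here, below is a textbook Kogge-Stone parallel-prefix adder simulated on Python integers. It is only a sketch of the general technique: whether the project uses this particular prefix scheme is an assumption, and `kogge_stone_add` is a hypothetical name.

```python
def kogge_stone_add(a: int, b: int, bits: int = 32) -> tuple[int, int]:
    """Return ((a + b) mod 2**bits, number of prefix stages used)."""
    mask = (1 << bits) - 1
    a &= mask
    b &= mask
    g = a & b            # generate: this bit position creates a carry on its own
    p = a ^ b            # propagate: this bit position passes an incoming carry along
    half_sum = p         # per-bit sum before carries are applied
    stages = 0
    dist = 1
    while dist < bits:
        # Within one stage all bit positions update in parallel,
        # but each stage needs the results of the previous one.
        g, p = (g | (p & (g << dist))) & mask, (p & (p << dist)) & mask
        dist <<= 1
        stages += 1
    carries = (g << 1) & mask    # carry into bit i comes from bits 0..i-1
    return (half_sum ^ carries) & mask, stages


if __name__ == "__main__":
    total, stages = kogge_stone_add(5, 7)
    print(total, stages)
```

The stage count grows as log2 of the width (5 stages for 32 bits), and those stages cannot overlap. A single table lookup has no such chain, which is the dependency difference the comment points at.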
andrewdb 2 hours ago
Why do we call them GPUs these days?
Most GPUs, sitting in racks in datacenters, aren't "processing graphics" anyhow.
[-]
- xeonmc 1 hour ago
  General Processing Units
  Gross-Parallelization Units
  Generative Procedure Units
  Gratuitously Profiteering Unscrupulously
  [-]
  - incognito124 1 hour ago
    Greed Processing Units
- jgtrosh 1 hour ago
  The dedicated term GPGPU [0] didn't catch on.
  [0]: https://en.wikipedia.org/wiki/General-purpose_computing_on_g...
- CompuHacker 49 minutes ago
```
  CPU = Compute
  GPU =  Impute
```
throawayonthe 1 hour ago
very tangentially related is whatever vectorware et al are doing: https://www.vectorware.com/blog/
lorenzohess 4 hours ago
Out of curiosity, how much slower is this than an actual CPU?
[-]
- bastawhiz 4 hours ago
  Based on addition and subtraction, 625,000x slower or so than a 2.5 GHz CPU
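  Spelling that estimate out as arithmetic, under the assumption (made here for illustration, not stated in the thread) that the reference CPU completes one add per clock cycle. The constant and function names are hypothetical.

```python
CPU_CLOCK_HZ = 2.5e9     # 2.5 GHz reference CPU from the comment
SLOWDOWN = 625_000       # slowdown factor from the comment


def emulated_ops_per_second(clock_hz: float, slowdown: float,
                            ops_per_cycle: float = 1.0) -> float:
    """Throughput of the emulated CPU implied by a slowdown factor.

    ops_per_cycle=1.0 is an assumption about the reference CPU.
    """
    return clock_hz * ops_per_cycle / slowdown


ops = emulated_ops_per_second(CPU_CLOCK_HZ, SLOWDOWN)
latency_us = 1e6 / ops   # microseconds per emulated operation

if __name__ == "__main__":
    print(f"{ops:.0f} ops/s, {latency_us:.0f} microseconds per op")
```

  Under that assumption the figure works out to a few thousand adds per second, i.e. an effective clock in the kHz range.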
  [-]
  - medi8r 2 hours ago
    So it could run Doom?
    [-]
    - repelsteeltje 1 hour ago
      Yes: https://github.com/robertcprice/nCPU?tab=readme-ov-file#doom...
      [-]
      - medi8r 26 minutes ago
        Oh I forgot to Doom scroll.
      - binsquare 1 hour ago
        Can we run doom inside of doom yet?
        [-]
        - throawayonthe 1 hour ago
          Yes: https://github.com/kgsws/doom-in-doom
          [-]
          - PowerElectronix 6 minutes ago
            What a time to be alive
sudo_cowsay 4 hours ago
"Multiplication is 12x faster than addition..."
Wow. That's cool but what happens to the regular CPU?
[-]
- adrian_b 3 hours ago
  This CPU simulator does not attempt to achieve the maximum speed that could be obtained when simulating a CPU on a GPU.
  For that a completely different approach would be needed, e.g. by implementing something akin to qemu, where each CPU instruction would be translated into a graphic shader program. On many older GPUs, it is impossible or difficult to launch a graphic program from inside a graphic program (instead of from the CPU), but where this is possible one could obtain a CPU emulation that would be many orders of magnitude faster than what is demonstrated here.
  Instead of going for speed, the project demonstrates a simpler self-contained implementation based on the same kind of neural networks used for ML/AI, which might work even on an NPU, not only on a GPU.
  Because it uses inappropriate hardware execution units, the speed is modest and the speed ratios between different kinds of instructions are weird, but nonetheless this is an impressive achievement, i.e. simulating the complete AArch64 ISA with such means.
  [-]
  - 5o1ecist 2 hours ago
    > where each CPU instruction would be translated into a graphic shader program
    You really think having a shader per CPU-instruction is going to get you closer to the highest possible speed one can achieve?
    [-]
    - koolala 18 minutes ago
      If it's bindless and pre-compiled why not? What's a faster way?
artemonster 34 minutes ago
Every clueless person who suggests that we move to GPUs entirely has zero idea how things work and is basically suggesting using Lambos to plow fields and tractors to race in NASCAR
RagnarD 4 hours ago
Being able to perform precise math in an LLM is important, glad to see this.
[-]
- koolala 15 minutes ago
  That would be cool. A way to read cpu assembly bytecode and then think in it.
  It's slower than real cpu code obviously but fast just 'thinking' about it. They wouldn't need to actually simulate an entire program in a never ending hot loop like a real program.
- jdjdndnzn 4 hours ago
  Just want to point out this comment is highly ironic.
  This is all a computer does :P
  We need LLMs to be able to tap that, not add the same functionality a layer above and MUCH less efficiently.
  [-]
  - Nuzzerino 4 hours ago
    > We need LLMs to be able to tap that, not add the same functionality a layer above and MUCH less efficiently.
    Agents, tool-integrated reasoning, even chain of thought (limited, for some math) can address this.
    [-]
    - RagnarD 2 hours ago
      You're both completely missing the point. It's important that an LLM be able to perform exact arithmetic reliably without a tool call. Of course the underlying hardware does so extremely rapidly; that's not the point.
      [-]
      - jdjdndnzn 40 minutes ago
        The computer ALREADY does do math reliably. You are missing the point.
- 5o1ecist 2 hours ago
  Why?
nicman23 4 hours ago
can I run Linux on an Nvidia card though?
[-]
- micw 3 hours ago
  Linux runs everywhere
  [-]
  - volemo 41 minutes ago
    Except on my stupid iPad “Pro”. :(
mrlonglong 3 hours ago
Now I've seen it all. Time to die... (meant humorously)
Surac 4 hours ago
Well, GPUs are just special-purpose CPUs.
MadnessASAP 3 hours ago
Ya know just today I was thinking about a way to compile a neural network down to assembly. Matching and replacing neural network structures with their closest machine code equivalent.
This is way cooler though! Instead of efficiently running a neural network on a CPU, I can inefficiently run my CPU on a neural network! With the work being done to make more powerful GPUs and ASICs I bet in a few years I'll be able to run a 486 at 100MHz(!!) with power consumption just under a megawatt! The mind boggles at the sort of computations this will unlock!
Few more years and I'll even be able to realise the dream of self-hosting ChatGPT on my own neural network simulated CPU!