The state of SIMD in Rust in 2025

(shnatsel.medium.com)

78 points | by ashvardanian 2 hours ago

8 comments

josephg 1 hour ago
Why isn’t std::simd in stable yet? Why do so many great features seem stuck in the same nightly-forever limbo land - like generators?
I’m sure more people than ever are working on the compiler. What’s going on?
- ChadNauseam 1 hour ago
  There really aren't that many people working on the compiler. It's mostly volunteers.
  The structure is unlike a traditional company. In a traditional company, the managers decide the priorities and direct the employees on what to work on while facilitating that work. While there are people in more managerial positions working on the Rust compiler, their job is not to tell the volunteers what to work on (they cannot), but instead to help the volunteers accomplish whatever it is they want to do.
  I don't know about std::simd specifically, but for many features, it's simply a case of "none of the very small number of people working on the rust compiler have prioritized it".
  I do wish there was a bounty system, where people could say "I really want std::simd so I'll pay $5,000 to the rust foundation if it gets stabilized". If enough people did that I'm sure they could find a way to make it happen. But I think realistically, very few people would be willing to put up even a cent for the features they want. I hear a lot of people wishing for better const generics, but only 27 people have set up a donation to boxy (lead of the const generics group https://github.com/sponsors/BoxyUwU ).
- JoshTriplett 38 minutes ago
  > Why isn’t std::simd in stable yet?
  Leaving aside any specific blockers:
  - It's a massive hard problem, to build a portable abstraction layer over the SIMD capabilities of various CPUs.
  - It's a massive balance between performance and usability, and people care deeply about both.
  - It's subject to Rust's stability guarantee for the standard library: once we ship it, we can't fix any API issues.
  - There are already portable SIMD libraries in the ecosystem, which aren't subject to that stability guarantee as they can ship new semver-major versions. (One of these days, I hope we have ways to do that for the standard library.)
  - Many people already use non-portable SIMD for the 1-3 targets they care about, instead.
  - vlovich123 25 minutes ago
    > we can't fix any API issues.
    Can’t APIs be fixed between editions?
    - JoshTriplett 23 minutes ago
      Partially (with upcoming support for renaming things across editions), but it's a pain if the types change (because then they're no longer common vocabulary), and all the old APIs still have to exist.
- Avi-D-coder 1 hour ago
  Usually when I go and read the GitHub and Zulip threads, the reason for paused work comes down to the fact that no one has come up with a design that maintains every existing promise the compiler has made. The most common ones I see are that the feature conflicts with safety or semver/encapsulation, interacts weirdly with object safety, causes post-monomorphization errors, or breaks perfect type class coherence (see Haskell's unsound specialization).
  Too many promises have been made.
  Rust needs more unsafe opt outs. Ironically simd has this so it does not bother me.
- singron 1 hour ago
  There is a GitHub issue that details what's blocking stabilization for each feature. I've read a few recently and noticed some patterns:
  1. A high bar for quality in std
  2. Dependencies on other unstable features
  3. Known bugs
  4. Conflicts with other unstable features
  It seems anything that affects trait solving is very complicated and is more likely to have bugs or combine non-trivially with other trait-solving features.
  I think there is also some sampling bias. Tons of features get stabilized, but you are much more likely to notice a nightly feature that is unstable for a long time and complex enough to be excited about.
  - vlovich123 24 minutes ago
    > Dependencies on other unstable features
    AFAIK that’s not a blocker for Rust - the std library is allowed to use unstable at all times.
    - estebank 17 minutes ago
      I think they meant on unstable features which might yet change their semantics. A stable API relying on unstable implementation is common in Rust (? operator, for example), but that is entirely dependent on having a good idea of what the eventual stable version is going to look like, in such a way that the already stable feature won't break in any way.
- the__alchemist 1 hour ago
  Would love this. I've heard it's not planned to be in the near future. Maybe "perfect is the enemy of good enough"?
  - CooCooCaCha 1 hour ago
    Rust doesn’t have a BDFL so there’s nobody with the power to push things through when they’re good enough.
    And since Rust basically sells itself on high standards (zero-cost abstractions, etc.) the devs go back and forth until it feels like the solution is handed down from the heavens.
    - ChadNauseam 1 hour ago
      And somehow it has ended up feeling more pleasant and consistent than most languages with a BDFL, even though it was designed by committee. I don't really understand how that happened, but I appreciate the cautious and conservative approach they've taken
- IshKebab 1 hour ago
  I would love generators too, but I think the more features they add, the more interactions with existing features they have to deal with, so it's not surprising that it's slowing down.
  - estebank 30 minutes ago
    Generators in particular have been blocked on the AsyncIterator trait. There are also open questions around consuming those (`for await i in stream`, or just keep to `while let Some(i) = stream.next().await`? What about parallel iteration? What about pinning obligations? Do that as part of desugaring or make it explicit?). It is a shame because it is almost orthogonal, but any given decision might not be compatible with different approaches for generators. The good news is that some people are working on it again.
bencyoung 1 hour ago
Odd that C# has a better stable SIMD story than Rust! It has both generic vector types across a range of sizes and a good set of intrinsics across most of the common instruction sets.
- kelnos 1 hour ago
  Why would that be odd? C# is an older and mature language backed by a corporation, while Rust is younger and has been run by a small group of volunteers for years now.
  - booi 27 minutes ago
    not just any corporation.. the largest software corporation on the planet
- exyi 54 minutes ago
  C# portable SIMD is very nice indeed, but it's also not usable without unsafety. On the other hand, Rust compiler (LLVM) has a fairly competent autovectorizer, so you may be able to simply write loops the right way instead of the fancy API.
- jiehong 1 hour ago
  C# is blessed on that front. Java’s SIMD state is still sad, and golang is not as great either.
  - ashf023 52 minutes ago
    Yeah, golang is a particular nightmare for SIMD. You have to write plan 9 assembly, look up what they renamed every instruction to, and then sometimes find that the compiler doesn't actually support that instruction, even though it's part of an ISA they broadly support. Go assembly functions are also not allowed to use the register-based calling convention, so all arguments are passed on the stack, and the compiler will never inline it. So without compiler support I don't believe there's any way to do something like intrinsics even. Fortunately compiler support for intrinsics seems to be on its way! https://github.com/golang/go/issues/73787
jtrueb 52 minutes ago
SIMD was one I thought we needed. Then I started benchmarking using iter with chunks and a nested if statement to check the chunk size. If it was necessary to do more, it was typically time to drop down to asm rather than worry about another layer in between the code and the machine.
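A minimal sketch of what the chunk-plus-length-check pattern described above might look like. The function name `chunked_sum` and the chunk width `LANES = 8` are assumptions made for illustration, not taken from the comment. Whether LLVM actually vectorizes the inner loop depends on the target and optimization level, so benchmark before relying on it.

```rust
/// Hypothetical chunk width; 8 f32 values fit one 256-bit register.
const LANES: usize = 8;

/// Sum a slice using `LANES` independent accumulators.
/// Full chunks take the fixed-width path, which the optimizer can
/// turn into vector adds; the short final chunk is summed separately.
pub fn chunked_sum(data: &[f32]) -> f32 {
    let mut acc = [0.0f32; LANES];
    let mut tail = 0.0f32;
    for chunk in data.chunks(LANES) {
        if chunk.len() == LANES {
            // Fixed trip count: each lane accumulates independently.
            for (a, x) in acc.iter_mut().zip(chunk) {
                *a += *x;
            }
        } else {
            // Only the last chunk can be shorter than LANES.
            tail += chunk.iter().sum::<f32>();
        }
    }
    acc.iter().sum::<f32>() + tail
}
```

For a 19-element input this takes the fixed-width path twice and then sums a 3-element tail. `chunks_exact` together with its `remainder()` expresses the same split without the inner `if`.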
taeric 28 minutes ago
I'm curious on the uptake of SIMD and other assembly level usage through high level code? I'd assume most is done either by people writing very low level code that directly manages the data, or by using very high level libraries that are prescriptive on what data they work with?
How many people are writing somewhat bog standard Rust/C and expect optimal assembly to be created?
waffletower 29 minutes ago
I am torn -- while I love the bitter critique of std::simd's nightly builds (why bother with any public release if it is never stable?), I cringed at the critique of "(c)urrently things are well fleshed out for i32, i64, f32, and f64 types". f64 and i64 go a long way for most numerical applications -- the OP seemed snowflaky to me with that entitled concern.
mdriley 2 hours ago
> TL;DR: use std::simd if you don’t mind nightly, wide if you don’t need multiversioning, and otherwise pulp or macerator.
This matches the conclusion we reached for Chromium. We were okay with nightly, so we're using `std::simd` but trying to avoid the least stable APIs. More details: https://docs.google.com/document/d/1lh9x43gtqXFh5bP1LeYevWj0...
- vlovich123 21 minutes ago
  Do you compile the whole project with nightly or just specific components?
the__alchemist 1 hour ago
Of interest, I've written my own core::simd mimic so I don't have to make all my libs and programs use nightly. It started as me just making my Quaternion and Vec lib (lin-alg) have their own SoA SIMD variants (Vec3x16 etc), but I ended up implementing and publicly exposing f32x16 etc. Will remove those once core::simd is stable. Downside: These are x86 only; no ARM support.
I also added packing and unpacking helpers that assist with handling final lane 0 values etc. But there is still some subtlety, as the article pointed out, compared to using Rayon or non-SIMD CPU code, related to packing and unpacking. E.g. you should try to keep things in their SIMD form throughout the whole pipeline, and think about how you pair them with non-SIMD values (like you might pair [T; 8] with f32x8), etc.
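A rough sketch of what packing and unpacking helpers of this kind could look like, assuming "final lane 0 values" means zero-padding the last partial vector. The type `F32x8` here is a hypothetical array-backed stand-in, and `pack` and `unpack` are illustrative names. None of this is the lin-alg crate's actual API, and it uses no real SIMD instructions.

```rust
use std::ops::Mul;

/// Hypothetical stand-in for an 8-lane f32 vector (plain array, no intrinsics).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F32x8(pub [f32; 8]);

impl F32x8 {
    /// Broadcast one value to all lanes.
    pub fn splat(v: f32) -> Self {
        F32x8([v; 8])
    }
}

impl Mul for F32x8 {
    type Output = F32x8;
    /// Lane-wise multiplication.
    fn mul(self, rhs: F32x8) -> F32x8 {
        let mut out = [0.0f32; 8];
        for i in 0..8 {
            out[i] = self.0[i] * rhs.0[i];
        }
        F32x8(out)
    }
}

/// Pack scalars into 8-lane vectors, zero-padding the final partial vector.
pub fn pack(data: &[f32]) -> Vec<F32x8> {
    data.chunks(8)
        .map(|chunk| {
            let mut lanes = [0.0f32; 8];
            lanes[..chunk.len()].copy_from_slice(chunk);
            F32x8(lanes)
        })
        .collect()
}

/// Unpack back to scalars, keeping only the first `len` values
/// so the padding lanes are dropped.
pub fn unpack(packed: &[F32x8], len: usize) -> Vec<f32> {
    packed.iter().flat_map(|v| v.0).take(len).collect()
}
```

The idea is to call `pack` once at the start of a pipeline, do all the lane-wise work on the packed form, and call `unpack` with the original length only at the end, which is one way to keep values in their SIMD form throughout.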
- ____tom____ 1 hour ago
  I'm not a rust programmer.
  Can't you just make a local copy of the existing package and use that? Did you need to re-implement?
  - dzaima 40 minutes ago
    The nightly built-in core::simd makes use of a bunch of intrinsics to "implement" the SIMD ops (or, rather, directly delegate the implementation to LLVM which you otherwise cannot do from plain Rust), which are as much if not more volatile than core::simd itself (and also nightly-only).
    - vlovich123 22 minutes ago
      > or, rather, directly delegate the implementation to LLVM which you otherwise cannot do from plain Rust
      I thought the intrinsics specifically were available in plain safe Rust and the alignment-requiring intrinsics were allowed in unsafe Rust. I’m not sure I understand this “direct to LLVM dispatch” argument or how that isn’t accessible to stable Rust today.
      - dzaima 5 minutes ago
        You can indeed use intrinsics to make a SIMD library in plain safe stable rust today to some extent; that just isn't what core::simd does; rather, on the Rust-side it's all target-agnostic and LLVM (or whatever other backend) handles deciding how to lower any given op to the target architecture.
        e.g. all core::simd addition ends up invoking the single function [1] which is then directly handled by rustc. But these architecture-agnostic intrinsics are unstable[2] (as they're only there as a building block for core::simd), and you can't manually use "#[rustc_intrinsic]" & co in stable rust either.
        [1]: https://github.com/rust-lang/rust/blob/b01cc1cf01ed12adb2595...
        [2]: https://github.com/rust-lang/rust/blob/b01cc1cf01ed12adb2595...
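For contrast with the target-agnostic approach described above, here is a sketch of the target-specific route that stable Rust does allow: `std::arch` intrinsics behind runtime feature detection, with a scalar fallback. The function names are hypothetical. This version uses `unsafe` for the raw-pointer loads and stores, and it only covers x86_64 with AVX; other targets fall through to the scalar loop.

```rust
/// Element-wise `out[i] = a[i] + b[i]`, using AVX when the CPU reports it.
pub fn add_slices(a: &[f32], b: &[f32], out: &mut [f32]) {
    assert!(a.len() == b.len() && a.len() == out.len());
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was just confirmed at runtime.
            unsafe { add_slices_avx(a, b, out) };
            return;
        }
    }
    add_slices_scalar(a, b, out);
}

/// Portable fallback, also used for the tail of the AVX path.
fn add_slices_scalar(a: &[f32], b: &[f32], out: &mut [f32]) {
    for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
        *o = x + y;
    }
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn add_slices_avx(a: &[f32], b: &[f32], out: &mut [f32]) {
    use std::arch::x86_64::{_mm256_add_ps, _mm256_loadu_ps, _mm256_storeu_ps};
    // Largest multiple of 8 that fits in the slices.
    let n = a.len() / 8 * 8;
    let mut i = 0;
    while i < n {
        // SAFETY: i + 8 <= n <= len for all three slices, and the
        // unaligned load/store variants have no alignment requirement.
        unsafe {
            let va = _mm256_loadu_ps(a.as_ptr().add(i));
            let vb = _mm256_loadu_ps(b.as_ptr().add(i));
            _mm256_storeu_ps(out.as_mut_ptr().add(i), _mm256_add_ps(va, vb));
        }
        i += 8;
    }
    // Handle the remaining 0..=7 elements with the scalar loop.
    add_slices_scalar(&a[n..], &b[n..], &mut out[n..]);
}
```

The per-architecture function and the dispatch are exactly the parts a portable layer like core::simd avoids writing by hand, since there the backend decides how to lower each operation.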
  - the__alchemist 1 hour ago
    Good question. Probably, but I don't know how and haven't tried.
IshKebab 1 hour ago
> Fortunately, this problem only exists on x86.
Also RISC-V, where you can't even probe for extension support in user space unfortunately.
- dzaima 52 minutes ago
  Linux of course does have an interface for RISC-V extension probing via hwprobe. And there's a C interface[1] for probing that's OS-agnostic (though it's rather new).
  [1]: https://github.com/riscv-non-isa/riscv-c-api-doc/blob/main/s...
- raphlinus 1 hour ago
  It's not strictly x86 either, the other case you care about is fp16 support on ARM. But it is included in the M1 target, so really only on other ARM.