How to speed up the Rust compiler in March 2024
It has been over six months since my last update on the Rust compiler’s performance. Time for another one.
Big wins
Let’s start with some big improvements. This list isn’t comprehensive; it’s just some things I noticed over this time period. The information about metrics at the top of this post still applies.
#115554: There are many build configuration choices that can affect the performance of built Rust binaries. One choice is to build with a single codegen unit, which increases build times but can improve runtime speed and binary size. In this PR Jakub Beránek made the Rust compiler itself be built with a single codegen unit on Linux. This gave a mean wall-time reduction of 1.57% across all benchmark results, a mean max-rss reduction of 1.96% across all results, and also reduced the size of the rustc binary. This change has not yet been made for Windows or Mac builds because the improvements were smaller, but it may happen soon.
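For comparison, an ordinary crate can opt into the same trade-off with a standard Cargo profile setting (this is regular Cargo configuration, not something specific to this PR):

```toml
# Cargo.toml: build release binaries as a single codegen unit.
# Slower to compile, but it gives LLVM a whole-program view,
# which can improve runtime speed and shrink the binary.
[profile.release]
codegen-units = 1
```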
#117727: In this PR, Ben Kimock made all Debug::fmt methods generated via #[derive(Debug)] be marked with #[inline]. This was a small, innocuous-sounding change that gave amazing results: a mean wall-time reduction of 1.33% across all benchmark results, and a mean binary size reduction of 1.32% across all release build results.
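To illustrate, here is a rough sketch of what the derive generates for a simple struct. The exact expansion differs in detail, but the point is that these tiny, mechanical methods are now marked `#[inline]`, so they can be inlined across codegen units:

```rust
use std::fmt;

struct Point {
    x: i32,
    y: i32,
}

// Hand-written approximation of what `#[derive(Debug)]` produces.
impl fmt::Debug for Point {
    #[inline] // added by the derive since #117727
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("Point")
            .field("x", &self.x)
            .field("y", &self.y)
            .finish()
    }
}

fn main() {
    let p = Point { x: 1, y: 2 };
    assert_eq!(format!("{:?}", p), "Point { x: 1, y: 2 }");
}
```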
#119977: In this PR, Mark Rousskov introduced a cache that helped avoid many hash table lookups within the compiler. This gave a mean wall-time reduction of 1.20% across all benchmark results. The idea for this first arose 6.5 years ago!
#120055: In this PR, Nikita Popov upgraded the LLVM version used by the compiler to LLVM 18. This gave a mean wall-time reduction of 0.87% across all benchmark results. This is the latest in a long run of LLVM updates that have made rustc faster. Fantastic work from the LLVM folks!
In other big news, the Cranelift codegen backend is now available for general use on x86-64/Linux and ARM/Linux. It is an alternative to the standard LLVM codegen backend used by rustc, and is designed to reduce compile times at the cost of lower generated code quality. Give it a try for your debug builds! This is the culmination of a lot of work by bjorn3.
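One way to try it, sketched under the assumption of a recent nightly toolchain with the rustup-distributed component installed (check the Cranelift backend’s own documentation for the current setup):

```toml
# Cargo.toml — opt debug builds into the Cranelift backend.
# First: rustup component add rustc-codegen-cranelift-preview
# (`codegen-backend` is a nightly-only cargo feature at the time of writing)
cargo-features = ["codegen-backend"]

[profile.dev]
codegen-backend = "cranelift"
```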
Finally, Jakub greatly reduced the size of compiled binaries by excluding debug info by default. For small programs this can reduce their size on disk by up to 10x!
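If you do want debug info in your own release binaries, the standard Cargo knob still applies (again, ordinary Cargo configuration rather than anything specific to this change):

```toml
# Cargo.toml: opt back into full debug info for release builds,
# at the cost of much larger binaries on disk.
[profile.release]
debug = true
```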
My (lack of) improvements
For the first time ever, I’m writing one of these posts without having made any improvements to compile speed myself. I have always used a profile-driven optimization strategy, and the profiles you get when you measure rustc these days are incredibly flat. It’s hard to find improvements when the hottest functions only account for 1% or 2% of execution time. Because of this I have been working on things unrelated to compile speed.
That doesn’t mean there are no speed improvements left to be made, as the previous section shows. But they are much harder to find, and often require domain-specific insights that are hard to get when fishing around with a general-purpose profiler. And there is always other useful work to be done.
General Progress
For the period 2023-08-23 to 2024-03-04 we had some excellent overall performance results.
First, wall-time:
- There were 526 results measured across 43 benchmarks.
- 437 of these were improvements, and 89 were regressions. The mean change was a reduction of 7.13%, and plenty of the reductions were in the double digits. (In my last post the equivalent reduction was also 7.13%. Quite the coincidence!)
Next, peak memory usage:
- Again, there were 526 results measured across 43 benchmarks.
- 367 of these were improvements, and 159 were regressions. The mean change was a 2.05% reduction, and most of the changes were in the single digits.
Finally, binary size:
- There were 324 results measured across 43 benchmarks.
- 318 of these were improvements, and 6 were regressions. The mean change was a 28.03% reduction, and almost every result was a double-digit reduction.
- If we restrict things to non-incremental release builds, which is the most interesting case for binary size, there were 42 improvements, 1 regression, and the mean change was a reduction of 37.08%. The helloworld benchmark saw a whopping 91.05% reduction.
- These improvements are mostly due to the omission of debug info mentioned above, plus some metadata improvements made by Mark.
For all three metrics, all but a handful of results met the significance threshold. I haven’t bothered separating those results because they made little difference to the headline numbers. As always, these measurements are done on Linux.
Finally, Jakub recently observed that compile times (as measured on Linux by the benchmark suite) dropped by 15% between February 2023 and February 2024. The corresponding reductions over each of the preceding three years were 7%, 17%, and 13%, and the reduction over the whole four year period was 37%. There is something to be said for steady, continuous improvements over long periods of time.