Build Configuration

The right build configuration will maximize the performance of your Rust program without any changes to its code. But you should check your program’s performance after applying any of the following changes, because they can sometimes worsen performance.

Release Builds

The single most important Rust performance tip is simple but easy to overlook: make sure you are using a release build rather than a debug build when you want high performance. This is most often done by specifying the --release flag to Cargo.

A release build typically runs much faster than a debug build. 10-100x speedups over debug builds are common!

Debug builds are the default. They are produced if you run cargo build, cargo run, or rustc without any additional options. Debug builds are good for debugging, but are not optimized.

Consider the following final line of output from a cargo build run.

Finished dev [unoptimized + debuginfo] target(s) in 29.80s

The [unoptimized + debuginfo] indicates that a debug build has been produced. The compiled code will be placed in the target/debug/ directory. cargo run will run the debug build.

Release builds are more optimized than debug builds. They also omit some checks, such as debug assertions and integer overflow checks. Produce one with cargo build --release, cargo run --release, or rustc -O. (Alternatively, rustc has multiple other options for optimized builds, such as -C opt-level.) This will typically take longer than a debug build because of the additional optimizations.

Consider the following final line of output from a cargo build --release run.

Finished release [optimized] target(s) in 1m 01s

The [optimized] indicates that a release build has been produced. The compiled code will be placed in the target/release/ directory. cargo run --release will run the release build.

See the Cargo profile documentation for more details about the differences between debug builds (which use the dev profile) and release builds (which use the release profile).

Link-time optimization (LTO) is a whole-program optimization technique that can improve runtime performance by 10-20% or more, at the cost of increased build times. For any individual Rust program it is easy to see if the runtime versus compile-time trade-off is worthwhile.

The simplest way to try LTO is to add the following lines to the Cargo.toml file and do a release build.

[profile.release]
lto = true

This will result in “fat” LTO, which optimizes across all crates in the dependency graph.

Alternatively, use lto = "thin" in Cargo.toml to use “thin” LTO, which is a less aggressive form of LTO that often works as well as “fat” LTO without increasing build times as much.

See the Cargo LTO documentation for more details about the lto setting, and about enabling specific settings for different profiles.

Codegen Units

The Rust compiler splits your crate into multiple codegen units to parallelize (and thus speed up) compilation. However, this might cause it to miss some potential optimizations. If you want to potentially improve runtime performance at the cost of larger compile time, you can set the number of units to one:

[profile.release]
codegen-units = 1

Example.

Be wary that the codegen unit count is a heuristic and thus a smaller count can actually result in a slower program.

Using CPU Specific Instructions

If you do not care that much about the compatibility of your binary on older (or other types of) processors, you can tell the compiler to generate the newest (and potentially fastest) instructions specific to a certain CPU architecture.

For example, if you pass -C target-cpu=native to rustc, it will use the best instructions for your current CPU:

$ RUSTFLAGS="-C target-cpu=native" cargo build --release

This can have a large effect, especially if the compiler finds vectorization opportunities in your code.

As of July 2022, on M1 Macs there is an issue where using -C target-cpu=native doesn’t detect all the CPU features. You need to use -C target-cpu=apple-m1 instead.

If you are unsure whether -C target-cpu=native is working optimally, compare the output of rustc --print cfg and rustc --print cfg -C target-cpu=native to see if the CPU features are being detected correctly in the latter case. If not, you can use -C target-feature to target specific features.

Abort on panic!

If you do not need to catch or unwind panics, you can tell the compiler to simply abort on panics. This might reduce binary size and increase performance slightly:

[profile.release]
panic = "abort"

Profile-guided Optimization

Profile-guided optimization (PGO) is a compilation model where you compile your program, run it on sample data while collecting profiling data, and then use that profiling data to guide a second compilation of the program. Example.

It is an advanced technique that takes some effort to set up, but is worthwhile in some cases. See the rustc PGO documentation for details.