General Tips
The previous sections of this book have discussed Rust-specific techniques. This section gives a brief overview of some general performance principles.
As long as the obvious pitfalls are avoided (e.g. using non-release builds), Rust code is generally fast and uses little memory. This is especially noticeable if you are used to dynamically-typed languages such as Python and Ruby, or statically-typed languages with a garbage collector such as Java and C#.
Optimized code is often more complex and takes more effort to write than unoptimized code. For this reason, it is only worth optimizing hot code.
The biggest performance improvements often come from changes to algorithms or data structures, rather than low-level optimizations. Example 1, Example 2.
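As a small illustration of an algorithmic change, the sketch below checks a slice for duplicates in two ways. The function names are hypothetical, and which version is faster on a given input is an assumption that should be confirmed by measurement; for very short slices the quadratic version may well win.

```rust
use std::collections::HashSet;

/// Quadratic time: compares each element against all later elements.
pub fn has_duplicate_quadratic(items: &[u32]) -> bool {
    for (i, a) in items.iter().enumerate() {
        // `items[i + 1..]` is an empty slice for the last element.
        if items[i + 1..].contains(a) {
            return true;
        }
    }
    false
}

/// Expected linear time: remembers the elements seen so far in a hash set.
pub fn has_duplicate_hashed(items: &[u32]) -> bool {
    let mut seen = HashSet::with_capacity(items.len());
    // `insert` returns `false` when the value was already present.
    items.iter().any(|x| !seen.insert(*x))
}
```

Both functions return the same answers; only the data structure, and therefore the amount of work done on large inputs, differs.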
Writing code that works well with modern hardware is not always easy, but it is worth striving for. For example, try to minimize cache misses and branch mispredictions where possible.
Most optimizations result in small speedups. Although no single small speedup is noticeable, they really add up if you can do enough of them.
Different profilers have different strengths. It is good to use more than one.
When profiling indicates that a function is hot, there are two common ways to speed things up: (a) make the function faster, and/or (b) avoid calling it as much.
It is often easier to eliminate silly slowdowns than it is to introduce clever speedups.
Avoid computing things unless necessary. Lazy/on-demand computations are often a win. Example 1, Example 2.
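One simple way to make a computation lazy is to store its result in an `Option` and fill it in on first use. The following is a minimal sketch; the `Report` type, its fields, and the `builds` counter are hypothetical and exist only to show that the expensive work runs at most once.

```rust
/// A report whose summary is assumed to be expensive to build.
pub struct Report {
    lines: Vec<String>,
    // `None` means "not computed yet".
    summary: Option<String>,
    // Counts how often the expensive work actually ran (illustration only).
    pub builds: u32,
}

impl Report {
    pub fn new(lines: Vec<String>) -> Self {
        Report { lines, summary: None, builds: 0 }
    }

    /// Returns the summary, computing it only on the first call.
    pub fn summary(&mut self) -> &str {
        if self.summary.is_none() {
            self.builds += 1;
            // Stand-in for an expensive computation.
            let bytes: usize = self.lines.iter().map(|l| l.len()).sum();
            self.summary = Some(format!("{} lines, {} bytes", self.lines.len(), bytes));
        }
        // The branch above guarantees the value is present.
        self.summary.as_deref().unwrap()
    }
}
```

If `summary` is never called, the summary is never built. Repeated calls reuse the stored string.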
Complex general cases can often be avoided by optimistically checking for common special cases that are simpler. Example 1, Example 2, Example 3. In particular, specially handling collections with 0, 1, or 2 elements is often a win when small sizes dominate. Example 1, Example 2, Example 3, Example 4.
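Slice patterns make this kind of special-casing easy to write. The sketch below uses a hypothetical `sorted_copy` function and assumes that profiling has shown most inputs to have two or fewer elements. Without that measurement the extra branches may not pay off.

```rust
/// Returns a sorted copy of `items`, with fast paths for lengths 0, 1, and 2.
pub fn sorted_copy(items: &[u32]) -> Vec<u32> {
    match items {
        // Nothing to sort, and `Vec::new` does not allocate.
        [] => Vec::new(),
        // A single element is already sorted.
        [a] => vec![*a],
        // Two elements need at most one swap.
        [a, b] => {
            if a <= b {
                vec![*a, *b]
            } else {
                vec![*b, *a]
            }
        }
        // General case: fall back to the standard library sort.
        _ => {
            let mut v = items.to_vec();
            v.sort_unstable();
            v
        }
    }
}
```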
Similarly, when dealing with repetitive data, it is often possible to use a simple form of data compression, by using a compact representation for common values and then having a fallback to a secondary table for unusual values. Example 1, Example 2, Example 3.
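A rough sketch of this idea follows. It stores one byte per element and reserves one byte value as an escape marker that redirects to a side table. The `CompactValues` type, the `ESCAPE` constant, and the assumption that most values are below 255 are all hypothetical choices made for illustration.

```rust
use std::collections::HashMap;

/// Marker byte meaning "the real value is in the fallback table".
const ESCAPE: u8 = u8::MAX;

/// Stores `u64` values, most of which are assumed to be smaller than 255.
pub struct CompactValues {
    // One byte per element; `ESCAPE` redirects to `large`.
    small: Vec<u8>,
    // Fallback table for the rare values that do not fit, keyed by index.
    large: HashMap<usize, u64>,
}

impl CompactValues {
    pub fn new() -> Self {
        CompactValues { small: Vec::new(), large: HashMap::new() }
    }

    pub fn push(&mut self, value: u64) {
        let index = self.small.len();
        if value < u64::from(ESCAPE) {
            // Common case: the value fits in the compact representation.
            self.small.push(value as u8);
        } else {
            // Unusual case: store the marker and keep the value on the side.
            self.small.push(ESCAPE);
            self.large.insert(index, value);
        }
    }

    pub fn get(&self, index: usize) -> Option<u64> {
        match *self.small.get(index)? {
            ESCAPE => self.large.get(&index).copied(),
            byte => Some(u64::from(byte)),
        }
    }

    /// Number of values that needed the fallback table.
    pub fn fallback_len(&self) -> usize {
        self.large.len()
    }
}
```

When unusual values are rare, each element costs close to one byte instead of eight. The price is an extra branch on every read and a hash lookup on the uncommon path.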
When code deals with multiple cases, measure case frequencies and handle the most common ones first.
When dealing with lookups that involve high locality, it can be a win to put a small cache in front of a data structure.
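The smallest such cache holds a single entry: the most recent successful lookup. The sketch below wraps a `HashMap` this way. The `CachedMap` type and its `hits` counter are hypothetical, and the map is assumed to be read-only after construction, so the cache never needs invalidating.

```rust
use std::collections::HashMap;

/// A read-only map with a one-entry cache in front of it.
pub struct CachedMap {
    map: HashMap<u32, u64>,
    // The most recent successful lookup, if any.
    last: Option<(u32, u64)>,
    // Counts cache hits (illustration only).
    pub hits: u32,
}

impl CachedMap {
    pub fn new(map: HashMap<u32, u64>) -> Self {
        CachedMap { map, last: None, hits: 0 }
    }

    pub fn get(&mut self, key: u32) -> Option<u64> {
        // Fast path: the same key as last time, so no hashing is needed.
        if let Some((k, v)) = self.last {
            if k == key {
                self.hits += 1;
                return Some(v);
            }
        }
        // Slow path: consult the map and remember the result.
        let value = *self.map.get(&key)?;
        self.last = Some((key, value));
        Some(value)
    }
}
```

Whether this helps depends on how often consecutive lookups repeat a key, so it is worth measuring the hit rate before committing to it.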
Optimized code often has a non-obvious structure, which means that explanatory comments are valuable, particularly those that reference profiling measurements. A comment like “99% of the time this vector has 0 or 1 elements, so handle those cases first” can be illuminating.