Heap Allocations

Heap allocations are moderately expensive. The exact details depend on which allocator is in use, but each allocation (and deallocation) typically involves acquiring a global lock, doing some non-trivial data structure manipulation, and possibly executing a system call. Small allocations are not necessarily cheaper than large allocations. It is worth understanding which Rust data structures and operations cause allocations, because avoiding them can greatly improve performance.

The Rust Container Cheat Sheet has visualizations of common Rust types, and is an excellent companion to the following sections.

Profiling

If a general-purpose profiler shows malloc, free, and related functions as hot, then it is likely worth trying to reduce the allocation rate and/or using an alternative allocator.

DHAT is an excellent profiler to use when reducing allocation rates. It works on Linux and some other Unixes. It precisely identifies hot allocation sites and their allocation rates. Exact results will vary, but experience with rustc has shown that reducing allocation rates by 10 allocations per million instructions executed can have measurable performance improvements (e.g. ~1%).

Here is some example output from DHAT.

AP 1.1/25 (2 children) {
  Total:     54,533,440 bytes (4.02%, 2,714.28/Minstr) in 458,839 blocks (7.72%, 22.84/Minstr), avg size 118.85 bytes, avg lifetime 1,127,259,403.64 instrs (5.61% of program duration)
  At t-gmax: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
  At t-end:  0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
  Reads:     15,993,012 bytes (0.29%, 796.02/Minstr), 0.29/byte
  Writes:    20,974,752 bytes (1.03%, 1,043.97/Minstr), 0.38/byte
  Allocated at {
    #1: 0x95CACC9: alloc (alloc.rs:72)
    #2: 0x95CACC9: alloc (alloc.rs:148)
    #3: 0x95CACC9: reserve_internal<syntax::tokenstream::TokenStream,alloc::alloc::Global> (raw_vec.rs:669)
    #4: 0x95CACC9: reserve<syntax::tokenstream::TokenStream,alloc::alloc::Global> (raw_vec.rs:492)
    #5: 0x95CACC9: reserve<syntax::tokenstream::TokenStream> (vec.rs:460)
    #6: 0x95CACC9: push<syntax::tokenstream::TokenStream> (vec.rs:989)
    #7: 0x95CACC9: parse_token_trees_until_close_delim (tokentrees.rs:27)
    #8: 0x95CACC9: syntax::parse::lexer::tokentrees::<impl syntax::parse::lexer::StringReader<'a>>::parse_token_tree (tokentrees.rs:81)
  }
}

It is beyond the scope of this book to describe everything in this example, but it should be clear that DHAT gives a wealth of information about allocations, such as where and how often they happen, how big they are, how long they live for, and how often they are accessed.

Box

Box is the simplest heap-allocated type. A Box<T> value is a T value that is allocated on the heap.

It is sometimes worth boxing one or more fields in a struct or enum fields to make a type smaller. (See the Type Sizes chapter for more about this.)

Other than that, Box is straightforward and does not offer much scope for optimizations.

Rc/Arc

Rc/Arc are similar to Box, but the value on the heap is accompanied by two reference counts. They allow value sharing, which can be an effective way to reduce memory usage.

However, if used for values that are rarely shared, they can increase allocation rates by heap allocating values that might otherwise not be heap-allocated. Example.

Unlike Box, calling clone on an Rc/Arc value does not involve an allocation. Instead, it merely increments a reference count.

Vec

Vec is a heap-allocated type with a great deal of scope for optimizing the number of allocations, and/or minimizing the amount of wasted space. To do this requires understanding how its elements are stored.

A Vec contains three words: a length, a capacity, and a pointer. The pointer will point to heap-allocated memory if the capacity is nonzero and the element size is nonzero; otherwise, it will not point to allocated memory.

Even if the Vec itself is not heap-allocated, the elements (if present and nonzero-sized) always will be. If nonzero-sized elements are present, the memory holding those elements may be larger than necessary, providing space for additional future elements. The number of elements present is the length, and the number of elements that could be held without reallocating is the capacity.

When the vector needs to grow beyond its current capacity, the elements will be copied into a larger heap allocation, and the old heap allocation will be freed.

Vec Growth

A new, empty Vec created by the common means (vec![] or Vec::new or Vec::default) has a length and capacity of zero, and no heap allocation is required. If you repeatedly push individual elements onto the end of the Vec, it will periodically reallocate. The growth strategy is not specified, but at the time of writing it uses a quasi-doubling strategy resulting in the following capacities: 0, 4, 8, 16, 32, 64, and so on. (It skips directly from 0 to 4, instead of going via 1 and 2, because this avoids many allocations in practice.) As a vector grows, the frequency of reallocations will decrease exponentially, but the amount of possibly-wasted excess capacity will increase exponentially.

This growth strategy is typical for growable data structures and reasonable in the general case, but if you know in advance the likely length of a vector you can often do better. If you have a hot vector allocation site (e.g. a hot Vec::push call), it is worth using eprintln! to print the vector length at that site and then doing some post-processing (e.g. with counts) to determine the length distribution. For example, you might have many short vectors, or you might have a smaller number of very long vectors, and the best way to optimize the allocation site will vary accordingly.

Short Vecs

If you have many short vectors, you can use the SmallVec type from the smallvec crate. SmallVec<[T; N]> is a drop-in replacement for Vec that can store N elements within the SmallVec itself, and then switches to a heap allocation if the number of elements exceeds that. (Note also that vec![] literals must be replaced with smallvec![] literals.) Example 1, Example 2.

SmallVec reliably reduces the allocation rate when used appropriately, but its use does not guarantee improved performance. It is slightly slower than Vec for normal operations because it must always check if the elements are heap-allocated or not. Also, If N is high or T is large, then the SmallVec<[T; N]> itself can be larger than Vec<T>, and copying of SmallVec values will be slower. As always, benchmarking is required to confirm that an optimization is effective.

If you have many short vectors and you precisely know their maximum length, ArrayVec from the arrayvec crate is a better choice than SmallVec. It does not require the fallback to heap allocation, which makes it a little faster. Example.

Longer Vecs

If you know the minimum or exact size of a vector, you can reserve a specific capacity with Vec::with_capacity, Vec::reserve, or Vec::reserve_exact. For example, if you know a vector will grow to have at least 20 elements, these functions can immediately provide a vector with a capacity of at least 20 using a single allocation, whereas pushing the items one at a time would result in four allocations (for capacities of 4, 8, 16, and 32). Example.

If you know the maximum length of a vector, the above functions also let you not allocate excess space unnecessarily. Similarly, Vec::shrink_to_fit can be used to minimize wasted space, but note that it may cause a reallocation.

String

A String contains heap-allocated bytes. The representation and operation of String are very similar to that of Vec<u8>. Many Vec methods relating to growth and capacity have equivalents for String, such as String::with_capacity.

The SmallString type from the smallstr crate is similar to the SmallVec type.

The String type from the smartstring crate is a drop-in replacement for String that avoids heap allocations for strings with less than three words’ worth of characters. On 64-bit platforms, this is any string that is less than 24 bytes, which includes all strings containing 23 or fewer ASCII characters. Example.

Note that the format! macro produces a String, which means it performs an allocation. If you can avoid a format! call by using a string literal, that will avoid this allocation. Example. std::format_args and/or the lazy_format crate may help with this.

Hash Tables

HashSet and HashMap are hash tables. Their representation and operations are similar to those of Vec, in terms of allocations: they have a single contiguous heap allocation, holding keys and values, which is reallocated as necessary as the table grows. Many Vec methods relating to growth and capacity have equivalents for HashSet/HashMap, such as HashSet::with_capacity.

clone

Calling clone on a value that contains heap-allocated memory typically involves additional allocations. For example, calling clone on a non-empty Vec requires a new allocation for the elements (but note that the capacity of the new Vec might not be the same as the capacity of the original Vec). The exception is Rc/Arc, where a clone call just increments the reference count.

clone_from is an alternative to clone. a.clone_from(&b) is equivalent to a = b.clone() but may avoid unnecessary allocations. For example, if you want to clone one Vec over the top of an existing Vec, the existing Vec’s heap allocation will be reused if possible, as the following example shows.

#![allow(unused)]
fn main() {
let mut v1: Vec<u32> = Vec::with_capacity(99);
let v2: Vec<u32> = vec![1, 2, 3];
v1.clone_from(&v2); // v1's allocation is reused
assert_eq!(v1.capacity(), 99);
}

Although clone usually causes allocations, it is a reasonable thing to use in many circumstances and can often make code simpler. Use profiling data to see which clone calls are hot and worth taking the effort to avoid.

Sometimes Rust code ends up containing unnecessary clone calls, due to (a) programmer error, or (b) changes in the code that render previously-necessary clone calls unnecessary. If you see a hot clone call that does not seem necessary, sometimes it can simply be removed. Example 1, Example 2, Example 3.

to_owned

ToOwned::to_owned is implemented for many common types. It creates owned data from borrowed data, usually by cloning, and therefore often causes heap allocations. For example, it can be used to create a String from a &str.

Sometimes to_owned calls (and related calls such as clone and to_string) can be avoided by storing a reference to borrowed data in a struct rather than an owned copy. This requires lifetime annotations on the struct, complicating the code, and should only be done when profiling and benchmarking shows that it is worthwhile. Example.

Cow

Sometimes code deals with a mixture of borrowed and owned data. Imagine a vector of error messages, some of which are static string literals and some of which are constructed with format!. The obvious representation is Vec<String>, as the following example shows.

#![allow(unused)]
fn main() {
let mut errors: Vec<String> = vec![];
errors.push("something went wrong".to_string());
errors.push(format!("something went wrong on line {}", 100));
}

That requires a to_string call to promote the static string literal to a String, which incurs an allocation.

Instead you can use the Cow type, which can hold either borrowed or owned data. A borrowed value x is wrapped with Cow::Borrowed(x), and an owned value y is wrapped with Cow::Owned(y). Cow also implements the From<T> trait for various string, slice, and path types, so you can usually use into as well. (Or Cow::from, which is longer but results in more readable code, because it makes the type clearer.) The following example puts all this together.

#![allow(unused)]
fn main() {
use std::borrow::Cow;
let mut errors: Vec<Cow<'static, str>> = vec![];
errors.push(Cow::Borrowed("something went wrong"));
errors.push(Cow::Owned(format!("something went wrong on line {}", 100)));
errors.push(Cow::from("something else went wrong"));
errors.push(format!("something else went wrong on line {}", 101).into());
}

errors now holds a mixture of borrowed and owned data without requiring any extra allocations. This example involves &str/String, but other pairings such as &[T]/Vec<T> and &Path/PathBuf are also possible.

Example 1, Example 2.

All of the above applies if the data is immutable. But Cow also allows borrowed data to be promoted to owned data if it needs to be mutated. Cow::to_mut will obtain a mutable reference to an owned value, cloning if necessary. This is called “clone-on-write”, which is where the name Cow comes from.

This clone-on-write behaviour is useful when you have some borrowed data, such as a &str, that is mostly read-only but occasionally needs to be modified.

Example 1, Example 2.

Finally, because Cow implements Deref, you can call methods directly on the data it encloses.

Cow can be fiddly to get working, but it is often worth the effort.

Reusing Collections

Sometimes you need to build up a collection such as a Vec in stages. It is usually better to do this by modifying a single Vec than by building multiple Vecs and then combining them.

For example, if you have a function do_stuff that produces a Vec that might be called multiple times:

#![allow(unused)]
fn main() {
fn do_stuff(x: u32, y: u32) -> Vec<u32> {
    vec![x, y]
}
}

It might be better to instead modify a passed-in Vec:

#![allow(unused)]
fn main() {
fn do_stuff(x: u32, y: u32, vec: &mut Vec<u32>) {
    vec.push(x);
    vec.push(y);
}
}

Sometimes it is worth keeping around a “workhorse” collection that can be reused. For example, if a Vec is needed for each iteration of a loop, you could declare the Vec outside the loop, use it within the loop body, and then call clear at the end of the loop body (to empty the Vec without affecting its capacity). This avoids allocations at the cost of obscuring the fact that each iteration’s usage of the Vec is unrelated to the others. Example 1, Example 2.

Similarly, it is sometimes worth keeping a workhorse collection within a struct, to be reused in one or more methods that are called repeatedly.

Reading Lines from a File

BufRead::lines makes it easy to read a file one line at a time:

#![allow(unused)]
fn main() {
fn blah() -> Result<(), std::io::Error> {
fn process(_: &str) {}
use std::io::{self, BufRead};
let mut lock = io::stdin().lock();
for line in lock.lines() {
    process(&line?);
}
Ok(())
}
}

But the iterator it produces returns io::Result<String>, which means it allocates for every line in the file.

An alternative is to use a workhorse String in a loop over BufRead::read_line:

#![allow(unused)]
fn main() {
fn blah() -> Result<(), std::io::Error> {
fn process(_: &str) {}
use std::io::{self, BufRead};
let mut lock = io::stdin().lock();
let mut line = String::new();
while lock.read_line(&mut line)? != 0 {
    process(&line);
    line.clear();
}
Ok(())
}
}

This reduces the number of allocations to at most a handful, and possibly just one. (The exact number depends on how many times line needs to be reallocated, which depends on the distribution of line lengths in the file.)

This will only work if the loop body can operate on a &str, rather than a String.

Example.

Using an Alternative Allocator

It is also possible to improve heap allocation performance without changing your code, simply by using a different allocator. See the Alternative Allocators section for details.

Avoiding Regressions

To ensure the number and/or size of allocations done by your code doesn’t increase unintentionally, you can use the heap usage testing feature of dhat-rs to write tests that check particular code snippets allocate the expected amount of heap memory.