# Heap Allocations
Heap allocations are moderately expensive. The exact details depend on which allocator is in use, but each allocation (and deallocation) typically involves acquiring a global lock, doing some non-trivial data structure manipulation, and possibly executing a system call. Small allocations are not necessarily cheaper than large allocations. It is worth understanding which Rust data structures and operations cause allocations, because avoiding them can greatly improve performance.
The Rust Container Cheat Sheet has visualizations of common Rust types, and is an excellent companion to the following sections.
## Profiling
If a general-purpose profiler shows `malloc`, `free`, and related functions as hot, then it is likely worth trying to reduce the allocation rate and/or using an alternative allocator.
DHAT is an excellent profiler to use when reducing allocation rates. It works on Linux and some other Unixes. It precisely identifies hot allocation sites and their allocation rates. Exact results will vary, but experience with rustc has shown that reducing allocation rates by 10 allocations per million instructions executed can have measurable performance improvements (e.g. ~1%).
Here is some example output from DHAT.
```text
AP 1.1/25 (2 children) {
  Total:     54,533,440 bytes (4.02%, 2,714.28/Minstr) in 458,839 blocks (7.72%, 22.84/Minstr), avg size 118.85 bytes, avg lifetime 1,127,259,403.64 instrs (5.61% of program duration)
  At t-gmax: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
  At t-end:  0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
  Reads:     15,993,012 bytes (0.29%, 796.02/Minstr), 0.29/byte
  Writes:    20,974,752 bytes (1.03%, 1,043.97/Minstr), 0.38/byte
  Allocated at {
    #1: 0x95CACC9: alloc (alloc.rs:72)
    #2: 0x95CACC9: alloc (alloc.rs:148)
    #3: 0x95CACC9: reserve_internal<syntax::tokenstream::TokenStream,alloc::alloc::Global> (raw_vec.rs:669)
    #4: 0x95CACC9: reserve<syntax::tokenstream::TokenStream,alloc::alloc::Global> (raw_vec.rs:492)
    #5: 0x95CACC9: reserve<syntax::tokenstream::TokenStream> (vec.rs:460)
    #6: 0x95CACC9: push<syntax::tokenstream::TokenStream> (vec.rs:989)
    #7: 0x95CACC9: parse_token_trees_until_close_delim (tokentrees.rs:27)
    #8: 0x95CACC9: syntax::parse::lexer::tokentrees::<impl syntax::parse::lexer::StringReader<'a>>::parse_token_tree (tokentrees.rs:81)
  }
}
```
It is beyond the scope of this book to describe everything in this example, but it should be clear that DHAT gives a wealth of information about allocations, such as where and how often they happen, how big they are, how long they live for, and how often they are accessed.
## `Box`

`Box` is the simplest heap-allocated type. A `Box<T>` value is a `T` value that is allocated on the heap.
It is sometimes worth boxing one or more struct or enum fields to make a type smaller. (See the Type Sizes chapter for more about this.)
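The size effect can be checked with `std::mem::size_of`. The following sketch uses two hypothetical enums (illustrative, not from any real codebase); boxing the large variant shrinks the enum to around pointer size, because the payload moves behind an allocation.

```rust
use std::mem::size_of;

// A large payload stored inline forces the whole enum to be large...
#[allow(dead_code)]
enum Inline {
    Big([u8; 100]),
    Small,
}

// ...whereas boxing the payload keeps the enum small, at the cost of a
// heap allocation whenever the `Big` variant is constructed.
#[allow(dead_code)]
enum Boxed {
    Big(Box<[u8; 100]>),
    Small,
}

fn main() {
    // The inline version must hold the 100-byte array plus a tag.
    assert!(size_of::<Inline>() >= 100);
    // The boxed version is much smaller than the inline one.
    assert!(size_of::<Boxed>() < size_of::<Inline>());
}
```
This is a trade-off: the smaller type copies more cheaply, but accessing the boxed field costs an indirection, so it is worth measuring both ways.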
Other than that, `Box` is straightforward and does not offer much scope for optimizations.
## `Rc`/`Arc`

`Rc`/`Arc` are similar to `Box`, but the value on the heap is accompanied by two reference counts. They allow value sharing, which can be an effective way to reduce memory usage.
However, if used for values that are rarely shared, they can increase allocation rates by heap allocating values that might otherwise not be heap-allocated. Example.
Unlike `Box`, calling `clone` on an `Rc`/`Arc` value does not involve an allocation. Instead, it merely increments a reference count.
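A minimal sketch of this behaviour, using `Rc::strong_count` to observe the reference count directly:

```rust
use std::rc::Rc;

fn main() {
    // One allocation here: the Rc's heap box holding the String
    // alongside the two reference counts.
    let a: Rc<String> = Rc::new(String::from("shared"));

    // Cloning does not allocate; it only bumps the strong count.
    let b = Rc::clone(&a);
    assert_eq!(Rc::strong_count(&a), 2);
    assert!(Rc::ptr_eq(&a, &b)); // both handles refer to the same heap value

    drop(b);
    assert_eq!(Rc::strong_count(&a), 1);
}
```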
## `Vec`

`Vec` is a heap-allocated type with a great deal of scope for optimizing the number of allocations, and/or minimizing the amount of wasted space. To do this requires understanding how its elements are stored.
A `Vec` contains three words: a length, a capacity, and a pointer. The pointer will point to heap-allocated memory if the capacity is nonzero and the element size is nonzero; otherwise, it will not point to allocated memory.
Even if the `Vec` itself is not heap-allocated, the elements (if present and nonzero-sized) always will be. If nonzero-sized elements are present, the memory holding those elements may be larger than necessary, providing space for additional future elements. The number of elements present is the length, and the number of elements that could be held without reallocating is the capacity. When the vector needs to grow beyond its current capacity, the elements will be copied into a larger heap allocation, and the old heap allocation will be freed.
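These properties can be verified directly. The sketch below checks the three-word size, the lack of allocation in an empty `Vec`, and the special case of zero-sized elements:

```rust
use std::mem::size_of;

fn main() {
    // A Vec is always three words, regardless of its contents.
    assert_eq!(size_of::<Vec<u32>>(), 3 * size_of::<usize>());

    // A new, empty Vec has no heap allocation...
    let v: Vec<u32> = Vec::new();
    assert_eq!(v.capacity(), 0);

    // ...and a Vec of zero-sized elements never allocates at all:
    // its capacity is effectively unbounded.
    let mut zst: Vec<()> = Vec::new();
    zst.push(());
    assert_eq!(zst.capacity(), usize::MAX);
}
```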
### `Vec` Growth
A new, empty `Vec` created by the common means (`vec![]` or `Vec::new` or `Vec::default`) has a length and capacity of zero, and no heap allocation is required. If you repeatedly push individual elements onto the end of the `Vec`, it will periodically reallocate. The growth strategy is not specified, but at the time of writing it uses a quasi-doubling strategy resulting in the following capacities: 0, 4, 8, 16, 32, 64, and so on. (It skips directly from 0 to 4, instead of going via 1 and 2, because this avoids many allocations in practice.) As a vector grows, the frequency of reallocations will decrease exponentially, but the amount of possibly-wasted excess capacity will increase exponentially.
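The growth behaviour can be observed empirically. Since the exact capacity sequence is unspecified and may change between Rust versions, this sketch only asserts what the growth strategy guarantees in spirit: capacity never shrinks, and a thousand pushes trigger only a handful of reallocations.

```rust
fn main() {
    let mut v: Vec<u32> = Vec::new();
    let mut caps = vec![v.capacity()];
    for i in 0..1000 {
        v.push(i);
        // Record each distinct capacity, i.e. each reallocation.
        if v.capacity() != *caps.last().unwrap() {
            caps.push(v.capacity());
        }
    }
    // Capacity only ever grows...
    assert!(caps.windows(2).all(|w| w[0] < w[1]));
    // ...and it grows geometrically: far fewer reallocations than pushes.
    assert!(caps.len() < 20);
    println!("capacity steps: {:?}", caps);
}
```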
This growth strategy is typical for growable data structures and reasonable in the general case, but if you know in advance the likely length of a vector you can often do better. If you have a hot vector allocation site (e.g. a hot `Vec::push` call), it is worth using `eprintln!` to print the vector length at that site and then doing some post-processing (e.g. with `counts`) to determine the length distribution. For example, you might have many short vectors, or you might have a smaller number of very long vectors, and the best way to optimize the allocation site will vary accordingly.
### Short `Vec`s
If you have many short vectors, you can use the `SmallVec` type from the `smallvec` crate. `SmallVec<[T; N]>` is a drop-in replacement for `Vec` that can store `N` elements within the `SmallVec` itself, and then switches to a heap allocation if the number of elements exceeds that. (Note also that `vec![]` literals must be replaced with `smallvec![]` literals.)
Example 1,
Example 2.
`SmallVec` reliably reduces the allocation rate when used appropriately, but its use does not guarantee improved performance. It is slightly slower than `Vec` for normal operations because it must always check if the elements are heap-allocated or not. Also, if `N` is high or `T` is large, then the `SmallVec<[T; N]>` itself can be larger than `Vec<T>`, and copying of `SmallVec` values will be slower. As always, benchmarking is required to confirm that an optimization is effective.
If you have many short vectors and you precisely know their maximum length, `ArrayVec` from the `arrayvec` crate is a better choice than `SmallVec`. It does not require the fallback to heap allocation, which makes it a little faster.
Example.
### Longer `Vec`s
If you know the minimum or exact size of a vector, you can reserve a specific capacity with `Vec::with_capacity`, `Vec::reserve`, or `Vec::reserve_exact`. For example, if you know a vector will grow to have at least 20 elements, these functions can immediately provide a vector with a capacity of at least 20 using a single allocation, whereas pushing the items one at a time would result in four allocations (for capacities of 4, 8, 16, and 32).
Example.
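The effect can be observed directly: with the capacity reserved up front, the buffer pointer never changes while pushing, proving no reallocation happened. A small sketch:

```rust
fn main() {
    let mut v: Vec<u32> = Vec::with_capacity(20);
    let ptr = v.as_ptr(); // remember where the buffer starts

    for i in 0..20 {
        v.push(i);
    }

    // All 20 pushes fit in the initial allocation: the buffer never moved,
    // and the capacity is still at least the reserved 20.
    assert_eq!(v.as_ptr(), ptr);
    assert!(v.capacity() >= 20);
}
```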
If you know the maximum length of a vector, the above functions also let you not allocate excess space unnecessarily. Similarly, `Vec::shrink_to_fit` can be used to minimize wasted space, but note that it may cause a reallocation.
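A sketch of `shrink_to_fit` releasing excess capacity. The exact post-shrink capacity is allocator-dependent (the documentation only promises to get as close to the length as possible), so the assertions are kept loose:

```rust
fn main() {
    // Deliberately over-reserve, then store only three elements.
    let mut v: Vec<u32> = Vec::with_capacity(1000);
    v.extend([1, 2, 3]);
    assert!(v.capacity() >= 1000);

    // Release the excess; this typically reallocates down to the length.
    v.shrink_to_fit();
    assert!(v.capacity() >= v.len());
    assert!(v.capacity() < 1000);
}
```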
## `String`
A `String` contains heap-allocated bytes. The representation and operation of `String` are very similar to those of `Vec<u8>`. Many `Vec` methods relating to growth and capacity have equivalents for `String`, such as `String::with_capacity`.
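The same reservation technique shown for `Vec` applies. A minimal sketch: if the final length is known, one up-front allocation suffices, and the buffer pointer confirms that no reallocation occurred.

```rust
fn main() {
    // "hello " (6 bytes) + "world" (5 bytes) = 11 bytes, reserved up front.
    let mut s = String::with_capacity(11);
    let ptr = s.as_ptr();

    s.push_str("hello ");
    s.push_str("world");

    // Both pushes fit in the reserved buffer: no reallocation.
    assert_eq!(s.as_ptr(), ptr);
    assert_eq!(s, "hello world");
    assert!(s.capacity() >= 11);
}
```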
The `SmallString` type from the `smallstr` crate is similar to the `SmallVec` type.
The `String` type from the `smartstring` crate is a drop-in replacement for `String` that avoids heap allocations for strings with less than three words’ worth of characters. On 64-bit platforms, this is any string that is less than 24 bytes, which includes all strings containing 23 or fewer ASCII characters.
Example.
Note that the `format!` macro produces a `String`, which means it performs an allocation. If you can avoid a `format!` call by using a string literal, that will avoid this allocation.
Example.
`std::format_args` and/or the `lazy_format` crate may help with this.
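Another option, when formatted output must be built repeatedly, is to format into a reusable buffer with the `write!` macro instead of allocating a fresh `String` with `format!` each time. A sketch of that pattern (the loop and buffer size are illustrative):

```rust
use std::fmt::Write;

fn main() {
    // One allocation up front, reused across iterations.
    let mut buf = String::with_capacity(64);

    for n in [1, 2, 3] {
        buf.clear(); // empties the buffer, keeps its capacity
        // Formats directly into buf's existing buffer; unlike format!,
        // no temporary String is allocated per iteration.
        write!(buf, "item {}", n).unwrap();
        assert!(buf.starts_with("item "));
    }
    assert_eq!(buf, "item 3");
}
```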
## Hash Tables
`HashSet` and `HashMap` are hash tables. Their representation and operations are similar to those of `Vec`, in terms of allocations: they have a single contiguous heap allocation, holding keys and values, which is reallocated as necessary as the table grows. Many `Vec` methods relating to growth and capacity have equivalents for `HashSet`/`HashMap`, such as `HashSet::with_capacity`.
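A sketch of pre-sizing a `HashMap` so that a known number of inserts avoids repeated rehash-and-reallocate cycles (the element count is illustrative):

```rust
use std::collections::HashMap;

fn main() {
    // Reserve room for 100 entries up front.
    let mut m: HashMap<u32, &str> = HashMap::with_capacity(100);
    assert!(m.capacity() >= 100);

    for i in 0..100 {
        m.insert(i, "value");
    }

    // All 100 inserts fit within the initial reservation.
    assert_eq!(m.len(), 100);
    assert!(m.capacity() >= 100);
}
```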
## `clone`
Calling `clone` on a value that contains heap-allocated memory typically involves additional allocations. For example, calling `clone` on a non-empty `Vec` requires a new allocation for the elements (but note that the capacity of the new `Vec` might not be the same as the capacity of the original `Vec`). The exception is `Rc`/`Arc`, where a `clone` call just increments the reference count.
`clone_from` is an alternative to `clone`. `a.clone_from(&b)` is equivalent to `a = b.clone()` but may avoid unnecessary allocations. For example, if you want to clone one `Vec` over the top of an existing `Vec`, the existing `Vec`’s heap allocation will be reused if possible, as the following example shows.
```rust
let mut v1: Vec<u32> = Vec::with_capacity(99);
let v2: Vec<u32> = vec![1, 2, 3];
v1.clone_from(&v2); // v1's allocation is reused
assert_eq!(v1.capacity(), 99);
```
Although `clone` usually causes allocations, it is a reasonable thing to use in many circumstances and can often make code simpler. Use profiling data to see which `clone` calls are hot and worth taking the effort to avoid.
Sometimes Rust code ends up containing unnecessary `clone` calls, due to (a) programmer error, or (b) changes in the code that render previously-necessary `clone` calls unnecessary. If you see a hot `clone` call that does not seem necessary, sometimes it can simply be removed.
Example 1,
Example 2,
Example 3.
## `to_owned`
`ToOwned::to_owned` is implemented for many common types. It creates owned data from borrowed data, usually by cloning, and therefore often causes heap allocations. For example, it can be used to create a `String` from a `&str`.
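A minimal sketch of the two most common cases, `&str` to `String` and `&[T]` to `Vec<T>`:

```rust
fn main() {
    // &str -> String: the bytes are copied into a new heap allocation.
    let borrowed: &str = "hello";
    let owned: String = borrowed.to_owned();
    assert_eq!(owned, "hello");

    // &[T] -> Vec<T> works the same way.
    let slice: &[u8] = &[1, 2, 3];
    let vec: Vec<u8> = slice.to_owned();
    assert_eq!(vec, vec![1, 2, 3]);
}
```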
Sometimes `to_owned` calls (and related calls such as `clone` and `to_string`) can be avoided by storing a reference to borrowed data in a struct rather than an owned copy. This requires lifetime annotations on the struct, complicating the code, and should only be done when profiling and benchmarking shows that it is worthwhile.
Example.
## `Cow`
Sometimes code deals with a mixture of borrowed and owned data. Imagine a vector of error messages, some of which are static string literals and some of which are constructed with `format!`. The obvious representation is `Vec<String>`, as the following example shows.
```rust
let mut errors: Vec<String> = vec![];
errors.push("something went wrong".to_string());
errors.push(format!("something went wrong on line {}", 100));
```
That requires a `to_string` call to promote the static string literal to a `String`, which incurs an allocation.
Instead you can use the `Cow` type, which can hold either borrowed or owned data. A borrowed value `x` is wrapped with `Cow::Borrowed(x)`, and an owned value `y` is wrapped with `Cow::Owned(y)`. `Cow` also implements the `From<T>` trait for various string, slice, and path types, so you can usually use `into` as well. (Or `Cow::from`, which is longer but results in more readable code, because it makes the type clearer.) The following example puts all this together.
```rust
use std::borrow::Cow;
let mut errors: Vec<Cow<'static, str>> = vec![];
errors.push(Cow::Borrowed("something went wrong"));
errors.push(Cow::Owned(format!("something went wrong on line {}", 100)));
errors.push(Cow::from("something else went wrong"));
errors.push(format!("something else went wrong on line {}", 101).into());
```
`errors` now holds a mixture of borrowed and owned data without requiring any extra allocations. This example involves `&str`/`String`, but other pairings such as `&[T]`/`Vec<T>` and `&Path`/`PathBuf` are also possible.
All of the above applies if the data is immutable. But `Cow` also allows borrowed data to be promoted to owned data if it needs to be mutated. `Cow::to_mut` will obtain a mutable reference to an owned value, cloning if necessary. This is called “clone-on-write”, which is where the name `Cow` comes from.
This clone-on-write behaviour is useful when you have some borrowed data, such as a `&str`, that is mostly read-only but occasionally needs to be modified.
Finally, because `Cow` implements `Deref`, you can call methods directly on the data it encloses.
`Cow` can be fiddly to get working, but it is often worth the effort.
## Reusing Collections
Sometimes you need to build up a collection such as a `Vec` in stages. It is usually better to do this by modifying a single `Vec` than by building multiple `Vec`s and then combining them.
For example, if you have a function `do_stuff` that produces a `Vec` that might be called multiple times:
```rust
fn do_stuff(x: u32, y: u32) -> Vec<u32> {
    vec![x, y]
}
```
It might be better to instead modify a passed-in `Vec`:
```rust
fn do_stuff(x: u32, y: u32, vec: &mut Vec<u32>) {
    vec.push(x);
    vec.push(y);
}
```
Sometimes it is worth keeping around a “workhorse” collection that can be reused. For example, if a `Vec` is needed for each iteration of a loop, you could declare the `Vec` outside the loop, use it within the loop body, and then call `clear` at the end of the loop body (to empty the `Vec` without affecting its capacity). This avoids allocations at the cost of obscuring the fact that each iteration’s usage of the `Vec` is unrelated to the others.
Example 1,
Example 2.
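A minimal sketch of the workhorse-`Vec` pattern just described, using the buffer pointer to confirm that only the first iteration allocates (the loop bounds are illustrative):

```rust
fn main() {
    let mut buf: Vec<u32> = Vec::new();
    let mut first: *const u32 = std::ptr::null();

    for round in 0..3u32 {
        buf.clear(); // empties buf but keeps its capacity
        for i in 0..10 {
            buf.push(round * 10 + i);
        }
        if round == 0 {
            first = buf.as_ptr(); // remember the first allocation
        } else {
            // Later rounds reuse it: no further allocations.
            assert_eq!(buf.as_ptr(), first);
        }
        assert_eq!(buf.len(), 10);
        // ...process buf here...
    }
}
```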
Similarly, it is sometimes worth keeping a workhorse collection within a struct, to be reused in one or more methods that are called repeatedly.
## Reading Lines from a File
`BufRead::lines` makes it easy to read a file one line at a time:
```rust
fn process(_: &str) {}

fn blah() -> Result<(), std::io::Error> {
    use std::io::{self, BufRead};
    let lock = io::stdin().lock();
    for line in lock.lines() {
        process(&line?);
    }
    Ok(())
}
```
But the iterator it produces returns `io::Result<String>`, which means it allocates for every line in the file.
An alternative is to use a workhorse `String` in a loop over `BufRead::read_line`:
```rust
fn process(_: &str) {}

fn blah() -> Result<(), std::io::Error> {
    use std::io::{self, BufRead};
    let mut lock = io::stdin().lock();
    let mut line = String::new();
    while lock.read_line(&mut line)? != 0 {
        process(&line);
        line.clear();
    }
    Ok(())
}
```
This reduces the number of allocations to at most a handful, and possibly just one. (The exact number depends on how many times `line` needs to be reallocated, which depends on the distribution of line lengths in the file.) This will only work if the loop body can operate on a `&str`, rather than a `String`.
## Using an Alternative Allocator
It is also possible to improve heap allocation performance without changing your code, simply by using a different allocator. See the Alternative Allocators section for details.
## Avoiding Regressions
To ensure the number and/or size of allocations done by your code doesn’t increase unintentionally, you can use the heap usage testing feature of dhat-rs to write tests that check particular code snippets allocate the expected amount of heap memory.