Type Sizes
Shrinking oft-instantiated types can help performance.
For example, if memory usage is high, a heap profiler like DHAT can identify the hot allocation points and the types involved. Shrinking these types can reduce peak memory usage, and possibly improve performance by reducing memory traffic and cache pressure.
Furthermore, Rust types that are larger than 128 bytes are copied with memcpy rather than inline code. If memcpy shows up in non-trivial amounts in profiles, DHAT’s “copy profiling” mode will tell you exactly where the hot memcpy calls are and the types involved. Shrinking these types to 128 bytes or less can make the code faster by avoiding memcpy calls and reducing memory traffic.
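As a rough illustration of the 128-byte threshold, the sketch below checks two hypothetical types, Big and Small (names invented for this example), against it. It only measures sizes; whether the compiler actually emits a memcpy call depends on the compiler version and optimization level, so treat the profile as the source of truth.

```rust
use std::mem::size_of;

// Hypothetical hot type: 20 * 8 = 160 bytes of inline payload.
#[allow(dead_code)]
struct Big {
    samples: [u64; 20],
}

// The same payload behind a pointer: the struct itself is one word.
#[allow(dead_code)]
struct Small {
    samples: Box<[u64; 20]>,
}

// The threshold mentioned in the text.
const MEMCPY_THRESHOLD: usize = 128;

// Returns true if values of type `T` are larger than the threshold.
fn exceeds_threshold<T>() -> bool {
    size_of::<T>() > MEMCPY_THRESHOLD
}

fn main() {
    println!("Big:   {} bytes", size_of::<Big>());
    println!("Small: {} bytes", size_of::<Small>());
    assert!(exceeds_threshold::<Big>());
    assert!(!exceeds_threshold::<Small>());
}
```

Boxing the payload trades the large copy for a heap allocation and a pointer indirection, so it is worth measuring both versions.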
Measuring Type Sizes
std::mem::size_of gives the size of a type, in bytes, but often you want to know the exact layout as well. For example, an enum might be surprisingly large due to a single outsized variant.
The -Zprint-type-sizes option reports this layout information. It isn’t enabled on release versions of rustc, so you’ll need to use a nightly version of rustc. Here is one possible invocation via Cargo:
RUSTFLAGS=-Zprint-type-sizes cargo +nightly build --release
And here is a possible invocation of rustc:
rustc +nightly -Zprint-type-sizes input.rs
It will print out details of the size, layout, and alignment of all types in use. For example, for this type:
enum E {
    A,
    B(i32),
    C(u64, u8, u64, u8),
    D(Vec<u32>),
}
it prints the following, plus information about a few built-in types.
print-type-size type: `E`: 32 bytes, alignment: 8 bytes
print-type-size     discriminant: 1 bytes
print-type-size     variant `D`: 31 bytes
print-type-size         padding: 7 bytes
print-type-size         field `.0`: 24 bytes, alignment: 8 bytes
print-type-size     variant `C`: 23 bytes
print-type-size         field `.1`: 1 bytes
print-type-size         field `.3`: 1 bytes
print-type-size         padding: 5 bytes
print-type-size         field `.0`: 8 bytes, alignment: 8 bytes
print-type-size         field `.2`: 8 bytes
print-type-size     variant `B`: 7 bytes
print-type-size         padding: 3 bytes
print-type-size         field `.0`: 4 bytes, alignment: 4 bytes
print-type-size     variant `A`: 0 bytes
The output shows the following.
- The size and alignment of the type.
- For enums, the size of the discriminant.
- For enums, the size of each variant (sorted from largest to smallest).
- The size, alignment, and ordering of all fields. (Note that the compiler has reordered variant C’s fields to minimize the size of E.)
- The size and location of all padding.
Alternatively, the top-type-sizes crate can be used to display the output in a more compact form.
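The headline size and alignment can also be cross-checked from ordinary code on a stable compiler. This is a minimal sketch using the same enum E; because the default layout is not guaranteed, the printed values may differ between compiler versions and targets.

```rust
use std::mem::{align_of, size_of};

// The enum from the text.
#[allow(dead_code)]
enum E {
    A,
    B(i32),
    C(u64, u8, u64, u8),
    D(Vec<u32>),
}

fn main() {
    // On a 64-bit target these should agree with the first line of the
    // compiler's report.
    println!("size_of::<E>()  = {}", size_of::<E>());
    println!("align_of::<E>() = {}", align_of::<E>());

    // The payload of the largest variant is a lower bound on the enum's size.
    assert!(size_of::<E>() >= size_of::<Vec<u32>>());
}
```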
Once you know the layout of a hot type, there are multiple ways to shrink it.
Field Ordering
The Rust compiler automatically sorts the fields in structs and enums to minimize their sizes (unless the #[repr(C)] attribute is specified), so you do not have to worry about field ordering. But there are other ways to minimize the size of hot types.
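To see the effect of automatic reordering, the following sketch compares two hypothetical structs (Reordered and DeclOrder, names invented here) with identical fields, one using the default representation and one using #[repr(C)]. The exact sizes depend on the target; on x86_64 the #[repr(C)] version needs padding after each u8 field.

```rust
use std::mem::size_of;

// Default (Rust) representation: the compiler may reorder the fields.
#[allow(dead_code)]
struct Reordered {
    a: u8,
    b: u64,
    c: u8,
}

// C representation: fields stay in declaration order, so padding may be
// needed after `a` and after `c`.
#[allow(dead_code)]
#[repr(C)]
struct DeclOrder {
    a: u8,
    b: u64,
    c: u8,
}

fn main() {
    println!("Reordered: {} bytes", size_of::<Reordered>());
    println!("DeclOrder: {} bytes", size_of::<DeclOrder>());
    assert!(size_of::<Reordered>() <= size_of::<DeclOrder>());
}
```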
Smaller Enums
If an enum has an outsized variant, consider boxing one or more fields. For example, you could change this type:
type LargeType = [u8; 100];
enum A {
    X,
    Y(i32),
    Z(i32, LargeType),
}
to this:
type LargeType = [u8; 100];
enum A {
    X,
    Y(i32),
    Z(Box<(i32, LargeType)>),
}
This reduces the type size at the cost of requiring an extra heap allocation for the A::Z variant. This is more likely to be a net performance win if the A::Z variant is relatively rare. The Box will also make A::Z slightly less ergonomic to use, especially in match patterns.
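The trade-off can be sketched as follows. The two versions of the enum are renamed Unboxed and Boxed here so they can coexist in one program, and first_value is a hypothetical helper that shows the extra dereference needed when matching on the boxed variant.

```rust
use std::mem::size_of;

type LargeType = [u8; 100];

#[allow(dead_code)]
enum Unboxed {
    X,
    Y(i32),
    Z(i32, LargeType),
}

#[allow(dead_code)]
enum Boxed {
    X,
    Y(i32),
    Z(Box<(i32, LargeType)>),
}

// Matching on the boxed variant needs an extra dereference to reach the
// fields inside the tuple.
fn first_value(a: &Boxed) -> i32 {
    match a {
        Boxed::X => 0,
        Boxed::Y(n) => *n,
        Boxed::Z(payload) => {
            let (n, _bytes) = &**payload;
            *n
        }
    }
}

fn main() {
    println!("Unboxed: {} bytes", size_of::<Unboxed>());
    println!("Boxed:   {} bytes", size_of::<Boxed>());
    assert!(size_of::<Boxed>() < size_of::<Unboxed>());

    let z = Boxed::Z(Box::new((7, [0; 100])));
    assert_eq!(first_value(&z), 7);
}
```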
Smaller Integers
It is often possible to shrink types by using smaller integer types. For example, while it is most natural to use usize for indices, it is often reasonable to store indices as u32, u16, or even u8, and then convert to usize at use points.
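A minimal sketch of this pattern, assuming a hypothetical Edge type that stores two node indices: the indices are narrowed with a checked conversion when stored and widened back to usize only at the point of indexing.

```rust
use std::convert::TryFrom;
use std::mem::size_of;

// Hypothetical graph edge storing node indices as `u32` instead of `usize`.
#[derive(Clone, Copy)]
struct Edge {
    from: u32,
    to: u32,
}

// Narrow `usize` indices when storing them, failing loudly if one does not fit.
fn make_edge(from: usize, to: usize) -> Edge {
    Edge {
        from: u32::try_from(from).expect("index exceeds u32::MAX"),
        to: u32::try_from(to).expect("index exceeds u32::MAX"),
    }
}

// Widen the stored indices to `usize` only when indexing. The `as` casts are
// lossless on 32-bit and 64-bit targets.
fn endpoint_names<'a>(names: &[&'a str], e: Edge) -> (&'a str, &'a str) {
    (names[e.from as usize], names[e.to as usize])
}

fn main() {
    let names = ["a", "b", "c"];
    let e = make_edge(0, 2);
    assert_eq!(endpoint_names(&names, e), ("a", "c"));
    println!("Edge: {} bytes", size_of::<Edge>());
}
```

The checked conversion in make_edge is a design choice for this sketch; it keeps an oversized index from being silently truncated.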
Boxed Slices
Rust vectors contain three words: a length, a capacity, and a pointer. If you have a vector that is unlikely to be changed in the future, you can convert it to a boxed slice with Vec::into_boxed_slice. A boxed slice contains only two words: a length and a pointer. Any excess element capacity is dropped, which may cause a reallocation.
use std::mem::{size_of, size_of_val};

fn main() {
    let v: Vec<u32> = vec![1, 2, 3];
    assert_eq!(size_of_val(&v), 3 * size_of::<usize>());
    let bs: Box<[u32]> = v.into_boxed_slice();
    assert_eq!(size_of_val(&bs), 2 * size_of::<usize>());
}
The boxed slice can be converted back to a vector with slice::into_vec without any cloning or reallocation.
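The round trip can be sketched as below. round_trip is a hypothetical helper that converts a vector to a boxed slice and back, and reports whether the heap pointer was unchanged by the second conversion.

```rust
// Converts to a boxed slice and back, reporting whether the heap pointer
// stayed the same across the second conversion.
fn round_trip(v: Vec<u32>) -> (Vec<u32>, bool) {
    // Any spare capacity is dropped here.
    let bs: Box<[u32]> = v.into_boxed_slice();
    let ptr_before = bs.as_ptr();

    // Converting back reuses the existing allocation.
    let v2 = bs.into_vec();
    let same_allocation = v2.as_ptr() == ptr_before;
    (v2, same_allocation)
}

fn main() {
    // A vector with far more capacity than elements.
    let mut v = Vec::with_capacity(100);
    v.extend_from_slice(&[1, 2, 3]);

    let (v2, same_allocation) = round_trip(v);
    assert_eq!(v2, [1, 2, 3]);
    assert_eq!(v2.capacity(), 3);
    assert!(same_allocation);
}
```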
ThinVec
An alternative to boxed slices is ThinVec, from the thin_vec crate. It is functionally equivalent to Vec, but stores the length and capacity in the same allocation as the elements (if there are any). This means that size_of::<ThinVec<T>> is only one word.
ThinVec is a good choice within oft-instantiated types for vectors that are often empty. It can also be used to shrink the largest variant of an enum, if that variant contains a Vec.
Avoiding Regressions
If a type is hot enough that its size can affect performance, it is a good idea to use a static assertion to ensure that it does not accidentally regress. The following example uses a macro from the static_assertions crate.
// This type is used a lot. Make sure it doesn't unintentionally get bigger.
#[cfg(target_arch = "x86_64")]
static_assertions::assert_eq_size!(HotType, [u8; 64]);
The cfg attribute is important, because type sizes can vary on different platforms. Restricting the assertion to x86_64 (which is typically the most widely used platform) is likely to be good enough to prevent regressions in practice.
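If you would rather not add a dependency, a similar check can be written with the standard library alone, since assert! can be evaluated in a const item on reasonably recent compilers. This is a sketch of that alternative, not the macro's own implementation; HotType here is a hypothetical 64-byte stand-in.

```rust
use std::mem::size_of;

// Hypothetical hot type: 16 * 4 = 64 bytes.
#[allow(dead_code)]
struct HotType {
    ids: [u32; 16],
}

// Evaluated at compile time: the build fails if the size changes.
#[cfg(target_arch = "x86_64")]
const _: () = assert!(size_of::<HotType>() == 64);

fn main() {
    println!("HotType: {} bytes", size_of::<HotType>());
}
```

Adding a field to HotType would turn the assertion into a compile error on x86_64, which is the signal to either shrink the type again or update the expected size deliberately.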