Hashing
HashSet
and HashMap
are two widely-used types and there are ways to make
them faster.
Alternative Hashers
The default hashing algorithm is not specified, but at the time of writing the default is an algorithm called SipHash 1-3. This algorithm is high quality—it provides high protection against collisions—but is relatively slow, particularly for short keys such as integers.
If profiling shows that hashing is hot, and HashDoS attacks are not a concern for your application, the use of hash tables with faster hash algorithms can provide large speed wins.
rustc-hash
providesFxHashSet
andFxHashMap
types that are drop-in replacements forHashSet
andHashMap
. Its hashing algorithm is low-quality but very fast, especially for integer keys, and has been found to out-perform all other hash algorithms within rustc. (fxhash
is an older, less well maintained implementation of the same algorithm and types.)fnv
providesFnvHashSet
andFnvHashMap
types. Its hashing algorithm is higher quality thanrustc-hash
’s but a little slower.ahash
providesAHashSet
andAHashMap
. Its hashing algorithm can take advantage of AES instruction support that is available on some processors.
If hashing performance is important in your program, it is worth trying more than one of these alternatives. For example, the following results were seen in rustc.
- The switch from
fnv
tofxhash
gave speedups of up to 6%. - An attempt to switch from
fxhash
toahash
resulted in slowdowns of 1-4%. - An attempt to switch from
fxhash
back to the default hasher resulted in slowdowns ranging from 4-84%!
If you decide to universally use one of the alternatives, such as
FxHashSet
/FxHashMap
, it is easy to accidentally use HashSet
/HashMap
in
some places. You can use Clippy to avoid this problem.
Some types don’t need hashing. For example, you might have a newtype that wraps
an integer and the integer values are random, or close to random. For such a
type, the distribution of the hashed values won’t be that different to the
distribution of the values themselves. In this case the nohash_hasher
crate
can be useful.
Hash function design is a complex topic and is beyond the scope of this book.
The ahash
documentation has a good discussion.
Byte-wise Hashing
When you annotate a type with #[derive(Hash)]
the generated hash
method
will hash each field separately. For some hash functions it may be faster to
convert the type to raw bytes and hash the bytes as a stream. This is possible
for types that satisfy certain properties such as having no padding bytes.
The zerocopy
and bytemuck
crates both provide a #[derive(ByteHash)]
macro that generates a hash
method that does this kind of byte-wise hashing.
The README for the derive_hash_fast
crate provides more detail for this
technique.
This is an advanced technique, and the performance effects are highly dependent on the hash function and the exact structure of the types being hashed. Measure carefully.