Hashing
HashSet and HashMap are two widely-used types and there are ways to make
them faster.
Alternative Hashers
The default hashing algorithm is not specified, but at the time of writing the default is an algorithm called SipHash 1-3. This algorithm is high quality—it provides high protection against collisions—but is relatively slow, particularly for short keys such as integers.
If profiling shows that hashing is hot, and HashDoS attacks are not a concern for your application, the use of hash tables with faster hash algorithms can provide large speed wins.
rustc-hashprovidesFxHashSetandFxHashMaptypes that are drop-in replacements forHashSetandHashMap. Its hashing algorithm is low-quality but very fast, especially for integer keys, and has been found to out-perform all other hash algorithms within rustc. (fxhashis an older, less well maintained implementation of the same algorithm and types.)fnvprovidesFnvHashSetandFnvHashMaptypes. Its hashing algorithm is higher quality thanrustc-hash’s but a little slower.ahashprovidesAHashSetandAHashMap. Its hashing algorithm can take advantage of AES instruction support that is available on some processors.
If hashing performance is important in your program, it is worth trying more than one of these alternatives. For example, the following results were seen in rustc.
- The switch from
fnvtofxhashgave speedups of up to 6%. - An attempt to switch from
fxhashtoahashresulted in slowdowns of 1-4%. - An attempt to switch from
fxhashback to the default hasher resulted in slowdowns ranging from 4-84%!
If you decide to universally use one of the alternatives, such as
FxHashSet/FxHashMap, it is easy to accidentally use HashSet/HashMap in
some places. You can use Clippy to avoid this problem.
Some types don’t need hashing. For example, you might have a newtype that wraps
an integer and the integer values are random, or close to random. For such a
type, the distribution of the hashed values won’t be that different to the
distribution of the values themselves. In this case the nohash_hasher crate
can be useful.
Hash function design is a complex topic and is beyond the scope of this book.
The ahash documentation has a good discussion.
Byte-wise Hashing
When you annotate a type with #[derive(Hash)] the generated hash method
will hash each field separately. For some hash functions it may be faster to
convert the type to raw bytes and hash the bytes as a stream. This is possible
for types that satisfy certain properties such as having no padding bytes.
The zerocopy and bytemuck crates both provide a #[derive(ByteHash)]
macro that generates a hash method that does this kind of byte-wise hashing.
The README for the derive_hash_fast crate provides more detail for this
technique.
This is an advanced technique, and the performance effects are highly dependent on the hash function and the exact structure of the types being hashed. Measure carefully.