Streamlined dataflow analysis code in rustc
Sometimes I pick a crate in the Rust compiler and read through its code to learn about it and find things that I can improve. Often the improvements are small, but sometimes they snowball.
One example involved the rustc_errors
crate. rustc is famous for its high
quality error messages. These messages can be long and sophisticated, with
multiple parts. The internal APIs for constructing and emitting errors are
complex, and had accumulated for years without anybody stepping back and
considering them as a whole. When I did that, it led to a couple of months of
detangling, across more than 30 pull requests and hundreds of commits. I hacked
through code duplication, naming inconsistencies, excessive abstraction,
unnecessary complexity, and some plain old weirdness. Errors are emitted from
thousands of places in the compiler, at every compilation stage, so these
improvements made a lot of code nicer and easier to understand. A full
description of this work would make for the world’s dullest blog post, so
instead I will write about a more interesting example.
rustc_mir_dataflow
is a crate that defines a framework for forward and
backward dataflow
analysis on
MIR, and contains more
than a dozen analyses implemented within that framework. These analyses
determine things like the program points where variables are live, or have been
initialized, or are borrowed. The analysis results are used in various parts of
the compiler such as the borrow checker. The crate is reasonably complex and
understanding it requires some knowledge of dataflow analysis concepts.
Fortunately, due to a misspent youth I am familiar with terms such as
lattice, partially-ordered set, top, bottom, least upper bound,
greatest lower bound, lub, glb, join, and meet. I even know which of
these are just different names for the same thing.
My changes
Let’s look at the 17 pull requests (PRs) I made for this crate.
#118203 Minor rustc_mir_dataflow
cleanups
- 16 commits
- -95 lines of code
This was a typical first cleanup PR, with all the commits just rounding off small rough edges. Eleven of them had a commit message starting with the word “Remove”.
#118230 Streamline MIR dataflow cursors
- 9 commits
- -130 lines of code
- 3 types and 2 traits removed
When a dataflow analysis runs it produces a “results” value, and the results
can be inspected with a “results cursor”. For example, you can answer questions
like “was variable x
initialized at the end of this basic block?”
There were two results types (Results
and ResultsCloned
) and three results
cursor types (ResultsCursor
, ResultsClonedCursor
, and ResultsRefCursor
).
You can tell from the names that they were all similar but optimized for
different ownership scenarios. But these different scenarios weren’t material.
This PR removed ResultsCloned
, ResultsClonedCursor
, and ResultsRefCursor
,
leaving just Results
and ResultsCursor
, both of which lost a type
parameter. The PR also removed a trait called CloneAnalysis
that was
basically a complicated synonym for Copy
, and another trait called
AnalysisResults
that isn’t worth explaining. The resulting code was shorter
and simpler (though slightly clunky in a couple of places, which we will return
to later) and there was no performance penalty.
#118638 More rustc_mir_dataflow
cleanups
- 4 commits
- -36 lines of code
- 1 type and 1 macro removed
The first commit in this PR removed unused arguments from some functions.
The second commit removed a macro impl_visitable!
that existed to avoid some
repetitive boilerplate code. But the macro had a single use and was complex
enough that writing the boilerplate code by hand took fewer lines of code and
was much easier to read. A tiny bit of code duplication is often better than a
bad abstraction. (The boilerplate code ended up being removed later on anyway.)
The third commit involved a generic type BorrowckAnalyses<B, U, E>
that was
used to implement two concrete typedefs BorrowckResults
and
BorrowckFlowState
. The commit removed BorrowckAnalyses
and replaced the two
typedefs with concrete types. Again, a tiny bit of boilerplate code was shorter
and simpler than the more abstract, repetition-free version.
The fourth commit made a tiny change to two methods in the GenKillAnalysis
trait. Not a big deal, but that trait would soon see a lot more action.
#131481 Remove GenKillAnalysis
- 7 commits
- -496 lines of code
- 2 traits and 1 type removed
This PR is my favourite.
There were two kinds of dataflow analysis in the compiler, each represented by
a trait: Analysis
, which is the basic kind, and GenKillAnalysis
, which is a
more specialized kind for gen-kill
analyses.
GenKillAnalysis
was similar to Analysis
but had some minor differences, and
there was a blanket implementation of Analysis
such that any type
implementing GenKillAnalysis
would automatically implement Analysis
. There
was some duplication of function across the two traits and the whole setup had
triggered my spidey senses for some time. I had previously made a couple of
failed attempts to make the two traits share more code.
Eventually I realized that GenKillAnalysis
was just an optimized way to run
a gen-kill analysis which involved precomputing a transfer function for each
basic block that would then reduce the amount of work done later on. But a
gen-kill analysis could also be run using Analysis
. So I tried doing that
with all the gen-kill analyses, curious to see what the performance regression
would be. Oh! It turned out to be a small performance improvement. Not big
wins, mostly sub-1% improvements in instruction counts, but it was consistent
across the benchmark suite. The supposed optimization was actually a
pessimization.
So this PR removed GenKillAnalysis
, the GenKillSet
type, and merged the
AnalysisDomain
trait into Analysis
. These simplifications set up some
subsequent improvements, such as the next PR.
#132338 Remove Engine
- 2 commits
- -66 lines of code
- 1 type removed
To run an analysis implementing the Analysis
trait you had to call the
into_engine
method which would create an Engine
value. Then you could
optionally give it a pass name, and finally you called iterate_to_fixpoint
to
perform the analysis and produce results. Like this:
let results = MyAnalysis
.into_engine(tcx, body)
.pass_name("my")
.iterate_to_fixpoint();
With GenKillAnalysis
gone, Engine
was reduced to a builder type with a
single modifier method, pass_name
. This PR removed into_engine
and Engine
by simply giving iterate_to_fixpoint
an extra Option<&'static str>
argument
holding the optional name. To run an analysis now looks like this:
let results = MyAnalysis.iterate_to_fixpoint(tcx, body, Some("my"));
Simpler and less plumbing all around.
#132346 Some graphviz tweaks
- 3 commits
- +5 lines of code
This PR was just some minor improvements to the machinery for generating graphviz visualizations of dataflow analysis results.
#132347 Remove ValueAnalysis
and ValueAnalysisWrapper
.
- 2 commits
- -193 lines of code
- 1 module, 1 trait, and 1 type removed
There was a module called value_analysis
with a trait called ValueAnalysis
and a struct called ValueAnalysisWrapper
that implemented the Analysis
trait. This infrastructure served a single dataflow analysis called
ConstAnalysis
, which implemented the ValueAnalysis
trait. No other dataflow
analysis needed (or will need) to do value analysis. Which means this
infrastructure was a great deal of unnecessary indirection and abstraction.
This PR removed value_analysis
, ValueAnalysis
, and ValueAnalysisWrapper
,
and moved their functional parts directly into ConstAnalysis
, making the
relevant code much shorter and easier to read.
#132134 Remove ResultsVisitable
- 2 commits
- -13 lines of code
- 1 trait and 1 type removed, 1 type added
Most analysis uses followed a pattern of “compute the analysis results, then
visit them”, but there was one exception. For borrow checking we ran three
separate analyses (Borrows
, MaybeUninitializedPlaces
, and
EverInitializedPlaces
), combined them into a single BorrowckResults
, and
only then visited that combined BorrowckResults
. BorrowckResults
was just
different enough from a normal Results
that it required the existence of a
ResultsVisitable
trait to abstract over the visiting differences. Each of
Results
and BorrowckResults
implemented that trait.
The first commit in this PR introduced Borrowck
and bundled the three borrow
check analysis results into a standard Results<Borrowck>
instead of the
exceptional BorrowckResults
. Once that was done, these results could be
visited like any other analysis results, and BorrowckResults
was removed.
That meant Results
was the only impl of ResultsVisitable
, and that trait
could be removed. This removed unnecessary layers of indirection, abstraction,
and generic types in several places.
#133155 Yet more rustc_mir_dataflow
cleanups
- 5 commits
- +29 lines of code
This cleverly-named PR did a few things.
- It merged two functions that were always called together.
- It added some comments explaining non-obvious aspects of the dataflow analysis framework that had taken me some time to work out.
- It made it possible for a
ResultsCursor
to either own or borrow aResults
, which fixed the slightly clunky code added above in #118230. - And a couple of other minor improvements.
#133326 Remove the DefinitelyInitializedPlaces
analysis.
- 1 commit
- -345 lines of code
- 2 types and 1 trait removed
There were four different dataflow analyses relating to the initialization of
variables and places: MaybeInitializedPlaces
, MaybeUninitializedPlaces
,
DefinitelyInitializedPlaces
, and EverInitializedPlaces
. As it happens,
DefinitelyInitializedPlaces
is just the inverse of
MaybeUninitializedPlaces
, was not meaningfully used, and had some minor bugs.
So this PR removed it.
This also allowed the removal of two things that were only needed for that
analysis: the MeetSemiLattice
trait (which was implemented by six types:
bool
, IndexVec
, BitSet
, ChunkedBitSet
, Dual
, and FlatSet
) and the
Dual
type (which implemented four traits: DebugWithContext
,
JoinSemiLattice
, MeetSemiLattice
, and GenKill
).
#133475 MaybeStorage
improvements
- 4 commits
- -9 lines of code
This PR made some small improvements, none of which were notable.
#133431 Remove HybridBitSet
- 9 commits
- -731 lines of code
- 3 types removed
This PR removed the most lines of code.
All the gen-kill analyses use bitsets to hold analysis results. The naive space cost is roughly O(m × n) because a function has multiple things such as local variables to track (m bits per bitset), and multiple program points at which they are tracked (n bitsets), and m and n increase roughly linearly as function size increases. For programs with large functions, this can results in tens or hundreds of megabytes of bitsets if you aren’t careful.
In 2018 I introduced a hybrid bitset
type into rustc, which gave huge reductions in peak memory use for the then-new
NLL borrow checker. This type, later renamed HybridBitSet
, would switch from
a sparse representation to a dense representation once the number of set bits
exceeded a small number (8). This was a good fit for rustc’s dataflow analysis
needs at the time. However, HybridBitSet
was later eclipsed by another type,
ChunkedBitSet
, that I introduced in 2022. ChunkedBitSet
can use a mix of
dense and sparse representation for different sections of a single bitset, and
replaced some of the HybridBitSet
uses. Also, the removal of
GenKillAnalysis
above removed more HybridBitSet
uses.
This PR replaced the few remaining HybridBitSet
uses with ChunkedBitSet
and
BitSet
(which has a simple, dense representation) and removed HybridBitSet
.
It also removed HybridBitIter
and an underlying type called SparseBitSet
.
To avoid a performance hit, I had to improve the ChunkedBitSet
iterator,
which had been implemented in a simple but slow fashion.
#133328 Simplify SwitchInt
handling
- 5 commits
- -68 lines of code
- Removed 1 trait, 2 types, a closure, and added 1 type
The code for handling switches (used to implement match
) had some complicated
control flow, involving an “applier” type implementing a particular trait, and
a closure that was passed to a method of the applier and called on each
outgoing switch edge. For the masochists out there, full details are in the
relevant commit
message.
In short, I found a way to simplify things, removing the trait and the closure.
I then tried a follow-up optimization attempt that was impossible before this refactoring and quite simple afterwards. That optimization attempt ended up failing, but it was a good way to prove that the new code is simpler.
#133891 Introduce MixedBitSet
- 6 commits
- +178 lines of code
- 2 types added
The aforementioned ChunkedBitSet
is good at minimizing memory usage when
compiling programs with huge functions where dataflow bitsets have large domain
sizes. But it’s heavyweight for small functions with small bitsets, because any
non-empty ChunkedBitSet takes at least 256 bytes of memory.
This PR introduced MixedBitSet
, a bitset that uses BitSet
for
small-to-medium bitsets and ChunkedBitSet
for large bitsets. It gave moderate
speed and memory usage wins, and it’s a lot simpler than the removed
HybridBitSet
.
#133938 rustc_mir_dataflow
cleanups, including some renamings
- 9 commits
- -62 lines of code
This PR renamed a bunch of things (variables, types, and methods) for greater consistency. It also removed some unused trait impls and bounds.
#134065 Move write_graphviz_results
- 2 commits
- -9 lines of code
This PR just moved a function to a better spot and fixed an out-of-date comment.
#134152 Simplify rustc_mir_dataflow::abs_domain
- 1 commit
- -26 lines of code
This final PR simplified a trait and an impl of that trait.
Summary
In total, these 17 PRs contained 87 commits, and removed 2,067 lines of code,
including 7 traits and around a dozen types. These changes were all to the
rustc_mir_dataflow
crate, except for some of the bitset changes that were in
the rustc_index
crate.
The old code had a lot of abstraction. There were many traits, traits bounds, generic types, and generic functions. Reading it often involved following control flow across multiple files: “we start at A, which calls B, which calls C, which calls D, which does the actual work”. It’s now more concrete, shorter, simpler, and a little faster.
When I started I didn’t know it would go this far. I just kept looking for little things to improve, over and over, especially the things that annoyed me. Sometimes clawing at loose strings is worthwhile.