`rustc_mir_dataflow` crate to see how these things are handled in the Rust compiler. There is a helpful introduction to this topic in the `rustc dev guide`, and this post fleshes things out a bit.
Briefly, dataflow analysis helps one understand the way data flows throughout a program. There are intraprocedural and interprocedural approaches. Some applications of dataflow analysis include:

- constant propagation / folding: for constant variables, replacing uses of those variables, and expressions containing those variables, with constants
- live variable analysis: determining when a variable may be used in the future – useful for identifying dead code
- available expression analysis: determining which sets of expressions need not be recomputed
- reaching analysis: determining which definitions may reach a given point

Dataflow analysis is commonly performed on a control flow graph (CFG). Rust’s MIR representation is essentially a control flow graph, and so `rustc_mir_dataflow` employs common analysis algorithms to determine some of your program’s properties.

Dataflow analysis on CFGs is usually performed using fixed-point iteration:

- initialize some start state associated with the entry point of the CFG
- for each node in the CFG, compute an input state using a `meet` function to combine the incoming states from predecessor nodes, and compute an output state by applying a `transfer` function to the input state
- once the states of all nodes are unchanged (a fixed point has been reached), terminate

Once iteration has terminated, you have a mapping from each node to its input and output states (also called facts), describing the analysis results at each point in the program.
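Concretely, the iteration can be sketched as follows. This is a minimal standalone sketch; the `fixpoint` helper, the integer facts, and the toy CFG are invented for illustration and are not taken from `rustc_mir_dataflow`:

```rust
use std::collections::{BTreeSet, HashMap};

// A minimal fixed-point iteration sketch; the meet is set union and
// `transfer` computes a node's output state from its input state.
fn fixpoint(
    num_nodes: usize,
    preds: &HashMap<usize, Vec<usize>>,
    transfer: impl Fn(usize, &BTreeSet<u32>) -> BTreeSet<u32>,
) -> Vec<BTreeSet<u32>> {
    let mut out: Vec<BTreeSet<u32>> = vec![BTreeSet::new(); num_nodes];
    let mut changed = true;
    while changed {
        changed = false;
        for node in 0..num_nodes {
            // meet: union the output states of all predecessors
            let mut input = BTreeSet::new();
            for &p in preds.get(&node).into_iter().flatten() {
                input.extend(out[p].iter().copied());
            }
            // transfer: compute this node's output state
            let new_out = transfer(node, &input);
            if new_out != out[node] {
                out[node] = new_out;
                changed = true;
            }
        }
    }
    out
}

fn main() {
    // A straight-line CFG 0 -> 1 -> 2: node 0 generates fact 7,
    // node 1 kills fact 7 and generates fact 9.
    let preds = HashMap::from([(1, vec![0]), (2, vec![1])]);
    let out = fixpoint(3, &preds, |node, input| {
        let mut s = input.clone();
        match node {
            0 => { s.insert(7); }
            1 => { s.remove(&7); s.insert(9); }
            _ => {}
        }
        s
    });
    println!("{:?}", out); // [{7}, {9}, {9}]
}
```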

Some examples of states:

- which variables are live?
- which expressions are available and precomputed?
- what definitions have established, constant values?

Some dataflow analysis problems lend themselves to efficient transfer function representations, and a technique called gen-kill analysis may be utilized in these cases.

Whereas before, input and output were defined with generic `meet` and `transfer` functions, gen-kill problems define input and output as:

```
input = union(facts[predecessor] for predecessor in node.preds)
output = union((input - kill[current_node]), gen[current_node])
```

where `kill` and `gen` map to the portions of state that are removed and added at a particular point in the flow. For example, in live variable analysis, `gen[statement]` would describe the set of variables used at `statement`, and `kill[statement]` would describe the set of variables defined at `statement`.
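As a sketch (using hypothetical integer fact IDs, not rustc types), the transfer equation can be written directly; note that for a backward analysis like liveness, the `meet` runs over successors rather than predecessors:

```rust
use std::collections::BTreeSet;

// output = union(input - kill, gen), with facts as plain integers.
fn transfer(
    input: &BTreeSet<u32>,
    gen_set: &BTreeSet<u32>,
    kill_set: &BTreeSet<u32>,
) -> BTreeSet<u32> {
    input
        .difference(kill_set)
        .chain(gen_set.iter())
        .copied()
        .collect()
}

fn main() {
    // Liveness-flavored example: a statement that defines fact 0
    // (killing it) and uses fact 1 (generating it).
    let input = BTreeSet::from([0, 2]);
    let gen_set = BTreeSet::from([1]);
    let kill_set = BTreeSet::from([0]);
    assert_eq!(transfer(&input, &gen_set, &kill_set), BTreeSet::from([1, 2]));
}
```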

Let’s take a look at what is perhaps the simplest instance of a gen-kill analysis in the Rust compiler – determining `MaybeStorageDead`. A `StorageDead` statement describes when a variable can have its memory freed. Knowing when a variable is dead is important so that the compiler can avoid using memory that is unallocated.

First, `MaybeStorageDead` keeps track of the variables that are always live; this includes the return value of a function and all of its arguments:

```
#[derive(Clone)]
pub struct MaybeStorageDead {
    always_live_locals: BitSet<Local>,
}

impl MaybeStorageDead {
    pub fn new(always_live_locals: BitSet<Local>) -> Self {
        MaybeStorageDead { always_live_locals }
    }
}
```

A `BitSet` is used as it is a space-efficient and performant representation of the state (or domain) of the analysis problem. Bit sets and bit vectors are common ways to represent the domain in gen-kill analysis problems.

Next, `MaybeStorageDead` implements the `AnalysisDomain` trait, which defines the starting state value for the analysis problem. In this case, it’s an empty bit set (all zeros), where each bit represents one of the locals in the MIR body. The trait also requires one to define mutations to the state upon entry to the starting block. This flips the bits representing every non-argument, non-return-value local to 1 – they’re presumed dead by default:

```
impl<'tcx> crate::AnalysisDomain<'tcx> for MaybeStorageDead {
    type Domain = BitSet<Local>;

    const NAME: &'static str = "maybe_storage_dead";

    fn bottom_value(&self, body: &mir::Body<'tcx>) -> Self::Domain {
        // bottom = live
        BitSet::new_empty(body.local_decls.len())
    }

    fn initialize_start_block(
        &self, body: &mir::Body<'tcx>, on_entry: &mut Self::Domain) {
        assert_eq!(
            body.local_decls.len(),
            self.always_live_locals.domain_size()
        );
        // Do not iterate on return place and args,
        // as they are trivially always live.
        for local in body.vars_and_temps_iter() {
            if !self.always_live_locals.contains(local) {
                on_entry.insert(local);
            }
        }
    }
}
```

Next, it’s time to implement the necessary functions for the `GenKillAnalysis` trait. This involves defining the effects of particular statements, terminators, and calls. Some of these methods have default implementations.

```
impl<'tcx> crate::GenKillAnalysis<'tcx> for MaybeStorageDead {
    type Idx = Local;

    fn statement_effect(
        &self,
        trans: &mut impl GenKill<Self::Idx>,
        stmt: &mir::Statement<'tcx>,
        _: Location,
    ) {
        match stmt.kind {
            StatementKind::StorageLive(l) => trans.kill(l),
            StatementKind::StorageDead(l) => trans.gen(l),
            _ => (),
        }
    }

    fn terminator_effect(
        &self,
        _trans: &mut impl GenKill<Self::Idx>,
        _: &mir::Terminator<'tcx>,
        _: Location,
    ) {
        // Terminators have no effect
    }

    fn call_return_effect(
        &self,
        _trans: &mut impl GenKill<Self::Idx>,
        _block: BasicBlock,
        _return_places: CallReturnPlaces<'_, 'tcx>,
    ) {
        // Nothing to do when a call returns successfully
    }
}
```

Here, the only interesting part is `statement_effect`. When a variable becomes live, it’s added to the kill set; when it becomes dead, it’s added to the gen set.

And that’s all there is to it. Now we can take an arbitrary program, run the analysis on it, and inspect the state at each program point. Some of these comments originate from the compiler code:

```
// The set of locals in a MIR body that do not have
// `StorageLive`/`StorageDead` annotations.
// These locals have fixed storage for the duration of the body.
// This is a bit set with all locals that have an explicit `StorageLive`
// or `StorageDead` removed from the bit set (i.e. always live, or 0)
let always_live_locals = always_storage_live_locals(&mir);

let mut maybe_dead = MaybeStorageDead::new(always_live_locals)
    .into_engine(tcx, &mir)
    .iterate_to_fixpoint()
    .into_results_cursor(&mir);

for (bb, _block) in mir.basic_blocks.iter_enumerated() {
    maybe_dead.seek_after_primary_effect(mir.terminator_loc(bb));
    let state = maybe_dead.get();
    println!("maybe dead: {:?}", state);
}
```

Let’s test this on a contrived program, looking at its MIR block by block, along with the results of our dataflow analysis:
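The source isn’t shown here, but a program consistent with the MIR that follows would look roughly like this (a reconstruction, not necessarily the exact original):

```rust
fn test() {
    let mut x = 2;
    let y = &mut x;
    if *y < 10 {
        *y = *y + 1; // the overflow-checked addition seen in `bb1`
    }
    *y = 3;
}
```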

```
fn test() -> () {
    // Live (_0 is always the MIR return value, and is always live)
    let mut _0: ();
    // Presumed dead to start (via `initialize_start_block`)
    let mut _1: i32;
    let _3: ();
    let mut _4: bool;
    let mut _5: i32;
    let mut _6: i32;
    // Will have no associated `StorageDead` or `StorageLive` annotation,
    // so always presumed live
    let mut _7: (i32, bool);
    scope 1 {
        debug x => _1;
        // Presumed dead to start (via `initialize_start_block`)
        let _2: &mut i32;
        scope 2 {
            debug y => _2;
        }
    }
    // ...
```

This is the set of local declarations for the program. The combination of always-live locals and presumed-dead locals yields the state `[_1, _2, _3, _4, _5, _6]` when entering `bb0`.

```
bb0: {
    StorageLive(_1);
    _1 = const 2_i32;
    FakeRead(ForLet(None), _1);
    StorageLive(_2);
    _2 = &mut _1;
    FakeRead(ForLet(None), _2);
    StorageLive(_3);
    StorageLive(_4);
    StorageLive(_5);
    _5 = (*_2);
    _4 = Lt(move _5, const 10_i32);
    StorageDead(_5);
    switchInt(move _4) -> [0: bb2, otherwise: bb1];
}
// bb0 (output): [_5, _6]
```

Here we can see that every local previously presumed dead, except `_5` and `_6`, has a `StorageLive` statement with no paired `StorageDead`; those locals are added to the kill set and removed from the state. `_5` is killed by its `StorageLive` but re-added by the gen set at its `StorageDead`, so upon leaving this block only `_5` and `_6` are presumed dead.

In one branch of the `bb0` control flow (`bb1`), `_6` becomes live, and so is part of the kill set and removed from the state:

```
bb1: {
    StorageLive(_6);
    _6 = (*_2);
    _7 = CheckedAdd(_6, const 1_i32);
    assert(
        !move (_7.1: bool),
        "attempt to compute `{} + {}`, which would overflow",
        move _6, const 1_i32) -> [success: bb3, unwind: bb6];
}
// bb1 (output): [_5]
```

but in the other branch (`bb2`), it does not become live, and so remains in the set of locals presumed dead.

Coming from `bb1`, where `_6` became live, it now becomes dead in `bb3`, and so is added to the `gen` set and to the state leaving that block:

```
bb3: {
    (*_2) = move (_7.0: i32);
    StorageDead(_6);
    _3 = const ();
    goto -> bb5;
}
// bb3 (output): [_5, _6]
```

Coming from `bb2`, nothing changes in `bb4` because no variables are declared live or dead.

Now, at the end of the function, all of the remaining variables `_1` through `_4` are declared dead, and so are added to the gen set and the outgoing state:

```
bb5: {
    StorageDead(_4);
    StorageDead(_3);
    (*_2) = const 3_i32;
    _0 = const ();
    StorageDead(_2);
    StorageDead(_1);
    return;
}
// bb5 (output): [_1, _2, _3, _4, _5, _6]
```

Finally, if the checked addition in `bb1` fails (where the output state is `[_5]`), `bb6` is entered, where no variables are declared live or dead, and so the state remains unchanged.

I’ve skipped over a bunch of the terminology. Those terms are covered in depth in compiler courses that touch upon dataflow analysis; you can search for terms like bottom, lattice, semilattice, join, transfer function, and more. The Wikipedia page on the topic is a decent place to dig in further.

As a way of wrapping up, I wanted to give a list of where else the Rust compiler utilizes this form of dataflow analysis.

- Tracking the flow of borrows
- Tracking whether a reference or pointer could point to a given local
- Liveness analysis
- Possibly-initialized variables
- Possibly-uninitialized variables
- Definitely-initialized variables
- Possibly-requiring storage

In addition to this (incomplete) list of gen-kill problems, the Rust compiler also has some analyses modeled using the more generic, albeit less performant, `Analysis` trait. For example, as of this writing, the latest version of constant propagation, `ConstAnalysis`, is built on top of `ValueAnalysis`. (Something surprising to me is that this isn’t also set up as a gen-kill analysis, particularly the portion formulating reachability, which is a go-to use case for gen-kill analysis. Please let me know why, if you know.)

*The code for the post is available here.*

Just recently, two articles have surfaced describing generic recursion in Rust. I recommend reading them. There will likely be further posts in the series, but I didn’t want to wait to at least try out the basics of the `recursion` crate. Thinking of what recursive data structures could be tried, I decided to start by prototyping a Rust adaptation of `alga`, a Haskell library for algebraic graphs.

*small aside: I’m focusing on alga’s Graph type specifically, not the Graph typeclass it defines, of which many more-efficient representations are made an instance, because it was the simplest representation I could use to play with the recursion crate*

Below we have an adapted form of the datatype introduced in the paper *Algebraic Graphs with Class*.

```
#[derive(Debug, Clone)]
pub enum RGraph<Val, A> {
    Empty,
    Vertex(Val),
    Overlay(A, A),
    Connect(A, A),
}
```

A `Graph` can be one of:

- `Empty`: no vertices, no edges
- `Vertex`: a single vertex of type `Val`
- `Overlay`: a graph built by taking the union of both sets of vertices and both sets of edges (V₁ ∪ V₂, E₁ ∪ E₂)
- `Connect`: a graph constructed by connecting the two graphs. This is done by unioning in the same way as `Overlay`, but additionally unioning the resulting edge set with the cross product of both vertex sets (V₁ ∪ V₂, E₁ ∪ E₂ ∪ V₁ × V₂)

For further detail, I encourage you to check out the paper and library.
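To make the `Overlay`/`Connect` semantics concrete, here is a small standalone interpreter over a boxed version of the type. This sketch does not use the `recursion` crate; the `G` enum and `interpret` function are mine, for illustration only:

```rust
use std::collections::BTreeSet;

// A boxed stand-in for the algebraic graph type.
enum G {
    Empty,
    Vertex(u32),
    Overlay(Box<G>, Box<G>),
    Connect(Box<G>, Box<G>),
}

// Interpret a graph expression into its (vertex set, edge set).
fn interpret(g: &G) -> (BTreeSet<u32>, BTreeSet<(u32, u32)>) {
    match g {
        G::Empty => (BTreeSet::new(), BTreeSet::new()),
        G::Vertex(v) => (BTreeSet::from([*v]), BTreeSet::new()),
        G::Overlay(a, b) => {
            let (va, ea) = interpret(a);
            let (vb, eb) = interpret(b);
            (
                va.union(&vb).copied().collect(),
                ea.union(&eb).copied().collect(),
            )
        }
        G::Connect(a, b) => {
            let (va, ea) = interpret(a);
            let (vb, eb) = interpret(b);
            let mut edges: BTreeSet<_> = ea.union(&eb).copied().collect();
            // additionally union in the cross product of the vertex sets
            for &x in &va {
                for &y in &vb {
                    edges.insert((x, y));
                }
            }
            (va.union(&vb).copied().collect(), edges)
        }
    }
}

fn main() {
    let overlay = G::Overlay(Box::new(G::Vertex(1)), Box::new(G::Vertex(2)));
    let connect = G::Connect(Box::new(G::Vertex(1)), Box::new(G::Vertex(2)));
    // Overlay: two vertices, no edges; Connect: same vertices plus edge (1, 2).
    assert_eq!(interpret(&overlay).1.len(), 0);
    assert_eq!(interpret(&connect).1, BTreeSet::from([(1, 2)]));
}
```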

With the recursive definition out of the way, let’s make use of the `recursion` crate’s `RecursiveTree` to make `RGraph` a functor, wrapping the type with a struct `Graph`:

```
use recursion::map_layer::MapLayer;
use recursion::recursive_tree::arena_eval::ArenaIndex;
use recursion::recursive_tree::RecursiveTree;

pub type RecursiveGraph<V> = RecursiveTree<RGraph<V, ArenaIndex>, ArenaIndex>;

impl<A, B, V> MapLayer<B> for RGraph<V, A> {
    type To = RGraph<V, B>;
    type Unwrapped = A;

    fn map_layer<F: FnMut(Self::Unwrapped) -> B>(self, mut f: F) -> Self::To {
        use RGraph::*;
        match self {
            Empty => Empty,
            Vertex(v) => Vertex(v),
            Overlay(a, b) => Overlay(f(a), f(b)),
            Connect(a, b) => Connect(f(a), f(b)),
        }
    }
}

struct Graph<V> {
    inner: RecursiveGraph<V>,
}
```

With that, we can now use catamorphisms and anamorphisms to destruct and construct this datatype for our purposes, starting with a method to construct a graph from a list of vertices:

```
impl<V: Hash + Eq + Clone + Debug> Graph<V> {
    /// Constructs a [`Graph`] from a vector of vertices
    pub fn vertices(vs: Vec<V>) -> Self {
        Graph {
            inner: RecursiveGraph::expand_layers(vs, |mut remaining| {
                use RGraph::*;
                match remaining.len() {
                    0 => Empty,
                    1 => Vertex(remaining.pop().unwrap()),
                    _ => {
                        let ending_half =
                            remaining.split_off(remaining.len() / 2);
                        Overlay(remaining, ending_half)
                    }
                }
            }),
        }
    }
    ...
}
```

This function isn’t terribly interesting on its own – it doesn’t add any edges to the graph, and it isn’t suited for adding vertices to an already-existing graph – but it is one of the primitive builders provided by `alga`. We can see that this anamorphism splits the list of vertices in half and overlays the two halves.

We can do something very similar to construct a clique (which `alga` also defines), simply by using `Connect` instead of `Overlay`:

```
/// Constructs fully connected graph
pub fn clique(vs: Vec<V>) -> Graph<V> {
    Graph {
        inner: RecursiveGraph::expand_layers(vs, |mut remaining| {
            use RGraph::*;
            match remaining.len() {
                0 => Empty,
                1 => Vertex(remaining.pop().unwrap()),
                _ => {
                    let ending_half =
                        remaining.split_off(remaining.len() / 2);
                    Connect(remaining, ending_half)
                }
            }
        }),
    }
}
```

To ensure that `vertices` does the thing it’s supposed to, we can add a method to count the number of vertices in the graph, also known as the graph’s order:

```
/// Gets the number of vertices in the graph
pub fn order(self) -> usize {
    use RGraph::*;
    self.inner
        .collapse_layers(|layer: RGraph<V, HashSet<V>>| match layer {
            Empty => HashSet::new(),
            Vertex(v) => HashSet::from_iter([v]),
            Overlay(a, b) => set_union(a, b),
            Connect(a, b) => set_union(a, b),
        })
        .len()
}
```

Here we fold the graph layer by layer, unioning the sets of vertices together and getting the cardinality of the set at the end.

As you may have guessed, there isn’t really anything in the representation above preventing one from adding vertices and edges when they are already present. In fact, `alga` has a function to `simplify` a representation by pruning redundancies. I’m unsure how often this is needed in practice, but efficiency is not the focus of this blog post in any case.

Speaking of efficiency… `alga` conducts some of its queries on temporary structures that are more performant, such as adjacency maps. Replicated below:

```
type AdjacencyMap<V> = HashMap<V, HashSet<V>>;

impl<V: Hash + Eq + Clone + Debug> Graph<V> {
    ...
    /// Folds a [`Graph`] to construct an adjacency map
    pub fn to_adjacency_map(self) -> AdjacencyMap<V> {
        use RGraph::*;
        self.inner
            .collapse_layers(|layer: RGraph<V, AdjacencyMap<V>>| match layer {
                Empty => HashMap::new(),
                Vertex(v) => {
                    let mut map = HashMap::new();
                    map.insert(v, HashSet::new());
                    map
                }
                Overlay(a, b) => union_with(&a, &b, set_union),
                Connect(a, b) => unions_with(
                    vec![
                        &a,
                        &b,
                        &from_iter_with(
                            HashSet::<V>::from_iter(a.keys().cloned()),
                            |_| {
                                // every `a` node gets `b`s as neighbors
                                HashSet::from_iter(b.keys().cloned())
                            },
                        ),
                    ],
                    set_union,
                ),
            })
    }
    ...
}
```

In the case of an `Overlay`, we combine adjacency lists by unioning the neighbors of identical nodes. In the case of a `Connect`, that same combination is additionally unioned with neighbor sets representing a fully connected subgraph of all nodes in `a` and `b`.

With this representation available, we can easily query the number of edges in the graph:

```
/// Calculates the number of edges in the graph
pub fn size(self) -> usize {
    self.to_adjacency_map().values().map(HashSet::len).sum()
}
```

Finally, to test what we’ve got:

```
#[cfg(test)]
mod tests {
    use crate::*;

    #[test]
    fn test_graph() {
        let vs = vec![1, 2, 3, 4, 5, 6, 7, 8, 9];
        // fully connected graphs have (n)(n-1)/2 edges
        assert_eq!(Graph::clique(vs.clone()).size(), (9 * 8 / 2));
        assert_eq!(Graph::vertices(vs.clone()).size(), 0);
        assert_eq!(Graph::vertices(vs.clone()).order(), 9);
    }
}
```

That’s a decent start. Credit to Inanna Malick for making the `recursion` crate: it was fun to play with!

It would be nice if we could run multiple passes over the graph, and not have to consume it just to do something like count its nodes. To enable this, a few tweaks have to be made. First, we introduce a recursive definition of `RGraph` over borrowed data:
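Based on how it’s used in the `MapLayer` implementation below, `RGraphRef` looks roughly like the following (a reconstruction; the exact definition may differ):

```rust
/// A borrowed analogue of `RGraph`: vertices are referenced, not owned.
pub enum RGraphRef<'a, Val, A> {
    Empty,
    Vertex(&'a Val),
    Overlay(A, A),
    Connect(A, A),
}
```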

This definition allows us to create an implementation of `MapLayer` that takes the recursive type that owns data (`RGraph`) and maps it to one that refers to that data (`RGraphRef`):

```
impl<'a, A: Copy + 'a, B: 'a, V: 'a> MapLayer<B> for &'a RGraph<V, A> {
    type To = RGraphRef<'a, V, B>;
    type Unwrapped = A;

    fn map_layer<F: FnMut(Self::Unwrapped) -> B>(self, mut f: F) -> Self::To {
        match self {
            RGraph::Empty => RGraphRef::Empty,
            RGraph::Vertex(v) => RGraphRef::Vertex(v),
            RGraph::Overlay(a, b) => RGraphRef::Overlay(f(*a), f(*b)),
            RGraph::Connect(a, b) => RGraphRef::Connect(f(*a), f(*b)),
        }
    }
}
```

Now, when applying our algebra to tear down this recursive structure, it can be expressed in terms of the referenced data. For example, let’s see how our definition of `order` would change:

```
- pub fn order(self) -> usize {
+ pub fn order(&self) -> usize {
      let unique =
-         self.inner.collapse_layers(|layer: RGraph<V, HashSet<V>>| {
+         self.inner.as_ref().collapse_layers(|layer: RGraphRef<V, HashSet<&V>>| {
-             use RGraph::*;
+             use RGraphRef::*;
              match layer {
                  Empty => HashSet::new(),
                  Vertex(v) => HashSet::from_iter(vec![v]),
                  Overlay(a, b) => set_union(a, b),
                  Connect(a, b) => set_union(a, b),
              }
          });
      unique.len()
  }
```

And there you have it!

Below are the definitions of referenced helper functions:

```
fn set_union<V: Eq + Hash + Clone>(u: HashSet<V>, v: HashSet<V>) -> HashSet<V> {
    u.union(&v).cloned().collect()
}

/// Constructs a HashMap, determining a key's value by applying
/// a given function to each respective key
fn from_iter_with<K: Eq + Hash, V, F>(
    it: impl IntoIterator<Item = K>,
    f: F,
) -> HashMap<K, V>
where
    F: Fn(&K) -> V,
{
    it.into_iter().fold(HashMap::new(), |mut acc, k| {
        let v = f(&k);
        acc.insert(k, v);
        acc
    })
}

/// Unions two HashMaps by applying a given function to
/// the values of common keys
fn union_with<'a, K: 'a + Eq + Hash + Clone, V: 'a + Clone, F>(
    a: &'a HashMap<K, V>,
    b: &'a HashMap<K, V>,
    f: F,
) -> HashMap<K, V>
where
    F: Fn(V, V) -> V,
{
    unions_with([a, b], f)
}

/// Unions multiple HashMaps by applying a given function to
/// the values of common keys
fn unions_with<'a, K: 'a + Eq + Hash + Clone, V: 'a + Clone, F>(
    maps: impl IntoIterator<Item = &'a HashMap<K, V>>,
    f: F,
) -> HashMap<K, V>
where
    F: Fn(V, V) -> V,
{
    maps.into_iter().fold(HashMap::new(), |mut acc, map| {
        for (k, v) in map.iter() {
            if let Some(u) = acc.remove(&k) {
                acc.insert(k.clone(), f(u, v.clone()));
            } else {
                acc.insert(k.clone(), v.clone());
            }
        }
        acc
    })
}
```

`std::async`; based on some seed data, the goal was to asynchronously run a compute-intensive calculation whose result may or may not be needed at a later time. When new seed data became available, it was an indication that any in-progress computation should be terminated early and that its result be considered defunct.

Below, I’m going to outline a contrived example that demonstrates the sort of trouble I ended up running into.

```
enum class CalculationProgress { Unaborted, Aborted };

std::optional<int>
calculate(std::string name,
          std::shared_ptr<std::atomic<CalculationProgress>> abort_status) {
  const auto arbitrary_computation_time = std::chrono::seconds(10);
  const auto arbitrary_loop_time = std::chrono::seconds(5);
  const auto arbitrary_max_loops =
      arbitrary_computation_time / arbitrary_loop_time;
  auto count = 0;
  bool was_aborted = false;
  do {
    std::this_thread::sleep_for(arbitrary_loop_time);
    count++;
    was_aborted = *abort_status == CalculationProgress::Aborted;
  } while (count < arbitrary_max_loops && !was_aborted);
  const auto status = was_aborted ? "ABORTED" : "COMPLETE";
  std::cout << name << ": " << status << ", count: " << count << std::endl;
  return was_aborted ? std::nullopt : std::optional(count);
}
```

The function `calculate` above is the contrived, compute-intensive calculation. It periodically checks the abort condition while performing its work (in this case, every five seconds). Depending on that condition’s value, it either finishes its long-running calculation or gets aborted – whichever occurs first. From `arbitrary_loop_time` in the do-while loop, it’s clear that `calculate` will have a best-case execution time of five seconds.

The entity making async calls to `calculate` is a class called `Calculator`, and it tracks the result of `calculate` in one of its member attributes.

```
struct Calculator {
  std::future<std::optional<int>> future_value{};

  void launch_async_calc(
      std::string name,
      std::shared_ptr<std::atomic<CalculationProgress>> abort_status) {
    future_value = std::async(std::launch::async,
                              [=]() { return calculate(name, abort_status); });
  }
};
```

With these in place, async computations can be launched and aborted:

```
int main() {
  Calculator calculator{};

  auto cv1 = std::make_shared<std::atomic<CalculationProgress>>(
      CalculationProgress::Unaborted);
  calculator.launch_async_calc("foo", cv1);

  auto cv2 = std::make_shared<std::atomic<CalculationProgress>>(
      CalculationProgress::Unaborted);
  calculator.launch_async_calc("bar", cv2);

  *cv1 = CalculationProgress::Aborted;
  *cv2 = CalculationProgress::Aborted;
  return 0;
}
```

When running this program, we get the following output:

```
$ time ./calculate_foo
foo: COMPLETE, count: 2
bar: ABORTED, count: 2
./calculate_foo 0.00s user 0.00s system 0% cpu 10.002 total
```

Something is off – wasn’t `foo` supposed to be aborted? And the program runs for about ten seconds. I’d expect both `launch_async_calc` calls to run in parallel on my machine: `foo` would recognize its toggled abort flag after one five-second loop, `bar` would do the same, and the program’s total run time should then be five seconds or so. What happened?

Using some good, old-fashioned print debugging, let’s check whether these computations are launched concurrently by adding the following to the top of `calculate`:

```
std::optional<int>
calculate(std::string name,
          std::shared_ptr<std::atomic<CalculationProgress>> abort_status) {
  std::cout << name << ": LAUNCHED" << std::endl;
  ...
}
```

Now the program outputs:

```
$ time ./calculate_foo
foo: LAUNCHED
bar: LAUNCHED
foo: COMPLETE, count: 2
bar: ABORTED, count: 2
./calculate_foo 0.00s user 0.00s system 0% cpu 10.002 total
```

So it does look like they’re getting launched concurrently. From the ten-second run time and the loop count, it would appear that `foo` wasn’t appropriately aborted, as we suspected. Adding one more log line to `main` – printing `ABORTING` just before the abort flags are toggled – we see:

```
$ time ./calculate_foo
foo: LAUNCHED
bar: LAUNCHED
foo: COMPLETE, count: 2
ABORTING
bar: ABORTED, count: 2
./calculate_foo 0.00s user 0.00s system 0% cpu 10.002 total
```

Okay, so `foo` has completely finished its calculation before it’s been properly aborted, despite the fact that each calculation is launched concurrently. It would appear something is blocking on the calculation for `foo`.

After a few more careful placements, the culprit is narrowed down to the only line in `calculator.launch_async_calc("bar", cv2)` – the reassignment of `future_value`. Measuring the time it takes to execute this line shows that it takes a whole nine seconds to reassign the `std::future` returned by the `std::async` call.

```
$ time ./calculate_foo
foo: assigned in 0 seconds
foo: LAUNCHED
bar: LAUNCHED
bar: ABORTED, count: 0
foo: COMPLETE, count: 2
bar: assigned in 9 seconds
ABORTING
./calculate_foo 0.00s user 0.00s system 0% cpu 10.002 total
```

When dealing with this problem originally, outside of this contrived example, I started out completely puzzled. After narrowing things down to two possibilities – the `std::future` move assignment operator or its destructor – I decided to search online and found a page that resulted in palm-to-face contact…

*Why is the destructor of a future returned from `std::async` blocking?*

After the initial surprise wore off, and after reading through a portion of the treasure trove of information linked by the top answer, I decided to see if the official documentation for `std::future` had anything to say about this. More or less, about halfway down the page, it does:

If the std::future obtained from std::async is not moved from or bound to a reference, the destructor of the std::future will block at the end of the full expression until the asynchronous operation completes, essentially making code such as the following synchronous:

```
std::async(std::launch::async, []{ f(); }); // temporary's dtor waits for f()
std::async(std::launch::async, []{ g(); }); // does not start until f() completes
```

(note that the destructors of std::futures obtained by means other than a call to std::async never block)

I’m not completely satisfied with the lack of emphasis on the reference page, but it at least alludes to one possible way of making the above program work as intended. All it takes is a few lines: we maintain a vector of past calculations and move the future that’s about to get reassigned into that vector before doing so:

```
struct Calculator {
  std::future<std::optional<int>> future_value{};
  std::vector<std::future<std::optional<int>>> old_futures{}; // new

  void launch_async_calc(
      std::string name,
      std::shared_ptr<std::atomic<CalculationProgress>> abort_status) {
    old_futures.push_back(std::move(future_value)); // new
    future_value = std::async(std::launch::async,
                              [=]() { return calculate(name, abort_status); });
  }
};
```

This solution has its problems. For example, there arguably should now be something that prunes completed entries from `old_futures`, in order to avoid a vector that grows perpetually. The program now outputs:

```
$ time ./calculate_foo
foo: assigned in 0 seconds
foo: LAUNCHED
bar: assigned in 0 seconds
bar: LAUNCHED
ABORTING
foo: ABORTED, count: 1
bar: ABORTED, count: 1
./calculate_foo 0.00s user 0.00s system 0% cpu 5.002 total
```

The total run time of the program is still limited by the minimum run time of calculating `foo`, because each element in `old_futures` still has a blocking destructor, but at least subsequent calls to `launch_async_calc` will not block on prior calls, and the program now behaves as we would expect.

I’m on the fence about considering this a true foot-gun. But something that does make it seem like one, at least to me, is the fact that `std::future`s obtained from a `std::promise`, for example, do not exhibit this blocking behavior. Either way, having been made aware of this, I’ll be keeping it in mind.

In a previous post we utilized recursion schemes in prototyping a genetic algorithms library. I wanted to look more into their use cases and was happy to discover that they could even be leveraged for dynamic programming.

One of my favorite applications of dynamic programming is matrix chain multiplication; given a bunch of matrices with shared indices, the goal is to find the smallest number of arithmetic operations possible in calculating their product. Usually, this is accompanied by deriving the optimal parenthesization.

There are *lots* of examples online that outline the conventional dynamic programming approach, so I’m not going to rehash it here. I do, however, recommend familiarizing yourself with the dynamic programming solution before moving on. Here, we’ll focus on leveraging recursion schemes to get the job done.

To find the optimal parenthesization, Hinze and Wu leverage a recursion scheme called a dynamorphism, which can be modeled in part with a recursion scheme touched upon in the last post called a hylomorphism – a function that unfolds (builds) some intermediate structure (using a `CoAlgebra`) and folds (reduces) that intermediate structure into some accumulated value (using an `Algebra`):

```
type Algebra f a = f a -> a
type CoAlgebra f a = a -> f a
hylo :: Functor f => CoAlgebra f a -> Algebra f b -> a -> b
hylo f g = h where h = g . fmap h . f
```

Due to this unfolding and folding mechanism, hylomorphisms are sometimes referred to as a kind of refold.

Dynamorphisms, another type of refold, are very similar to hylomorphisms; a dynamorphism performs the same behavior as a hylomorphism but maintains a record of its folds by storing the result of each one in a structure called a cofree comonad, `data Cofree f a = a :< f (Cofree f a)`.

If this looks confusing, don’t fret. It’s extremely similar to a recursive definition of a list type, and we’ll be treating it as such:

```
-- normal list type
data List valueType =
Cons valueType (List valueType) | Nil
-- closer to Cofree
data ListS containerType valueType =
ConsS valueType (ListS containerType valueType)
-- even closer to Cofree
data ListCF containerType valueType =
ConsCF valueType (containerType (ListCF containerType valueType))
-- basically Cofree
data ListCF containerType valueType =
valueType :< (containerType (ListCF containerType valueType))
```

Let’s compare the signature of a dynamorphism to a hylomorphism without the type synonyms:

```
hylo :: Functor f => (a -> f a) -> (f b -> b) -> a -> b
dyna :: Functor f => (a -> f a) -> (f (Cofree f b) -> b) -> a -> b
```

We can see that they are pretty similar, and from the definition of `dyna` we can see that it really is a hylomorphism that keeps track of the values calculated:

```
-- extracts the first value from Cofree
extract :: Cofree f a -> a
extract (a :< _) = a
dyna :: Functor f => (a -> f a) -> (f (Cofree f b) -> b) -> a -> b
dyna h g = extract . hylo h (\fcfb -> (g fcfb) :< fcfb)
```

The dynamorphism builds an intermediate structure with `h a` and stores the result of applying `g` to that value in a `Cofree`. Using our list analogy, it applies `g` to the incoming functor and prepends the result to an existing list of past-calculated values. This is reminiscent of the `iterate` function, which repeatedly applies a function `f` to some value `x` and appends each application’s result to a list.
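In Rust terms (a loose analogy of my own, not from the paper), `std::iter::successors` captures the same history-keeping idea: repeatedly apply a function while keeping every intermediate result.

```rust
// Like Haskell's `iterate (*2) start`, truncated to `n` elements.
fn doubling_history(start: u64, n: usize) -> Vec<u64> {
    std::iter::successors(Some(start), |&x| Some(x * 2))
        .take(n)
        .collect()
}

fn main() {
    // every intermediate result is retained, not just the last one
    println!("{:?}", doubling_history(1, 5)); // [1, 2, 4, 8, 16]
}
```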

By keeping a history of the dynamorphism’s applications of the folding function it’s supplied, one can reach back into the `Cofree` structure and utilize pre-calculated values à la dynamic programming, which is exactly what Hinze and Wu do for matrix chain multiplication:

```
-- grab the nth element of a Cofree
get :: Cofree (ListF v) a -> Int -> a
get (x :< xs) 0 = x
get (x :< (Cons _ xs)) n = xs `get` (n-1)

-- take the first n elements of a Cofree
collect :: Cofree (ListF v) a -> Int -> [a]
collect _ 0 = []
collect (x :< (Some _)) n = [x]
collect (x :< (Cons _ cf)) n = x : cf `collect` (n-1)

chainM :: [Int] -> Int
chainM dims = dyna triangle findParen range where
  range = (1, length dims - 1)

  triangle :: (Int,Int) -> ListF (Int,Int) (Int,Int)
  triangle (1,1) = Some (1,1)
  triangle (i,j)
    | i == j    = Cons (i,j) (1,j-1)
    | otherwise = Cons (i,j) (i+1,j)

  findParen :: ListF (Int,Int) (Cofree (ListF (Int,Int)) Int) -> Int
  findParen (Some j) = 0
  findParen (Cons (i,j) table)
    | i == j = 0
    | i < j  = minimum (zipWith (+) as bs) where
        as = [(dims !! (i-1)) * (dims !! k) * (dims !! j)
              + (table `get` offset k) | k <- [i..j-1]]
        bs = table `collect` (j-i)
        offset k = ((j*(j+1) - k*(k+1)) `div` 2) - 1
```

With dependencies between subproblems modeled as a directed acyclic graph, the algorithm unfolds a range into a list of cell coordinates in reverse topological order using the `triangle` function. It then folds that list in topological order with `findParen`, maintaining a table of past-calculated values along the way. With every folding step, `findParen` reaches back into the table to find the pre-calculated values in cells `(i,k)` and `(k+1,j)` for every `k` in the range `[i,j-1]` and finds the best `k` to split the intermediate matrix *M*_{ij}. As you can tell from the way these values are calculated, Hinze and Wu carefully consider and identify where these previously-calculated values may be found in the `Cofree`.

However, there are two pieces causing this function to lose the 𝒪(*N*^{3}) complexity found in the conventional dynamic programming approach. Our first problem is that Haskell lists are not built like typical arrays, but like linked lists, so `dims !! x` has complexity 𝒪(*N*) rather than the desired 𝒪(1) – this at least can be remedied with something like `Data.Vector`. Our second problem is similar; `Cofree` is like a linked list, and so reaching into the `Cofree` structure with `get` is a linear operation; the pre-calculated value at `offset k` is not guaranteed to be within a constant distance of the head of the `Cofree`. If we were calculating something like the *n*^{th} Fibonacci number, we would only need to look back two elements and would be in the clear, but here that is not the case; we’ve caused the overall complexity to reach 𝒪(*N*^{4}).

I’m unaware of a way that something like `Cofree` can allow us to peek into its structure in constant time, particularly when the element we’d like to grab is arbitrarily nested. So, instead of relying upon a dynamorphism, we can return to a hylomorphism – and, instead of a list or a vector of pre-calculated scores with nontrivial offset calculations, we’ll maintain a “two-dimensional” memoization table. For simplicity’s sake, we’ll leverage a `HashMap`, but one could easily use `Data.Vector`. So, `findParen` will then take the following form:

```
type Map = HashMap.Map

chainM :: [Int] -> Int
chainM dims = best where
    best = (hylo triangle findParen range) ! range
    -- ...
    findParen :: Algebra (ListF (Int,Int)) (Map (Int,Int) Int)
    findParen (Some (j,_)) = Map.insert (j,j) 0 Map.empty
    findParen (Cons (i,j) table)
        | i == j = Map.insert (i,j) 0 table
        | i < j = Map.insert (i,j) (minimum parenthesizations) table where
            cost x y = table ! (x,y)
            space (x,y,z) = (dims !! x) * (dims !! y) * (dims !! z)
            parenthesizations =
                [space (i-1,k,j) + cost i k + cost (k+1) j | k <- [i..j-1]]
```
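As a self-contained check of this memoized version, here is essentially the same algorithm with the hylomorphism and base functor inlined, assuming `Data.Map` in place of the `HashMap` for simplicity:

```haskell
{-# LANGUAGE DeriveFunctor #-}
import qualified Data.Map as Map
import Data.Map (Map, (!))

-- the non-empty list base functor from earlier in the post
data ListF v a = Some v | Cons v a deriving Functor

hylo :: Functor f => (a -> f a) -> (f b -> b) -> a -> b
hylo h g = g . fmap (hylo h g) . h

chainM :: [Int] -> Int
chainM dims = hylo triangle findParen range ! range where
    range = (1, length dims - 1)

    -- unfold the cell coordinates in reverse topological order
    triangle (1,1) = Some (1,1)
    triangle (i,j)
        | i == j = Cons (i,j) (1,j-1)
        | otherwise = Cons (i,j) (i+1,j)

    -- fold the cells into a memoization table
    findParen (Some (j,_)) = Map.singleton (j,j) 0
    findParen (Cons (i,j) table)
        | i == j = Map.insert (i,j) 0 table
        | otherwise = Map.insert (i,j) (minimum parens) table where
            cost x y = table ! (x,y)
            space (x,y,z) = dims !! x * dims !! y * dims !! z
            parens = [space (i-1,k,j) + cost i k + cost (k+1) j | k <- [i..j-1]]
```

For example, `chainM [10,30,5,60]` (two ways to parenthesize three matrices) picks the `(A₁A₂)A₃ ` split costing 10·30·5 + 10·5·60 = 4500 over the alternative costing 27000.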

With this change and an imagined switch to `Data.Vector` for the sequence of `dims`, we can bring the time complexity back to 𝒪(*N*^{3})^{*}.

* *Sort of; the unordered-containers package mentions “[m]any operations have a average-case complexity of 𝒪(log n). The implementation uses a large base (i.e. 16) so in practice these operations are constant time.” If we really wanted constant time, we could use a two-dimensional* `Vector` *instead of a* `HashMap`*.*

For a chain of matrices such as:

we can see the minimum number of operations possible is `102`:

We can view this chain of matrices as a graph with weighted edges that we’d like to contract together; each matrix multiplication costs the product of all edge weights incident to the matrices being multiplied, up to a constant factor:

If the relationship between the equation and diagram is escaping you, I encourage you to check out the wonderful tutorial by Tai-Danae Bradley, which indicates that matrix multiplication is a special case of tensor contraction. Let’s take a look at a different graph:

Instead of matrices with two dimensions, we have multidimensional tensors with some indices that are free. The problem is now to find the optimal parenthesization of a chain of tensors to be contracted.

Additionally, the indices of the tensors involved in the chain contraction need not be free; they can be bound to other tensors outside of the chain of interest, or even to other tensors in the chain.

We can now put our sights on finding the optimal contraction order of a path within some arbitrary tensor network.

It turns out that we can apply the same dynamic programming approach to tensor chain contraction – and credit goes to my colleague Jonathan Jakes-Schauer for the insight. The key is to keep track of all indices that are incident to the chain of interest, because they’ll each contribute to the cost of the contractions in which their tensors participate.

For matrix chain products we saw that `findParen` identified an index `k` that optimally decomposed a given intermediate matrix into two. This calculation recursively depended upon the optimal way to decompose those two matrices, down to the original matrices in the chain (which cost nothing to construct). The question then is how to calculate the cost of contracting two tensors; two contracted tensors may share more than just one index, and so we must account for more than just the dimensions of those indices, as we did before. To illustrate, consider the example below:

Once tensor A has been contracted into B, and C into D, the resulting intermediate tensors AB and CD share two indices. Because index `w` was originally incident to neither tensor B nor D, it’s important that we don’t overcount the number of operations associated with contracting AB and CD; if this contraction appeared in the dynamic programming routine with *i* = 1, *k* = 2, and *j* = 4, and we calculated the cost as before without regard to the identity of the indices involved, we would cost this contraction at *wzw* rather than the true cost, which is *wz*. Towards this end, we’ll maintain the identity of all indices participant to and resulting from each contraction, using the following types:

```
type Tensor = Map Int Int

data ContractionTree a =
      Tensor a
    | Intermediate (ContractionTree a) (ContractionTree a)

data TensorData = TensorData {
      totalCost :: Int,
      recipe :: ContractionTree Int,
      indices :: Tensor
    }
```

where `totalCost` denotes the cost of the contraction plus the total cost of having created each of the two tensors being contracted, `recipe` is a helper datatype for knowing the order in which those two tensors were created and contracted together, and `indices` is the resulting tensor itself, represented by the bundle of incident indices – a mapping from index identifiers to their dimensions.

Suppose we would like to contract two tensors and want to know the resulting `totalCost` and `indices` using a function with the following type:

Suppose the indices of those two tensors are a given and we’d like to know what indices are left over from the contraction. We can see from an example like the one below:

that the indices left over from the contraction are equal to the symmetric difference of all indices incident to the tensors being contracted:

```
contract :: (TensorData, TensorData) -> TensorData
contract (left,right) = TensorData {
    totalCost = ...,
    recipe = ...,
    indices = symDiff (indices left) (indices right)
} where
    symDiff l r = (l \\ r) <> (r \\ l)
```

where `x \\ y` is the set difference, or `x - y`, and `x <> y` is the set union.
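As a small sketch, with index bundles represented as `Data.Map` maps from index identifiers to dimensions (`\\` is `Map.difference`, and `<>` on maps is left-biased union):

```haskell
import qualified Data.Map as Map
import Data.Map (Map, (\\))

-- a tensor as its bundle of indices: index identifier -> dimension
type Tensor = Map Int Int

-- the indices left over after contracting two tensors
symDiff :: Tensor -> Tensor -> Tensor
symDiff l r = (l \\ r) <> (r \\ l)
```

Shared indices are summed over and disappear; indices belonging to only one operand survive, which is exactly the symmetric difference.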

We’ve mentioned already that the cost of contracting two tensors is the product of the dimensions of all indices incident to the two tensors. More formally, for tensors *A* and *B*, the cost of contraction is

∏_{i ∈ *I*(*A*) ∪ *I*(*B*)} dim(*i*)

where *I*(*X*) denotes the set of indices belonging to some tensor *X*. Given this, and the set of indices we’ve maintained from earlier contractions, we can then calculate the cost of splitting an intermediate tensor as was done in the case of matrices. We identify two “child” tensors when splitting some intermediate tensor as discussed above; the split is determined by the start *i* of the subchain in question, the end *j* of the subchain in question, and the marker *k* where the chain shall be split; the intermediate tensor *T*_{i}*T*_{i + 1}..*T*_{k} = *T*_{ik} marks the first tensor, and *T*_{k + 1}*T*_{k + 2}..*T*_{j} = *T*_{(k + 1)j} marks the second. When contracted, they result in tensor *T*_{ij}.

With this, we have all the information necessary to represent the resulting tensor:

```
tspace = Map.foldl (*) 1

contract :: (TensorData, TensorData) -> TensorData
contract (left,right) = TensorData {
    totalCost = tspace (indices left <> indices right)
              + (totalCost left)
              + (totalCost right),
    recipe = Intermediate (recipe left) (recipe right),
    indices = symDiff (indices left) (indices right)
} where
    symDiff l r = (l \\ r) <> (r \\ l) -- symmetric difference
```

However, calculating the symmetric difference as well as the union of the indices incident to participant tensors is 𝒪(*R*), where *R* is the maximum order among the tensors contracted. If we do this for every *T*_{ik} and *T*_{(k + 1)j}, we’ve increased the complexity of our algorithm to 𝒪(*RN*^{3}).

Can we do better? Can we calculate cost and symmetric difference in a way that’s independent of `k` to store in our memoization table for when it’s time to consider *T*_{ij}?

Let’s consider the symmetric difference, which is associative. If we have an intermediate tensor *T*_{ij}, no matter which value for `k` we evaluate, the set of indices belonging to *T*_{ij} will always be the same. So, we don’t need to calculate this for every value of `k` after all; we only need to reach into the memoization table to find one `k` value that’s already been considered. Where can we find that `k`? If we look back at the `triangle` function, we receive a hint:

```
triangle :: (Int,Int) -> ListF (Int,Int) (Int,Int)
triangle (1,1) = Some (1,1)
triangle (i,j)
    | i == j = Cons (i,j) (1,j-1)
    | otherwise = Cons (i,j) (i+1,j)
```

For a given pair `(i,j)`, we append that pair to a list containing `(i+1,j)` at the head. So, by the time we reach `(i,j)` in our fold within `findParen`, `(i+1,j)` is already in the memoization table. This means that a value `k = i` has already been accounted for and we can access the information for *T*_{ii} and *T*_{(i + 1)j}, and calculate the symmetric difference, i.e. the indices left over from the contraction:

```
indLeft = indices $ table ! (i,i)
indNext = indices $ table ! (i+1,j)
symdiff = (indLeft \\ indNext) <> (indNext \\ indLeft) -- O(R)
```

That’s one half of our problem solved. Unfortunately, the same intuition cannot be applied to finding each contraction cost, which is not associative. This is obvious when you consider for a moment that, if it were, this problem of finding the optimal parenthesization wouldn’t exist!

This doesn’t completely kill our chances of remaining independent of `k`, however; it just means we can’t do so and continue thinking about cost in the same way. So, let’s think about it another way.

As described in *Carving-width and contraction trees for tensor networks*, a contraction tree is a data structure describing a particular set of tensor contraction orders. For example, take a look at the following parenthesization of our matrix chain and notice its mapping to a contraction tree:

If this chain was comprised of higher-order tensors, the concept would remain the same.

We can weight the arcs of this tree, where the weight corresponds to the symmetric difference of the indices incident to the tensors getting contracted. We start with the arcs incident to the leaf nodes, corresponding to the original tensors and their indices:

Similarly, we can weight each internal node, representing each intermediate tensor, with their cost of creation – the product of the set union of its protruding arcs:

If you look closely at each internal node, you’ll notice one interesting property:

For each node with arcs *a*, *a*′, and *a*″, the weight of the node itself, i.e. the cost of its creation, is:

We can reconsider the labels of our arcs in a way that captures the nature of the problem at hand:

With this we can see that the weights of arcs correspond to the set of indices that child tensors *T*_{ik} and *T*_{(k + 1)j} and their result, *T*_{ij}, each contain. By maintaining an account of those arc weights, we can calculate the cost of the contraction in a way that relies on the associative symmetric difference. In other words, we can now calculate the cost in a way that’s independent of `k`. First, we’ll add a new attribute called `cspace` that captures this arc weight:

```
data TensorData = TensorData {
      totalCost :: Int,
      recipe :: ContractionTree Int,
      cspace :: Int,
      indices :: Tensor
    }
```

which we can utilize in calculating the cost:

```
indLeft = indices $ table ! (i,i)
indNext = indices $ table ! (i+1,j)
symdiff = (indLeft \\ indNext) <> (indNext \\ indLeft) -- O(R)
cspaceij = tspace symdiff -- O(R)

-- get contraction data of combining two intermediate tensors: O(1)
contract :: (TensorData, TensorData) -> TensorData
contract (left,right) = TensorData {
    totalCost = totalCost left + totalCost right + sqrtCspaces,
    recipe = Intermediate (recipe left) (recipe right),
    cspace = cspaceij,
    indices = symdiff
} where
    cspaces = cspace left * cspace right * cspaceij
    sqrtCspaces = round . sqrt . fromIntegral $ cspaces
```
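The square root deserves a word. For index sets *A* and *B*, with products taken over index dimensions, ∏*A* · ∏*B* = ∏(*A* Δ *B*) · ∏(*A* ∩ *B*)², while the contraction cost is ∏(*A* ∪ *B*) = ∏(*A* Δ *B*) · ∏(*A* ∩ *B*); so the cost equals √(∏*A* · ∏*B* · ∏(*A* Δ *B*)), which is exactly the product of the three `cspace` values under the root. A small sketch checking the identity:

```haskell
import qualified Data.Map as Map
import Data.Map (Map, (\\))

type Tensor = Map Int Int

tspace :: Tensor -> Int
tspace = Map.foldl (*) 1

-- contraction cost computed directly: product over the union of indices
direct :: Tensor -> Tensor -> Int
direct a b = tspace (a <> b)

-- the same cost via the square-root identity used by contract
viaSqrt :: Tensor -> Tensor -> Int
viaSqrt a b = round . sqrt . fromIntegral $ tspace a * tspace b * tspace d
    where d = (a \\ b) <> (b \\ a)  -- symmetric difference
```

The shared indices appear squared in ∏*A* · ∏*B*, and multiplying by the symmetric difference squares the rest, so the root recovers the union product exactly.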

Putting everything together, we now have everything we need for an algorithm that can calculate the tensor chain product for any path in a tensor network in 𝒪(*RN*^{2} + *N*^{3}) time. After addressing the portions of our memoization table that correspond to our original tensors, we have our final result:

```
chainT :: Vector Tensor -> TensorData
chainT tensors = best where
    best = (hylo triangle findParen range) ! range
    range = (1, length tensors)

    emptyData i = TensorData {
        totalCost = 0,
        recipe = Tensor i,
        cspace = tspace t,
        indices = t
    } where t = (V.!) tensors (i-1)

    triangle :: (Int,Int) -> ListF (Int,Int) (Int,Int)
    triangle (1,1) = Some (1,1)
    triangle (i,j)
        | i == j = Cons (i,j) (1,j-1)
        | otherwise = Cons (i,j) (i+1,j)

    findParen :: Algebra (ListF (Int,Int)) (Map (Int,Int) TensorData)
    findParen (Some (t,_)) = Map.insert (t,t) (emptyData t) Map.empty -- O(R)
    findParen (Cons (i,j) table) -- O(R + N) per (i,j)
        | i == j = Map.insert (i,j) (emptyData i) table -- O(R)
        | i < j = Map.insert (i,j) best table where
            -- O(R)
            indLeft = indices $ table ! (i,i)
            indNext = indices $ table ! (i+1,j)
            symdiff = (indLeft \\ indNext) <> (indNext \\ indLeft)
            cspaceij = tspace symdiff
            -- O(N)
            splits = [((i,k),(k+1,j)) | k <- [i..j-1]]
            getData (l,r) = (table ! l, table ! r)
            parenthesizations = map (contract . getData) splits
            best = argmin totalCost parenthesizations
            -- O(1)
            -- get contraction data of combining two intermediate tensors
            contract :: (TensorData, TensorData) -> TensorData
            contract (left,right) = TensorData {
                totalCost = totalCost left + totalCost right + sqrtCspaces,
                recipe = Intermediate (recipe left) (recipe right),
                cspace = cspaceij,
                indices = symdiff
            } where
                cspaces = cspace left * cspace right * cspaceij
                sqrtCspaces = round . sqrt . fromIntegral $ cspaces
```

As a sanity check, we can evaluate the same matrix chain we started with:

First blog post – yay! This post documents some of my experience getting practice with recursion schemes and some monadic computations in the context of prototyping a genetic algorithms library. For a full-fledged, flexible, genetic algorithms library written in Haskell, I refer the reader to moo.

Genetic algorithms are a type of heuristic in which candidate solutions to a problem are stochastically and incrementally evolved over time with the aim of producing performant ones; candidates, or individuals, are evolved with the help of genetic operators for selecting, manufacturing, and altering those individuals.

Let’s start by defining some of the context in which our genetic algorithm should run. It would be nice to reference a configuration containing all the definitions and parameters we could need (like mutation and selection methods), utilize and update a random number generator for generating and mutating individuals, and log intermediate data. The `RWS` monad presents itself as a candidate for meeting these criteria, so let’s wrap it in a newtype:
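The newtype itself isn’t shown here, but judging from how it’s used below (the `ctx` accessor, `evalRWS`, a `[T.Text]` log, and a `PureMT` state), a plausible sketch is:

```
newtype GAContext i a = GAContext {
    ctx :: RWS (GAConfig i) [T.Text] PureMT a
} deriving ( Functor, Applicative, Monad
           , MonadReader (GAConfig i)
           , MonadWriter [T.Text]
           , MonadState PureMT )
```

This is a sketch rather than the original definition; it assumes the `mtl` classes and the `PureMT` generator from `mersenne-random-pure64`.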

With this definition (which requires `GeneralizedNewtypeDeriving`), we can reference and update the `PureMT` random number generator with `get` and `put`, refer to our configuration with `ask`, and log intermediate data with `tell`.

One of my favorite genetic algorithm libraries, deap, allows you to keep track of a hall of fame – a collection of the most-fit individuals. We can represent this collection as a continually-updated min-heap, where the worst-performing individuals at a particular point in time can be popped from the heap and discarded:

It would be helpful to have a means of tracking the best individuals over time, along with any other data that could be gathered with every new generation. For this, a snapshot data type:

Next we can define the data type containing all of our configuration parameters that we will then be able to reference in `GAContext` computations:

```
data GAConfig i = Config {
    -- the probability an individual is mutated
      mutationRateInd :: Double
    -- the probability a gene of an individual is mutated
    , mutationRateGene :: Double
    -- the percentage of the population that gets replaced through recombination
    , crossoverRate :: Double
    -- the population size
    , popSize :: Int
    -- the mutation method
    , mutate :: i -> GAContext i i
    -- the crossover method
    , crossover :: i -> i -> GAContext i i
    -- the method to create a new individual
    , randomIndividual :: GAContext i i
    -- the selection method
    , selectionMethod :: Vector i -> GAContext i (Vector i)
    -- the fitness function (higher fitness is preferred)
    , fitness :: i -> Double
    -- the number of generations
    , numGenerations :: Int
    -- the `hofSize` best individuals across all generations
    , hofSize :: Int
    -- function for information sourced from most recent snapshot
    , logFunc :: GASnapshot i -> GAContext i ()
}
```

This configuration serves as the basic interface to the library. Once an instance of this data type is created, the genetic algorithm can do the bulk of its work.

The genetic algorithm will evolve our set of candidate solutions over time for a fixed number of steps, or generations.

Every generation of the genetic algorithm is determined by a `step` function:

```
step :: Ord a => GASnapshot a -> GAContext a (GASnapshot a)
step (Snapshot lastGen hof genNumber) = do
    Config {hofSize, logFunc, popSize, selectionMethod} <- ask
    -- select parents and create the next generation from them
    selectedParents <- selectionMethod lastGen
    -- use the set of parents to create and mutate a new generation
    children <- crossAndMutate selectedParents popSize
    -- update the HOF
    updatedHOF <- updateHOF hof children hofSize
    -- construct the new snapshot
    let nextSnapshot = Snapshot {
        lastGeneration = children,
        hof = updatedHOF,
        generationNumber = genNumber + 1
    }
    -- log intermediate results
    logFunc nextSnapshot
    -- return the mutated generation
    return nextSnapshot
```

The `step` function takes the current snapshot, along with the user-defined configuration, to select a portion of the population to pass on genetic material, crossover individuals from that subset to generate children, and mutate a portion of those children. With every pass, the Hall of Fame is updated with better-fit individuals, if they are found, and the subsequent snapshot is returned.

After parents are selected with the user-defined `selectionMethod`, the `Vector` of parents acts as a seed from which children are produced. The generation of these children via crossover and their mutation are done in the same pass with a hylomorphism:

At this point, I refer the reader to the existing (and superior) resources on recursion schemes, if they are unfamiliar with the concept; I found Awesome Recursion Schemes to be helpful, particularly Patrick Thompson’s series and Jared Tobin’s blog posts.

Briefly, and skipping over useful generalizations provided by the `recursion-schemes` library: catamorphisms tear down structures, anamorphisms construct structures, and hylomorphisms are the composition of an anamorphism and a catamorphism, i.e. the construction and tearing-down of an intermediate structure. Catamorphisms utilize a function to tear down their structures while anamorphisms utilize a function to build up their structures. Both functions can be found within `Control.Functor.Algebra` and are representations of the morphisms that each comprise a third of an F-Algebra and F-CoAlgebra respectively:

Normal hylomorphisms have the type:

For our case, the monadic context of `GAContext` needs to be preserved. The Haskell package `data-fix` offers the `hyloM` function, which instead relies on the monadic `AlgebraM` and `CoAlgebraM` types:
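To see the shape of `hyloM` concretely, here is a self-contained sketch specialized to a list base functor (hand-rolled here rather than taken from `data-fix`; `crossAndMutate` is then presumably just `hyloM toVector (newChild parents)` applied to the desired population size):

```haskell
-- a monadic hylomorphism specialized to the ListF base functor:
-- unfold a seed with a monadic coalgebra, fold with a monadic algebra
data ListF e a = Nil | Cons e a

hyloM :: Monad m => (ListF e b -> m b) -> (a -> m (ListF e a)) -> a -> m b
hyloM alg coalg a = do
    fa <- coalg a
    fb <- case fa of
        Nil      -> return Nil
        Cons e x -> Cons e <$> hyloM alg coalg x
    alg fb

-- example: unfold a countdown, fold it into a list, inside a monad (Maybe)
countdown :: Int -> Maybe (ListF Int Int)
countdown 0 = Just Nil
countdown n = Just (Cons n (n - 1))

asList :: ListF e [e] -> Maybe [e]
asList Nil = Just []
asList (Cons e es) = Just (e : es)
```

Here `countdown` plays the role of the monadic coalgebra (like `newChild`) and `asList` the monadic algebra (like `toVector`).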

With the monadic hylomorphism in `crossAndMutate` above, a fixed list of mutated children is unfolded from a seed using `newChild` and folded into a vector of the same type using `toVector`. This yields the next generation of candidate solutions for the genetic algorithm.

Let’s take a look at the `newChild` function:

```
-- selects two parents to breed, a child is born, joy to the world
newChild :: (Vector a) -> CoAlgebraM (GAContext a) (ListF a) Int
newChild parents 0 = return Nil
newChild parents m = do
    -- get mutation and crossover methods
    Config {crossover, mutate} <- ask
    -- get two random indices
    i <- randomI
    j <- randomI
    -- from the two indices, grab two parents
    let p1 = parents ! (i `mod` (length parents))
    let p2 = parents ! (j `mod` (length parents))
    -- make a child
    child <- crossover p1 p2
    -- mutate the child
    mutatedChild <- mutate child
    -- add the child to the collection
    return $ Cons mutatedChild (m-1)
```

`newChild` generates a new individual with the user-defined `crossover` function from two parents chosen at random from the group of individuals selected to pass on their genetic material. We then apply the user-defined `mutate` function to the child and append that mutated individual to the in-progress collection of children. This is the anamorphic half of the hylomorphism.

The catamorphic half of the transformation is accomplished with `toVector` below:

and we can see that it is rather straightforward, once we make a `Corecursive` instance of `Vector` to leverage the `embed` function:

In addition to the above instance, we will find later on, with our use of `cata`, that defining a `Recursive` instance of `Vector` is also necessary:

Once the collection of mutated children has been returned by `crossAndMutate`, we will want to update the Hall of Fame with any individuals that perform better than the extant individuals therein. Let’s create a function that will take a vector of individuals and insert them all into the heap representing the Hall of Fame:

Simple enough. Our catamorphism breaks down our `Vector` into a `HOF`; all it needs is the existing one into which we can insert the elements.

With this, we can update the current `HOF` by dumping the latest population into it and popping off minimally-fit individuals until the HOF is back at its original size.

```
-- updates the HOF by removing the worst-fit individuals from the min-heap
updateHOF :: Ord a => HOF a -> Vector a -> Int -> GAContext a (HOF a)
updateHOF hof pop hofSize = return . Heap.drop n $ oversizedHOF where
    -- insert all of the current population
    oversizedHOF = insertHeap hof pop
    -- drop all but hofSize individuals
    n = V.length pop - if Heap.isEmpty hof then hofSize else 0
```

Now that we have outlined the flow of the genetic algorithm, we need to provide an initial population. For this, we leverage the user-defined `randomIndividual` function, provided within the ever-present `GAConfig`:

```
-- creates a vector of random individuals
makePopulation :: Int -> GAContext a (Vector a)
makePopulation s = hyloM toVector addRandomInd s where
    -- creates a random individual and adds it to the collection
    addRandomInd :: CoAlgebraM (GAContext a) (ListF a) Int
    addRandomInd 0 = return Nil
    addRandomInd n = do
        -- get a new, random individual
        ind <- randomIndividual =<< ask
        -- add it to the collection
        return $ Cons ind (n-1)
```

We now have all pieces necessary for running the genetic algorithm for one complete generation. After some initial setup, we can run for the user-specified number of generations:

```
runGA :: Ord a => GAContext a (GASnapshot a)
runGA = do
    Config {numGenerations, popSize, hofSize} <- ask
    -- initialize the population
    initialPop <- makePopulation popSize
    -- set up the initial Hall of Fame
    initialHOF <- updateHOF (Heap.empty :: HOF a) initialPop hofSize
    -- set up the initial snapshot
    let snapshot = Snapshot {
        lastGeneration = initialPop,
        hof = initialHOF,
        generationNumber = 0
    }
    -- run the genetic algorithm
    runN numGenerations step snapshot
```

Using our configuration parameters, we create an initial snapshot and pass that to a function that runs the step function for a set number of iterations equal to the number of generations. Let’s take a look at the definition of `runN`:
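A minimal definition matching this description (a sketch; the original may differ in argument order or naming) is:

```haskell
-- apply a monadic step function n times, threading the result through
runN :: Monad m => Int -> (a -> m a) -> a -> m a
runN 0 _ x = return x
runN n f x = f x >>= runN (n - 1) f
```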

It takes a function (in our case `step`) and applies that function `n` times, returning the final result.

Finally, we can run the `GAContext`, a newtype wrapper for the `RWS` monad, with `runRWS` and `evalRWS`:

```
-- from a new rng, run the genetic algorithm
evalGA :: Ord i => GAConfig i -> IO (GASnapshot i, [T.Text])
evalGA cfg = newPureMT >>= (return . evalGASeed cfg)

-- from a user-supplied rng, run the genetic algorithm
evalGASeed :: Ord i => GAConfig i -> PureMT -> (GASnapshot i, [T.Text])
evalGASeed cfg rng = evalRWS (ctx runGA) cfg rng

-- from a user-supplied rng, run the genetic algorithm and return the updated seed
runGASeed :: Ord i => GAConfig i -> PureMT -> (GASnapshot i, PureMT, [T.Text])
runGASeed cfg rng = runRWS (ctx runGA) cfg rng
```

With this, all the user needs to do is define their genetic operators and fitness functions for their own individual representation, and they should be able to call one of these three functions to run the genetic algorithm.

Let’s see an example of it in action with a very simple problem: maximizing the number of 1’s in a 500-bit binary string. Source can be found in `BinaryInd.hs`.

We’ll represent the binary string as a list of `Bool`:

We can start simply by defining the fitness function for this individual representation, which is just the number of `True` booleans in the list:
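The representation and fitness function themselves aren’t shown here; a plausible sketch, assuming the `BI` constructor that the later `mutate` code pattern-matches on and an `Ord` instance by score (which the heap and sorting-based selection rely on), is:

```haskell
import Data.Ord (comparing)

-- a candidate solution: a binary string as a list of booleans
data BinaryInd = BI [Bool] deriving (Eq, Show)

-- fitness: the number of True bits
score :: BinaryInd -> Double
score (BI bs) = fromIntegral . length . filter id $ bs

-- individuals are ordered by fitness (a sketch; the original may differ)
instance Ord BinaryInd where
    compare = comparing score
</imports>
```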

Next we can define a function to create a new and random bit string of length 500:

We’ll also need to provide a way to mutate our individual:

```
-- mutate a binary string representation
mutate :: BinaryInd -> GAContext BinaryInd BinaryInd
mutate ind@(BI bs) = do
    -- grab individual and gene mutation rates
    Config{mutationRateGene, mutationRateInd} <- ask
    -- get a random double
    indp <- randomD
    -- if the value is less than mutation rate for an individual
    if indp < mutationRateInd then
        -- mutate each bit with `mutationRateGene` probability
        fmap BI $ mapM (mutateBool mutationRateGene) bs
    else
        -- return the unaltered individual
        return ind

-- mutate a boolean by flipping it
mutateBool :: Double -> Bool -> GAContext a Bool
mutateBool p x = do
    -- get a random double
    indp <- randomD
    -- determine whether or not to flip the bit
    return $ if indp < p then not x else x
```

In `mutate`, we get a random double with a helper function `randomD` and decide whether the given individual is to be mutated at all. If it is to be mutated, we iterate over the given individual and determine whether the genes themselves (the bits) should be mutated with some given probability with `mutateBool`.

To cross two parents, we’ll generate a bitmask that will inform us whether a given gene should be taken from the first parent or the second parent:

```
-- recombine two individuals from the population
crossover :: BinaryInd -> BinaryInd -> GAContext BinaryInd BinaryInd
crossover (BI i1) (BI i2) = do
    -- get the crossover rate
    Config{crossoverRate} <- ask
    -- get a random double
    indp <- randomD
    if indp < crossoverRate then do -- perform crossover
        -- get booleans specifying which gene to take
        code <- replicateM (length i1) randomBool
        -- choose genetic material from first or second parent
        let eitherOr = (\takeThis this that -> if takeThis then this else that)
        -- perform uniform crossover
        return . BI $ zipWith3 eitherOr code i1 i2
    else do
        -- choose the genetic material from one of the parents
        chooseFirstParent <- randomBool
        return . BI $ if chooseFirstParent then i1 else i2
```

This type of crossover is called uniform crossover.

Our selection scheme is simple: take the best 20% of the population:

```
select :: Ord a => Vector a -> GAContext a (Vector a)
select pop = do
    -- get the population size
    Config{popSize} <- ask
    -- get the number of individuals to breed
    let numToSelect = round $ 0.2 * (fromIntegral popSize)
    -- get the top 20% of the best-performing individuals
    let selectedParents = V.take numToSelect . V.reverse $ V.modify sort pop
    return selectedParents
```

Almost there! It’s time to run the genetic algorithm in our main function by instantiating a `GAConfig` with the functions we’ve defined:

```
import qualified BinaryInd as BI

main :: IO ()
main = do
    let cfg = Config {
          mutationRateInd = 0.8
        , mutationRateGene = 0.02
        , crossoverRate = 0.7
        , popSize = 100
        , mutate = BI.mutate
        , crossover = BI.crossover
        , randomIndividual = BI.new
        , selectionMethod = BI.select
        , fitness = BI.score
        , numGenerations = 200
        , hofSize = 1
        , logFunc = logHOF
        }
    -- run the genetic algorithm
    (finalSnapshot, progress) <- evalGA cfg
    -- output the best fitnesses as they're found
    mapM_ (putStrLn . T.unpack) progress
```

We call the `evalGA` function on our configuration to yield the final snapshot containing the HOF. We can log the progress of the genetic algorithm by printing the logging messages written with `tell` and `logFunc`.

The `logHOF` function puts the scores of the `HOF` into CSV format for easy graphing:

```
logHOF :: Ord a => GASnapshot a -> GAContext a ()
logHOF Snapshot{hof, generationNumber} = do
    -- get the fitness function
    Config {fitness} <- ask
    -- get string representations of the best individuals
    let best = map (T.pack . show . fitness) $ Heap.toList hof
    -- craft the comma-separated line
    let msg = [T.concat $ intersperse (T.pack ",") best]
    -- log the line
    tell msg
```

And here we can see how the GA improves fitness across generations:

We can see that the GA does pretty well for our little problem, making it most of the way towards an optimal solution within the first 100 generations. Not bad!

We’ve prototyped a library that can allow us to see if our given (and contrived) problem could stand to benefit from a genetic algorithm. I realize I’ve glossed over some details here, such as the `randomD` and `randomBool` definitions; if you want code that compiles, you’ll need to consult the source.

I also briefly mentioned the resources for recursion schemes, but if you’d like more examples (namely with `cata`, `cataM`, and `anaM`) I’ve created a recursion-scheme-based analogue to `BinaryInd` in `BinaryIndRec.hs`.