Discovering I've Shot Myself in the Foot with std::async
I was recently looking into implementing some speculative execution functionality using std::async; based on some seed data, the goal was to asynchronously run a compute-intensive calculation whose result may or may not be needed at a later time. When new seed data became available, it was an indication that any in-progress computation should be terminated early and its result considered defunct.
Below I’m going to outline a contrived example that demonstrates the sort of trouble I ended up running into.
```cpp
#include <atomic>
#include <chrono>
#include <iostream>
#include <memory>
#include <optional>
#include <string>
#include <thread>

enum class CalculationProgress { Unaborted, Aborted };

std::optional<int>
calculate(std::string name,
          std::shared_ptr<std::atomic<CalculationProgress>> abort_status) {
  const auto arbitrary_computation_time = std::chrono::seconds(10);
  const auto arbitrary_loop_time = std::chrono::seconds(5);
  // Dividing two durations yields a plain count: 10 s / 5 s == 2 loops.
  const auto arbitrary_max_loops =
      arbitrary_computation_time / arbitrary_loop_time;
  auto count = 0;
  bool was_aborted = false;
  do {
    std::this_thread::sleep_for(arbitrary_loop_time);  // stand-in for real work
    count++;
    was_aborted = *abort_status == CalculationProgress::Aborted;
  } while (count < arbitrary_max_loops && !was_aborted);
  const auto status = was_aborted ? "ABORTED" : "COMPLETE";
  std::cout << name << ": " << status << ", count: " << count << std::endl;
  return was_aborted ? std::nullopt : std::optional(count);
}
```
The function calculate above is the contrived, compute-intensive calculation. It periodically checks its abort status while performing its calculation (in this case every five seconds). Depending on that value, it either finishes its long-running calculation or gets aborted, whichever occurs first. Because the do-while loop sleeps for arbitrary_loop_time before its first check, calculate has a best-case execution time of five seconds, even if it is aborted right away.
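To make that five-second floor concrete, here is a scaled-down sketch of the same check-after-sleep loop. The helper names run_abortable and time_preaborted_run, the millisecond timings, and the plain std::atomic<bool> standing in for CalculationProgress are all assumptions for illustration. The point is that a flag set before the loop even starts still costs one full period.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical scaled-down version of the loop in calculate(): sleep for one
// period, then check the abort flag, up to max_loops times.
int run_abortable(std::chrono::milliseconds period, int max_loops,
                  const std::atomic<bool>& aborted) {
  int count = 0;
  bool was_aborted = false;
  do {
    std::this_thread::sleep_for(period);  // the flag is only checked after this
    ++count;
    was_aborted = aborted.load();
  } while (count < max_loops && !was_aborted);
  return count;
}

// Times a run whose abort flag is already set before the loop starts.
std::chrono::milliseconds time_preaborted_run(std::chrono::milliseconds period) {
  const std::atomic<bool> aborted{true};
  const auto start = std::chrono::steady_clock::now();
  run_abortable(period, 2, aborted);
  return std::chrono::duration_cast<std::chrono::milliseconds>(
      std::chrono::steady_clock::now() - start);
}
```

Moving the check to the top of the loop, or splitting the sleep into smaller slices, would shorten that floor at the cost of checking more often.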
The entity making async calls to calculate is a class called Calculator, and it tracks the results of calculate in one of its member attributes.
```cpp
#include <future>

struct Calculator {
  // Holds the future of the most recently launched calculation.
  std::future<std::optional<int>> future_value{};

  void launch_async_calc(
      std::string name,
      std::shared_ptr<std::atomic<CalculationProgress>> abort_status) {
    future_value =
        std::async(std::launch::async, [=]() { return calculate(name, abort_status); });
  }
};
```
With these in place, async computations can be launched and aborted:
```cpp
int main() {
  Calculator calculator{};

  auto cv1 = std::make_shared<std::atomic<CalculationProgress>>(
      CalculationProgress::Unaborted);
  calculator.launch_async_calc("foo", cv1);

  auto cv2 = std::make_shared<std::atomic<CalculationProgress>>(
      CalculationProgress::Unaborted);
  calculator.launch_async_calc("bar", cv2);

  *cv1 = CalculationProgress::Aborted;
  *cv2 = CalculationProgress::Aborted;
  return 0;
}
```
When running this program, we get the following output:
```
$ time ./calculate_foo
foo: COMPLETE, count: 2
bar: ABORTED, count: 2
./calculate_foo 0.00s user 0.00s system 0% cpu 10.002 total
```
Something is off: wasn’t foo supposed to be aborted? And the program runs for about ten seconds. I’d expect both launch_async_calc calls to run in parallel on my machine. foo would recognize its toggled abort after one loop of five seconds, bar would do the same, and the program’s run time should then be five seconds or so. What happened?
Using some good, old-fashioned print debugging, let’s check whether these tasks are launched concurrently by adding the following to the top of calculate:
```cpp
std::optional<int>
calculate(std::string name,
          std::shared_ptr<std::atomic<CalculationProgress>> abort_status) {
  std::cout << name << ": LAUNCHED" << std::endl;
  ...
}
```
Now the program outputs:
```
$ time ./calculate_foo
foo: LAUNCHED
bar: LAUNCHED
foo: COMPLETE, count: 2
bar: ABORTED, count: 2
./calculate_foo 0.00s user 0.00s system 0% cpu 10.002 total
```
So it does look like they’re getting launched concurrently. From the ten-second run time and loop count, it would appear that, as suspected, foo wasn’t aborted. After adding one more log line to main, which prints ABORTING when the abort flags get set, we see:
```
$ time ./calculate_foo
foo: LAUNCHED
bar: LAUNCHED
foo: COMPLETE, count: 2
ABORTING
bar: ABORTED, count: 2
./calculate_foo 0.00s user 0.00s system 0% cpu 10.002 total
```
Okay, so foo has completely finished its calculation before it’s been aborted, even though the two calculations are launched concurrently. It would appear something is blocking on the calculation for foo.
After a few more carefully placed log lines, the culprit is narrowed down to the only line in launch_async_calc, as executed by calculator.launch_async_calc("bar", cv2): the assignment to future_value.
After measuring the time it takes to execute this line, the program shows that reassigning the std::future returned by the std::async call takes a whole nine seconds:
```
$ time ./calculate_foo
foo: assigned in 0 seconds
foo: LAUNCHED
bar: LAUNCHED
bar: ABORTED, count: 0
foo: COMPLETE, count: 2
bar: assigned in 9 seconds
ABORTING
./calculate_foo 0.00s user 0.00s system 0% cpu 10.002 total
```
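The timing code behind the "assigned in N seconds" lines isn’t shown. Below is a minimal sketch of how such a measurement could be taken, assuming std::chrono::steady_clock and scaled-down millisecond timings. The helper name time_reassignment is hypothetical. It launches a slow std::async task, then times a move assignment over its still-pending future. A duration_cast to whole seconds truncates, which may be why a wait of nearly ten seconds is logged as 9.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical helper: times a move assignment over a std::future whose
// std::async task is still running.
std::chrono::milliseconds time_reassignment(std::chrono::milliseconds task_time) {
  std::future<int> value = std::async(std::launch::async, [task_time] {
    std::this_thread::sleep_for(task_time);  // stand-in for calculate()
    return 1;
  });

  const auto start = std::chrono::steady_clock::now();
  // Assigning releases the old shared state. It was created by std::async and
  // its task has not finished, so this statement waits for that task.
  value = std::async(std::launch::async, [] { return 2; });
  const auto elapsed = std::chrono::steady_clock::now() - start;

  [[maybe_unused]] const int new_result = value.get();
  assert(new_result == 2);  // the future now refers to the new, fast task
  return std::chrono::duration_cast<std::chrono::milliseconds>(elapsed);
}
```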
When I first ran into this problem outside of this contrived example, I was completely puzzled. After narrowing it down to two suspects, the std::future move assignment operator or its destructor, I decided to search online and found a page that resulted in palm-to-face contact…
Why is the destructor of a future returned from std::async blocking?
After the initial surprise wore off, and after reading through part of the treasure trove of information linked by the top answer, I decided to see if the reference documentation for std::async had anything to say about this. About halfway down the page, it does:
If the std::future obtained from std::async is not moved from or bound to a reference, the destructor of the std::future will block at the end of the full expression until the asynchronous operation completes, essentially making code such as the following synchronous:
```cpp
std::async(std::launch::async, []{ f(); }); // temporary's dtor waits for f()
std::async(std::launch::async, []{ g(); }); // does not start until f() completes
```
(note that the destructors of std::futures obtained by means other than a call to std::async never block)
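The quoted behavior is easy to observe directly. The sketch below times two std::async calls whose futures are discarded as temporaries, then the same two calls with named futures. The function names and sleep durations are illustrative assumptions. With temporaries the tasks run back to back. With named futures they overlap, and the waiting only happens when the futures go out of scope.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Each task simply sleeps to stand in for real work.
void sleepy_task(std::chrono::milliseconds task_time) {
  std::this_thread::sleep_for(task_time);
}

std::chrono::milliseconds elapsed_since(std::chrono::steady_clock::time_point start) {
  return std::chrono::duration_cast<std::chrono::milliseconds>(
      std::chrono::steady_clock::now() - start);
}

// Temporaries: each future is destroyed at the end of its statement and waits
// for its task there. The (void) casts only silence [[nodiscard]] warnings.
std::chrono::milliseconds run_with_temporaries(std::chrono::milliseconds task_time) {
  const auto start = std::chrono::steady_clock::now();
  (void)std::async(std::launch::async, sleepy_task, task_time);
  (void)std::async(std::launch::async, sleepy_task, task_time);
  return elapsed_since(start);
}

// Named futures: both tasks start before either future is destroyed, so they
// run concurrently; the destructors still wait, but only at the closing brace.
std::chrono::milliseconds run_with_named_futures(std::chrono::milliseconds task_time) {
  const auto start = std::chrono::steady_clock::now();
  {
    auto first = std::async(std::launch::async, sleepy_task, task_time);
    auto second = std::async(std::launch::async, sleepy_task, task_time);
  }
  return elapsed_since(start);
}
```

With 200 ms tasks, the first function should take at least 400 ms and the second should take roughly 200 ms.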
I’m not completely satisfied with the lack of emphasis on the reference page, but it at least alludes to one possible way of making the above program work as intended. All it takes is a few lines: we maintain a vector of past calculations and move the future that’s about to get reassigned into that vector before reassigning it:
```cpp
#include <utility>
#include <vector>

struct Calculator {
  std::future<std::optional<int>> future_value{};
  std::vector<std::future<std::optional<int>>> old_futures{}; // new

  void launch_async_calc(std::string name,
                         std::shared_ptr<std::atomic<CalculationProgress>> abort_status) {
    // Park the previous future so the assignment below doesn't release
    // (and therefore wait on) its shared state.
    old_futures.push_back(std::move(future_value)); // new
    future_value =
        std::async(std::launch::async, [=]() { return calculate(name, abort_status); });
  }
};
```
This solution has its problems. For example, there arguably should now be something that prunes old_futures once they complete, so the vector doesn’t grow forever. The program now outputs:
```
$ time ./calculate_foo
foo: assigned in 0 seconds
foo: LAUNCHED
bar: assigned in 0 seconds
bar: LAUNCHED
ABORTING
foo: ABORTED, count: 1
bar: ABORTED, count: 1
./calculate_foo 0.00s user 0.00s system 0% cpu 5.002 total
```
The total run time of the program is still bounded below by the minimum run time of foo’s calculation, because each element in old_futures still has a blocking destructor. At least subsequent calls to launch_async_calc no longer block on prior calls, and the program now behaves as we would expect.
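The pruning mentioned above could look something like the following sketch. The helper prune_finished is hypothetical, and it polls each future with a zero-timeout wait_for. It also drops futures with no shared state. The very first launch_async_calc call pushes the default-constructed future_value, and calling wait_for on a future whose valid() is false is undefined behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <future>
#include <optional>
#include <vector>

using FutureResult = std::future<std::optional<int>>;

// Hypothetical helper: removes futures that are empty or already finished.
// Destroying a future whose result is ready does not wait on a running task.
void prune_finished(std::vector<FutureResult>& old_futures) {
  old_futures.erase(
      std::remove_if(old_futures.begin(), old_futures.end(),
                     [](const FutureResult& f) {
                       return !f.valid() ||
                              f.wait_for(std::chrono::seconds(0)) ==
                                  std::future_status::ready;
                     }),
      old_futures.end());
}
```

Calling prune_finished(old_futures) at the top of launch_async_calc would keep the vector from growing without bound. Futures that are still running stay in the vector, so Calculator’s destructor can still block on them.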
I’m on the fence about whether to consider this a true foot-gun. But something that does make it seem like one, at least to me, is that std::futures obtained from a std::promise (via get_future), for example, do not exhibit this blocking behavior. Either way, now that I’m aware of it, I’ll be keeping it in mind.
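For contrast, here is a sketch of that non-blocking case. It assumes the work runs on a plain std::thread that reports its result through a std::promise. This is an alternative structure, not what the post’s Calculator does, and the helper name time_drop_promise_future is hypothetical. Dropping the still-pending future returns immediately. Waiting for the work becomes an explicit join() instead of a hidden side effect of the destructor.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>
#include <utility>

// Sketch: slow work on a std::thread, delivered through a std::promise.
// Returns how long it took to drop the future while the work was still pending.
std::chrono::milliseconds time_drop_promise_future(std::chrono::milliseconds task_time) {
  std::promise<int> promise;
  std::future<int> value = promise.get_future();
  std::thread worker([p = std::move(promise), task_time]() mutable {
    std::this_thread::sleep_for(task_time);  // stand-in for calculate()
    p.set_value(42);
  });

  const auto start = std::chrono::steady_clock::now();
  value = std::future<int>{};  // releases the shared state without waiting
  const auto elapsed = std::chrono::steady_clock::now() - start;

  worker.join();  // waiting for the work is now an explicit, separate step
  return std::chrono::duration_cast<std::chrono::milliseconds>(elapsed);
}
```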