Discovering I've Shot Myself in the Foot with std::async
I was recently looking into implementing some speculative execution functionality using std::async: based on some seed data, the goal was to asynchronously run a compute-intensive calculation whose result may or may not be needed at a later time. When new seed data became available, it was an indication that any in-progress computation should be terminated early and its result considered defunct.
Below I’m going to outline a contrived example that demonstrates the sort of trouble I ended up running into.
enum class CalculationProgress { Unaborted, Aborted };

std::optional<int>
calculate(std::string name,
          std::shared_ptr<std::atomic<CalculationProgress>> abort_status) {
  const auto arbitrary_computation_time = std::chrono::seconds(10);
  const auto arbitrary_loop_time = std::chrono::seconds(5);
  const auto arbitrary_max_loops =
      arbitrary_computation_time / arbitrary_loop_time;
  auto count = 0;
  bool was_aborted = false;
  do {
    std::this_thread::sleep_for(arbitrary_loop_time);
    count++;
    was_aborted = *abort_status == CalculationProgress::Aborted;
  } while (count < arbitrary_max_loops && !was_aborted);
  const auto status = was_aborted ? "ABORTED" : "COMPLETE";
  std::cout << name << ": " << status << ", count: " << count << std::endl;
  return was_aborted ? std::nullopt : std::optional(count);
}
The function calculate
above is the contrived, compute-intensive calculation. It periodically checks abort_status while performing its work (in this case, every five seconds) and, depending on its value, either finishes its long-running calculation or aborts early, whichever occurs first. From arbitrary_loop_time
in the do-while loop, it’s clear that calculate
has a best-case execution time of five seconds.
The entity making async calls to calculate
is a class called Calculator
, which tracks the result of the most recent calculate
call in one of its data members.
struct Calculator {
  std::future<std::optional<int>> future_value{};

  void launch_async_calc(
      std::string name,
      std::shared_ptr<std::atomic<CalculationProgress>> abort_status) {
    future_value =
        std::async(std::launch::async,
                   [=]() { return calculate(name, abort_status); });
  }
};
With these in place, async computations can be launched and aborted:
int main() {
  Calculator calculator{};

  auto cv1 = std::make_shared<std::atomic<CalculationProgress>>(
      CalculationProgress::Unaborted);
  calculator.launch_async_calc("foo", cv1);

  auto cv2 = std::make_shared<std::atomic<CalculationProgress>>(
      CalculationProgress::Unaborted);
  calculator.launch_async_calc("bar", cv2);

  *cv1 = CalculationProgress::Aborted;
  *cv2 = CalculationProgress::Aborted;
  return 0;
}
When running this program, we get the following output:
$ time ./calculate_foo
foo: COMPLETE, count: 2
bar: ABORTED, count: 2
./calculate_foo 0.00s user 0.00s system 0% cpu 10.002 total
Something is off – wasn’t foo
supposed to be aborted? And the program runs for about
ten seconds. I’d expect both launch_async_calc
calls to run in parallel on
my machine; foo
would recognize its toggled abort flag after one
five-second loop, bar
would do the same, and the total program time
should then be around five seconds. What happened?
Using some good, old-fashioned print debugging, let’s check to see if
these processes are launched concurrently by adding the following
to the top of calculate
:
std::optional<int>
(std::string name,
calculatestd::shared_ptr<std::atomic<CalculationProgress>> abort_status) {
std::cout << name << ": LAUNCHED" << std::endl;
...
}
Now the program outputs:
$ time ./calculate_foo
foo: LAUNCHED
bar: LAUNCHED
foo: COMPLETE, count: 2
bar: ABORTED, count: 2
./calculate_foo 0.00s user 0.00s system 0% cpu 10.002 total
So it does look like they’re getting launched concurrently. From the
ten-second run time and the loop count, it would appear that foo
wasn’t aborted as we intended. Adding one more log line to main
:
int main() {
...
std::cout << "ABORTING" << std::endl;
*cv1 = CalculationProgress::Aborted;
...
}
we see:
$ time ./calculate_foo
foo: LAUNCHED
bar: LAUNCHED
foo: COMPLETE, count: 2
ABORTING
bar: ABORTED, count: 2
./calculate_foo 0.00s user 0.00s system 0% cpu 10.002 total
Okay, so foo
completely finishes its calculation before it’s
ever told to abort, despite the fact that each calculation is launched
concurrently. It would appear something is blocking on the calculation
for foo
.
After a few more carefully placed log lines, the culprit is narrowed
down to the lone line executed by calculator.launch_async_calc("bar", cv2)
:
future_value =
    std::async(std::launch::async,
               [=]() { return calculate(name, abort_status); });
Measuring the time it takes to execute this line shows
that it takes a whole nine seconds to reassign
the std::future
returned by the std::async
call.
$ time ./calculate_foo
foo: assigned in 0 seconds
foo: LAUNCHED
bar: LAUNCHED
bar: ABORTED, count: 0
foo: COMPLETE, count: 2
bar: assigned in 9 seconds
ABORTING
./calculate_foo 0.00s user 0.00s system 0% cpu 10.002 total
When dealing with this problem originally, outside of this contrived example, I was
completely puzzled at first. After narrowing the suspects down to two possibilities, the std::future
move assignment
operator or its destructor, I searched online and found a page that resulted in palm-to-face contact…
Why is the destructor of a future returned from std::async
blocking?
After the initial surprise wore off, and after reading through a portion of the treasure trove of
information linked by the top answer, I decided to see if the
official documentation for std::future
had anything to say about this. About halfway down the page, it does:
If the std::future obtained from std::async is not moved from or bound to a reference, the destructor of the std::future will block at the end of the full expression until the asynchronous operation completes, essentially making code such as the following synchronous:
std::async(std::launch::async, []{ f(); }); // temporary's dtor waits for f()
std::async(std::launch::async, []{ g(); }); // does not start until f() completes
(note that the destructors of std::futures obtained by means other than a call to std::async never block)
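To see the quoted behavior in isolation, here’s a small timing sketch of my own (the 100 ms sleeps are arbitrary stand-ins for real work, not part of the original example): discarding each temporary future serializes the two tasks, while keeping the futures alive lets them overlap.

```cpp
#include <chrono>
#include <future>
#include <thread>

using namespace std::chrono;
using namespace std::chrono_literals;

// Two std::async calls whose futures are immediately discarded. Each
// temporary future's destructor blocks at the end of its full expression,
// so the two sleeps run back to back (~200 ms total).
milliseconds run_discarded() {
  const auto start = steady_clock::now();
  std::async(std::launch::async, [] { std::this_thread::sleep_for(100ms); });
  std::async(std::launch::async, [] { std::this_thread::sleep_for(100ms); });
  return duration_cast<milliseconds>(steady_clock::now() - start);
}

// The same two tasks, but both futures are kept alive until both tasks have
// launched, so the sleeps overlap (~100 ms total).
milliseconds run_kept() {
  const auto start = steady_clock::now();
  auto f1 =
      std::async(std::launch::async, [] { std::this_thread::sleep_for(100ms); });
  auto f2 =
      std::async(std::launch::async, [] { std::this_thread::sleep_for(100ms); });
  f1.wait();
  f2.wait();
  return duration_cast<milliseconds>(steady_clock::now() - start);
}
```

On a typical machine, run_discarded comes in at roughly twice run_kept, mirroring the synchronous-versus-concurrent behavior the quote describes.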
I’m not completely satisfied with the lack of emphasis on the reference page, but it at least alludes to one possible way of making the above program work as intended. All it takes is a few lines: we maintain a vector of past calculations and move the future that’s about to get reassigned into that vector before doing so:
struct Calculator {
  std::future<std::optional<int>> future_value{};
  std::vector<std::future<std::optional<int>>> old_futures{}; // new

  void launch_async_calc(
      std::string name,
      std::shared_ptr<std::atomic<CalculationProgress>> abort_status) {
    old_futures.push_back(std::move(future_value)); // new
    future_value =
        std::async(std::launch::async,
                   [=]() { return calculate(name, abort_status); });
  }
};
This solution has its problems. For example, now there arguably
should be something that prunes old_futures
, once they complete,
in order to avoid a vector that perpetually grows. The program now outputs:
$ time ./calculate_foo
foo: assigned in 0 seconds
foo: LAUNCHED
bar: assigned in 0 seconds
bar: LAUNCHED
ABORTING
foo: ABORTED, count: 1
bar: ABORTED, count: 1
./calculate_foo 0.00s user 0.00s system 0% cpu 5.002 total
The total run time of the program is still limited by the minimum run time of calculating
foo
, because each element in old_futures
still has a blocking destructor, but at least
subsequent calls to launch_async_calc
will no longer block on prior calls, and
the program now behaves as we would expect.
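As for pruning old_futures, one possible sketch (a hypothetical prune_finished helper of my own, not part of the original program) uses wait_for with a zero timeout to test each stored future for completion without blocking:

```cpp
#include <algorithm>
#include <chrono>
#include <future>
#include <optional>
#include <vector>

using Futures = std::vector<std::future<std::optional<int>>>;

// Hypothetical helper: drops every stored future whose task has finished.
// wait_for with a zero timeout polls the shared state without blocking, and
// an invalid (moved-from or default-constructed) future is dropped outright.
void prune_finished(Futures& old_futures) {
  old_futures.erase(
      std::remove_if(old_futures.begin(), old_futures.end(),
                     [](std::future<std::optional<int>>& f) {
                       return !f.valid() ||
                              f.wait_for(std::chrono::seconds(0)) ==
                                  std::future_status::ready;
                     }),
      old_futures.end());
}
```

Calling something like this at the top of launch_async_calc would keep the vector from growing without bound. Note that wait_for on a std::launch::async future polls without blocking, whereas a deferred future would report std::future_status::deferred instead of ready.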
I’m on the fence about considering this a true foot-gun or not. But
something that does make it seem like one, at least to me, is the
fact that std::futures
obtained from a std::promise
, for example,
do not exhibit this blocking behavior. Either way, having been made aware of
this, I’ll be keeping it in mind.
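For contrast, here’s a minimal sketch of that non-blocking behavior (the worker thread and its 200 ms sleep are my own invention): a future obtained from std::promise::get_future can be destroyed immediately, without waiting for the thread that will eventually fulfill the promise.

```cpp
#include <chrono>
#include <future>
#include <thread>

using namespace std::chrono;
using namespace std::chrono_literals;

// Returns how long the future's destructor took. Because the future comes
// from a std::promise rather than std::async, destroying it merely releases
// its reference to the shared state; it does not wait for the worker.
milliseconds destroy_promise_backed_future() {
  std::promise<int> p;
  std::thread worker([&p] {
    std::this_thread::sleep_for(200ms);
    p.set_value(42);  // safe even after the future is gone
  });

  const auto start = steady_clock::now();
  {
    std::future<int> f = p.get_future();
  }  // f's destructor returns immediately
  const auto elapsed = duration_cast<milliseconds>(steady_clock::now() - start);

  worker.join();  // the thread still needs joining; only the dtor is non-blocking
  return elapsed;
}
```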