Async C++ will let you write a use-after-free that only manifests under load, on the third Tuesday of the month, in a stack frame that has nothing to do with the bug. The compiler won’t warn you. Your tests will pass. Your sanitizers will shrug. And then production will teach you what you missed.

I maintain a ~50K LOC C++20 service built on Seastar. I catalogued every class of bug that burned me and turned them into 24 rules I enforce on every commit. Each one cost at least a day to diagnose. Here they are.

Memory That Isn’t Yours Anymore

Most of the worst async C++ bugs are lifetime bugs. In synchronous code, if the object exists, the scope that created it still exists. In async code, that guarantee is gone: a coroutine routinely resumes long after the scope that created it has returned, and anything that scope owned may already be dead.

Rule 16 — Lambda coroutines in .then() are use-after-free. This is the scariest bug on the list because it looks completely correct. You write a lambda that contains co_await, pass it to .then(), and everything compiles. Here’s what actually happens: .then() moves the lambda into internal storage. The lambda’s operator() is called, which creates a coroutine frame on the heap. The coroutine suspends at co_await. .then() is done with the lambda and destroys it. The coroutine resumes into freed memory.

The broken version:

// BROKEN — use-after-free when the coroutine suspends
future<> handle(request req) {
    return async_lookup(req.key()).then([req = std::move(req)](auto val) -> future<> {
        co_await async_log(req, val);   // .then() has already freed this lambda
        co_return;
    });
}

The fix:

// FIXED — seastar::coroutine::lambda() keeps the frame alive
future<> handle(request req) {
    return async_lookup(req.key()).then(seastar::coroutine::lambda([req = std::move(req)](auto val) -> future<> {
        co_await async_log(req, val);
        co_return;
    }));
}

The compiler will never warn you about this. I found it after three days of chasing a heap corruption that only appeared under sustained load.

Rule 5 — Timer callbacks need gate guards. A repeating timer fires after stop() has already begun destroying this. The callback dereferences member variables that no longer exist. The fix is seastar::gate, but the gate holder must outlive the entire async operation, not just the try block.

// BROKEN — gate guard scoped to try body; catch runs outside the gate
future<> on_timer() {
    try {
        auto holder = _gate.hold();
        co_await do_work();
    } catch (...) {
        _logger.warn("failed");  // _logger is destroyed during shutdown
    }
}

// FIXED — gate guard covers the entire operation including error handling
future<> on_timer() {
    auto holder = _gate.hold();
    try {
        co_await do_work();
    } catch (...) {
        _logger.warn("failed");
    }
}

During shutdown, _gate.close() waits for outstanding holders. If the holder is scoped inside the try, the catch path runs unguarded, touching members that stop() has already destroyed.

Rule 21 — Coroutine reference parameters dangle. Just take coroutine parameters by value. Always. A coroutine that takes const std::string& looks correct, compiles fine, passes every unit test, and breaks under load. The caller’s string goes out of scope, the coroutine suspends, and when it resumes the reference points to freed memory.

// BROKEN — reference dangles when caller's scope ends before coroutine resumes
future<> process(const std::string& key) {
    co_await db.lookup(key);  // key may be freed by now
}

// FIXED — take by value, always
future<> process(std::string key) {
    co_await db.lookup(key);
}

The cost of a copy is nothing compared to debugging a dangling reference that only shows up at the 99.9th percentile.

Rule 20 — Missing & in do_with lambdas. seastar::do_with allocates objects on the heap and passes them by reference to your lambda. Forget a single & and you get a copy instead.

// BROKEN — buf is taken by value; the copy dies when the lambda returns
return seastar::do_with(std::move(buf), [](auto buf) {
    return async_write(buf);  // dangling reference to destroyed copy
});

// FIXED — take the parameter by reference; do_with owns the object for the future's lifetime
return seastar::do_with(std::move(buf), [](auto& buf) {
    return async_write(buf);
});

One missing character. The copy is destroyed when the lambda returns, but the future it spawned is still running, still holding a reference to the now-dead copy. Heap corruption that shows up in completely unrelated code, sometimes minutes later.

Rule 23 — share() on temporary_buffer pins the whole allocation. You call .share() to grab a 32-byte header from a 64KB network buffer. Both shared views now pin the same underlying allocation. You cache the header, but the “temporary” buffer lives forever. The result is unexplained memory growth that doesn’t correlate with logical data sizes. The fix: copy the bytes you need into a new buffer, then release the shared view.
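The pinning behavior is the same one std::shared_ptr's aliasing constructor produces, which makes it easy to demonstrate without Seastar. A sketch of the effect (make_header_view is an illustrative name, not a Seastar API):

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Sketch: a small "header" view that shares ownership of a large buffer,
// mirroring how temporary_buffer::share() pins the whole allocation.
std::shared_ptr<const char> make_header_view(std::shared_ptr<std::vector<char>> buf) {
    // Aliasing constructor: shares ownership of *buf, but points at byte 0.
    // Dropping the original reference does NOT free the 64KB.
    return std::shared_ptr<const char>(buf, buf->data());
}
```

The small view keeps the entire allocation alive until it, too, is released; copying the bytes you need breaks that dependency.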

The Reactor Is Not a Thread

Seastar is cooperative. There is no kernel to preempt you. Every microsecond you block is a microsecond where that core serves zero requests.

Rule 2 — No co_await in unbounded loops over external resources. The pattern for (auto& item : items) { co_await process(item); } is O(n) latency. 100 items at 10ms each means one full second where that core does absolutely nothing else. Use seastar::parallel_for_each or seastar::max_concurrent_for_each with bounded concurrency. Process items in parallel, but cap the parallelism so you don’t exhaust memory.

Rule 12 — No std::ifstream in coroutines. It compiles. It works in testing with SSDs. In production, one 10ms disk stall freezes the entire shard. Every connection on that core drops packets for 10ms. Use Seastar’s file I/O, which goes through the reactor and yields properly. If you absolutely must call blocking I/O, isolate it in a seastar::thread and document it loudly.

Rule 17 — Preemption points in hot loops. A tight loop that runs for 500μs without yielding starves everything else on that core. Insert co_await seastar::coroutine::maybe_yield() every ~100 iterations. The cost is a branch that’s almost never taken. The cost of not doing it is a reactor stall warning in your logs and a mystery latency spike that disappears when you reduce load.
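The shape of the pattern, with the yield point factored out as a plain callback so it runs anywhere (in real code the hook is co_await seastar::coroutine::maybe_yield(); process_all is an illustrative name):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Sketch of the hot-loop preemption pattern. In Seastar the hook would be
// `co_await seastar::coroutine::maybe_yield()`; here it is a callback so
// the counting logic is visible and testable.
void process_all(const std::vector<int>& items,
                 const std::function<void()>& yield_hook) {
    std::size_t i = 0;
    for (int item : items) {
        (void)item;  // ... per-item work ...
        if (++i % 100 == 0) {
            yield_hook();  // give other tasks on this core a chance to run
        }
    }
}
```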

Cross-Shard Is Cross-Universe

Each core in Seastar has its own memory allocator. This isn’t an implementation detail you can ignore. It’s a load-bearing invariant, and violating it corrupts allocator state silently.

Rule 0 — std::shared_ptr destructs on the wrong shard. The refcount is atomic, so the decrement is “safe” from any core. But the destructor runs on whichever core decrements last. That destructor frees memory through the wrong core’s allocator.

// BROKEN — destructor runs on whichever shard releases last
std::shared_ptr<session> s = std::make_shared<session>();
// ... shared across shards via submit_to() ...
// shard 3 drops the last reference; ~session() frees memory
// allocated by shard 0's allocator. Silent corruption.

// FIXED — foreign_ptr ensures destruction on the owning shard
seastar::foreign_ptr<seastar::lw_shared_ptr<session>> s;

Use seastar::lw_shared_ptr (non-atomic refcount, shard-local only) for objects that stay on one shard. Wrap cross-shard pointers in seastar::foreign_ptr, which ensures the destructor runs on the owning shard. This was the first bug that burned me and the last one I expected, which is why it’s Rule 0.

Rule 14 — Cross-shard heap data must be reallocated locally. You submit_to() another shard with a std::string. The target shard reads memory allocated by the source shard’s allocator. Maybe it works today. Maybe the allocator metadata is adjacent and you corrupt it on the next allocation. Copy on receive. Always. It feels wasteful but it prevents silent corruption.

Rule 15 — FFI across shard boundaries needs reallocation in both directions. Passing Seastar-allocated memory to an FFI boundary (Rust, C libraries) means the foreign code may free or reallocate through a different allocator. Reallocate into standard malloc memory before calling FFI. Reallocate the result back into Seastar’s allocator before returning to the reactor.

Futures Are Not Exceptions

C++ effectively has two error propagation systems now: exceptions and future chains. Code that mixes them has gaps where errors fall through.

Rule 18 — Discarded futures silently swallow errors. Calling an async function without co_await means the returned future is destroyed immediately. If that future eventually resolves with an exception, nobody sees it. Seastar logs a warning at runtime, but by then the damage is done: a write that didn’t complete, a cleanup that never ran. Every future must be co_awaited, returned, or explicitly discarded with a comment explaining why.
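std::future has the same failure mode, which makes it easy to demonstrate outside Seastar: the work runs, throws, and the error evaporates along with the discarded future (fire_and_forget is an illustrative name):

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <stdexcept>

// Sketch: discarding a future makes its exception vanish. The async work
// runs and fails, but no caller ever observes the error.
std::atomic<bool> work_ran{false};

void fire_and_forget() {
    {
        auto ignored = std::async(std::launch::async, [] {
            work_ran = true;
            throw std::runtime_error("this error is never seen");
        });
    }  // the destructor waits for completion, then silently drops the exception
}
```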

Rule 22 — Throwing before returning a future bypasses .finally(). If an exception is thrown synchronously before the function returns a future, it propagates as a regular C++ exception. Any .finally() attached to the expected return value never executes. Cleanup is skipped. Resources leak. Use seastar::futurize_invoke() to wrap the call, which catches synchronous exceptions and converts them into failed futures. Or just use coroutines, which handle this naturally.
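What futurize_invoke does conceptually, turn a synchronous throw into a failed result instead of letting it bypass the continuation chain, can be sketched without Seastar (invoke_to_result is an illustrative name, not the Seastar API):

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <variant>

// Sketch of the seastar::futurize_invoke idea: invoke a callable and capture
// a synchronous throw as a failed result, so the caller's cleanup path
// always sees the error through the same channel as async failures.
template <typename T>
std::variant<T, std::exception_ptr> invoke_to_result(const std::function<T()>& fn) {
    try {
        return fn();
    } catch (...) {
        return std::current_exception();  // a failed "future" instead of a raw throw
    }
}
```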

Rule 19 — Raw semaphore::wait()/signal() leaks units on throw. You call wait(), do work, call signal() in a .finally(). But if the work throws synchronously before you attach .finally(), the units are never returned. The semaphore’s available count decreases monotonically until everything deadlocks. Use seastar::with_semaphore(), which handles the lifecycle correctly regardless of how the operation fails.

The Rules I Didn’t Expect

Some rules aren’t about the language at all.

Rule 4 — Every growing container needs MAX_SIZE. No unbounded buffers, ever. A single malicious peer sending oversized messages will OOM your process if nothing caps the queue. Every std::vector and std::deque, every ring buffer gets a configured maximum.
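A minimal shape for the rule, with illustrative names; the point is that push can refuse, and the caller must decide what refusal means:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Sketch of Rule 4: every growing container gets a configured cap, and the
// producer handles "full" explicitly (names here are illustrative).
template <typename T>
class bounded_queue {
    std::deque<T> _items;
    std::size_t _max_size;
public:
    explicit bounded_queue(std::size_t max_size) : _max_size(max_size) {}
    // Returns false instead of growing without bound; the caller decides
    // whether to drop, apply backpressure, or close the connection.
    bool try_push(T item) {
        if (_items.size() >= _max_size) {
            return false;
        }
        _items.push_back(std::move(item));
        return true;
    }
    std::size_t size() const { return _items.size(); }
};
```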

Rule 9 — Every catch block logs at warn level. A silent catch(...) is the number one cause of “it works but something is wrong” in production. If you’re catching an exception, something unexpected happened. Log it. If it’s too noisy, fix the root cause instead of silencing the symptom.

Rule 7 — Persistence only stores, never validates. This is a design rule, not a language rule. When the persistence layer also validates, you can’t test business logic without spinning up storage. When it only stores, you can test validation in isolation and reason about correctness without thinking about I/O.

The Remaining Rules

For completeness, here are the rules not covered in full above:

  • Rule 1 — Metrics accessors must be lock-free, no std::mutex in query methods.
  • Rule 3 — Null-guard all C string returns. sqlite3_column_text() returns NULL for SQL NULL values; dereferencing it is undefined behavior.
  • Rule 6 — Deregister metrics first in stop(). Prometheus scrape lambdas capture this; if this is destroyed first, the next scrape is a use-after-free.
  • Rule 8 — Single ShardLocalState struct per service, no scattered thread_local variables.
  • Rule 10 — Validating helpers for string-to-number conversions. std::stoi() throws on bad input; raw calls in request parsing are a crash waiting to happen.
  • Rule 11 — std::call_once or std::atomic for one-time global initialization, never a bare static with lazy init.
  • Rule 13 — Thread-local new needs an explicit destroy function registered with the allocator, or the memory leaks on shard shutdown.
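Rule 10's validating helper can be sketched with std::from_chars, which reports errors through its return value instead of throwing (parse_int is an illustrative name):

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>

// Sketch of a Rule 10 helper: parse-or-nullopt instead of the throwing
// std::stoi. std::from_chars never throws; failure comes back as an error
// code, and a partial parse is detectable via the end pointer.
std::optional<int> parse_int(std::string_view s) {
    int value = 0;
    auto [ptr, ec] = std::from_chars(s.data(), s.data() + s.size(), value);
    if (ec != std::errc{} || ptr != s.data() + s.size()) {
        return std::nullopt;  // reject garbage, overflow, and trailing junk
    }
    return value;
}
```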

How I Enforce Them

These rules live in a reference document I consult on every commit. They’re enforced by discipline, not tooling. No linter can tell you that a lambda coroutine in .then() is a use-after-free.

Numbering them matters. “Rule 16” is a faster shorthand than re-deriving the coroutine frame lifetime problem each time you encounter it.

The list started at Rule 0 and grew to 24. I add a rule only when a bug burns me. Never speculatively. If you’re building something similar, start your own list. The specific rules matter less than the habit of writing them down.

The Takeaway

Async C++ gives you performance that no garbage-collected language can match, but it takes away the safety nets. You have to build your own. Write your rules down.

I’m building Ranvier, a Layer 7 load balancer for LLM inference on Seastar. If this kind of systems work interests you, check out the source.


Ranvier is a project of Minds Aspire, LLC.