Note that with std::execution, C++26 will have a default async runtime (similar to how C# has a default async runtime).
This means that C++26 is getting a default coroutine task type [1] AND a default executor [2]. You can even spawn the tasks like in Tokio/async Rust. [3]
I’m not totally sure if this is a GOOD idea to add to the C++ standard but oh well.
[1] https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p35...
[2] http://wg21.link/P2079R5
[3] https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p31...
> I’m not totally sure if this is a GOOD idea to add to the C++ standard
What are the downsides? Naively, it seems like a good idea to both provide a coroutine spec (for power users) and a default task type & default executor.
Well, Rust didn't do the same thing for a reason. Rust lets you pick and choose what async runtime to use (even though everyone has decided to use Tokio anyway). This is good because it allows for alternative async runtimes like Embassy (https://embassy.dev/) and it also doesn't freeze the API into something that can't change. It is entirely possible that people will find a new style of async that works better than std::execution.
I don't know how it works for C++ but you're not locked down to a single implementation with how C# does it. You can have it use different executors/schedulers, different task types, etc.
You are also not locked down in C++. There are already a handful of coroutine and async runtime implementations out there.
I recently decided that it was time to properly learn C++ coroutines. I looked at a few tutorials, but by far the best was Raymond Chen's coroutine series [1]. It is a long series, but every article is just the right size. Strongly recommended.
[1] https://devblogs.microsoft.com/oldnewthing/20210504-01/?p=10...
Random switching between "Awaitor" and "awaiter" makes it seem like these are distinct concepts that the reader is supposed to understand.
In general this moves way too fast for the density of the grammar it's trying to introduce, lines like:
> We have seen Awaitors already - suspend_always is an empty awaiter type that has await_ready returns false always.
But we haven't "seen" suspend_always, it's mentioned in half a sentence in an earlier paragraph, with no further context or examples.
There's a reason Lewis Baker's writings about C++ coroutines are 5000-word monsters: the body of grammar which needs to be covered demands that level of careful and precise definition and exploration.
That's pretty damning too though.
A stackful coroutine is "write the live registers to your stack, swap the stack pointer to a suspended coroutine, load the old live registers from your new stack". It's a short and boring sequence of assembly.
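To make that concrete without writing assembly: POSIX `swapcontext` is a library version of the same save/swap/restore sequence (it also saves the signal mask, so it is slower than a hand-rolled switch). A minimal sketch, assuming a glibc-style `<ucontext.h>`; the globals and the names `counter_body` and `run_counter` are hypothetical and only exist for this illustration.

```cpp
#include <ucontext.h>
#include <cassert>
#include <vector>

// Hypothetical single-fibre demo: globals keep the makecontext entry point simple.
static ucontext_t g_main_ctx;   // saved registers / stack pointer of the caller
static ucontext_t g_fiber_ctx;  // saved registers / stack pointer of the fibre
static int g_yielded = 0;       // value handed from the fibre to the caller

// Runs on its own stack. Each swapcontext saves this stack's live state
// and resumes the caller where it last left off.
static void counter_body() {
    for (int i = 1; i <= 3; ++i) {
        g_yielded = i;
        swapcontext(&g_fiber_ctx, &g_main_ctx);  // "yield"
    }
    g_yielded = -1;  // signal completion; returning follows uc_link back to the caller
}

// Drives the fibre until it finishes and collects everything it yielded.
std::vector<int> run_counter() {
    static char stack[64 * 1024];  // the fibre's private stack

    const int rc = getcontext(&g_fiber_ctx);
    assert(rc == 0);
    (void)rc;
    g_fiber_ctx.uc_stack.ss_sp = stack;
    g_fiber_ctx.uc_stack.ss_size = sizeof stack;
    g_fiber_ctx.uc_link = &g_main_ctx;  // where to continue when counter_body returns
    makecontext(&g_fiber_ctx, counter_body, 0);

    std::vector<int> out;
    for (;;) {
        swapcontext(&g_main_ctx, &g_fiber_ctx);  // "resume"
        if (g_yielded < 0) break;
        out.push_back(g_yielded);
    }
    return out;
}
```

Note that `counter_body` keeps its loop variable in an ordinary local across yields, and it could yield from any depth of nested calls, which is the capability the stackless design gives up.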
A C++ coroutine is a CFG transform with a bunch of logic around heap allocation elision to construct something less capable than the above, with a bunch of keywords and semantics that you can kind of derive from the work the compiler needs to do to wire things together.
If you want fibers there are ample mechanisms already available to implement them; they don't really benefit from specialized language machinery.
Stackful coroutines would definitely benefit from being built into the language, as you can get a significantly better ABI than you can with a pure library-based solution. You can sorta-kinda make it work with GCC extended inline assembly [1], but it is quite fragile as you need to handle exceptions, unwind info, red zones, etc.
Also you need compiler support to correctly handle thread_local.
[1] https://github.com/gpderetta/delimited/blob/master/delimited...
You can do somewhat better than that with clang.
__attribute__((naked)) on a function which has a single asm block as the implementation gives you control over argument passing and changing the stack pointer.
__attribute__((preserve_none)) on the same function spills most live registers to the stack in the caller. The coroutine switch doesn't need to do as many push/pop which makes it a bit more readable, but mainly this means you don't spill dead registers. That's the big thing you need compiler support for.
I believe the x64 red zone is a non-issue here as you've called the switch function, as opposed to trying to call from within inline asm (which does need to be careful about that). The magic globals are a problem though (floating point control thing, maybe signal mask, errno et al) so I guess don't use the magic globals from within fibres.
"thread_local" doesn't map very sensibly onto fibres. There have been compiler bugs in that area too. Storing some information at the start of the fibre stack works fine though, you just don't get syntactic support for allocating / dereferencing from it.
Yes, preserve_none would be exactly what I want, except that I also want to avoid the call instruction in the final asm stream: as the call would not be paired with a ret, the call stack predictor will always mispredict it on every context switch, while an indirect jmp has a much better chance of being predicted when two coroutines call each other in a tight loop (consider generators for example).
Ideally I think that a ctx_t* __builtin_context_switch(ctx_t* to) would need to be provided by the compiler.
Re thread_local, I believe at least MSVC has (had?) a fiber-safe flag that would handle thread_locals correctly by not caching addresses across function calls.
Amen. Even with those 5k word monsters it's brutally hard. Andreas Fertig's cpp-insights is really helpful, when it is able to complete the coroutine transform.
FWIW, I think a useful addition would be for compilers to output the intermediate source code, so you can reason more easily about behaviour and debug into readable code.
I’m excited to actually get around to trying coroutines - they should be a good replacement for simple state machines. Rather than storing an object with a state enum, I can write simple declarative code.
In my latest personal project I have switched my asio networking code from callback functions to coroutines. It is such a big improvement! Repeated actions can be written as simple loops, error handling is done with exceptions and the code is generally much easier to follow. And here's the icing on the cake: most data can actually stay in local variables, which means I don't have to care about the lifetime!
Yeah, I have in mind handling network messages, mainly. Thinking about it, I have this problem at two layers:
- At the transport layer, I read in a header on a message (which may come in one byte at a time!), get a size for the serialized message, then read N bytes for the message. The simple way to do things is to use a thread per socket, but that results in a lot of wasted memory, depending on how many sockets there are. Instead I use epoll, but now I can't make the simple for loop reading in bytes for the message - I have to have a buffer + allocated size + current size + state enum, wrapped in a struct, then run a switch statement every time I get an epoll event for the socket.
- At half a level higher, there might be multiple messages or other negotiations that need to happen before we can start to stream messages to the owner of the connection. Once again - need to either use a thread or a state enum to keep track of where we are.
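For reference, the transport-layer bookkeeping described above looks roughly like this. It is a minimal sketch with assumed details: the 4-byte big-endian length header and the name `MessageReader` are hypothetical, and the epoll loop and socket reads are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical framing: a 4-byte big-endian length header, then that many payload bytes.
class MessageReader {
public:
    enum class State { ReadingHeader, ReadingBody, Done };

    // Feed whatever bytes the socket produced for this epoll event.
    // Returns the number of bytes consumed; nothing is consumed once a message is complete.
    std::size_t feed(const std::uint8_t* data, std::size_t len) {
        assert(data != nullptr || len == 0);
        std::size_t used = 0;
        while (used < len && state_ != State::Done) {
            switch (state_) {
            case State::ReadingHeader:
                // The header may arrive one byte at a time, so accumulate it.
                size_ = (size_ << 8) | data[used++];
                if (++header_bytes_ == 4) {
                    state_ = (size_ == 0) ? State::Done : State::ReadingBody;
                }
                break;
            case State::ReadingBody:
                body_.push_back(data[used++]);
                if (body_.size() == size_) state_ = State::Done;
                break;
            case State::Done:
                break;
            }
        }
        return used;
    }

    State state() const { return state_; }
    const std::vector<std::uint8_t>& body() const { return body_; }

private:
    State state_ = State::ReadingHeader;  // where we are in the protocol
    std::uint32_t size_ = 0;              // payload length decoded so far
    std::size_t header_bytes_ = 0;        // header bytes seen so far
    std::vector<std::uint8_t> body_;      // payload collected so far
};
```

Every piece of state that would be a local variable in a blocking loop (the byte count, the decoded size, the position in the protocol) has to live in the struct instead, and a real version would also cap `size_` before trusting it. With a coroutine, the "read header, then read N bytes" loop can be written directly and the frame holds that state.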
Even if you want the enum to be able to report state, you can still set it somewhere for debug purposes.
Interesting article, but you should use a spell checker. Typos are distracting.
I am not a native speaker, and I joke that my typos and grammar mistakes are the evidence that none of my code or posts are AI generated. Sorry about the typos. I just fixed all the ones I could find. Hope it's better now.
I appreciate that you don't use AI. I like real human stuff.