Hacking Coroutines into C

112 points by jmillikin 13 hours ago

adinisom 9 hours ago

My favorite trick in C is a lightweight take on Protothreads, implemented in place without dependencies. Looks something like this for a hypothetical blinky coroutine:

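  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>
  
  /* get_ticks(), turn_on_LED() and turn_off_LED() are assumed to be
     provided by the platform. */
  
  typedef struct blinky_state {
    size_t pc;       /* resume point; 0 means "start from the top" */
    uint64_t timer;
    /* ... variables that need to live across YIELDs ... */
  } blinky_state_t;
  
  blinky_state_t blinky_state;
  
  /* Record the current line as the resume point, return, and continue
     from here on the next call. */
  #define YIELD() s->pc = __LINE__; return; case __LINE__:;
  void blinky(void) {
    blinky_state_t *s = &blinky_state;
    uint64_t now = get_ticks();  /* runs on every call */
    
    switch(s->pc) {
      case 0:  /* first call enters here; without this label nothing would run */
      while(true) {
        turn_on_LED();
        s->timer = now;
        while( now - s->timer < 1000 ) { YIELD(); }
        
        turn_off_LED();
        s->timer = now;
        while( now - s->timer < 1000 ) { YIELD(); }
      }
    }
  }
  #undef YIELD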

Can, of course, abstract the delay code into its own coroutine.

Your company is probably using hardware containing code I've written like this.

What's especially nice, and what I miss in other languages with async/await, is the ability to mix declarative and procedural code. Code you write before the switch(s->pc) statement gets run on every call to the function. Can put code you want to be declarative there, like updating "now" in the code above, or if I have streaming code it's a great place to copy data.

dkjaudyeqooe 7 hours ago

A cleaner, faster way to implement this sort of thing is to use the "labels as values" extension if using GCC or Clang [1]. It avoids the switch statement and associated comparisons. Particularly useful if you're yielding inside nested loops (which IMHO is one of the most useful applications of coroutines) or switch statements.
[1] https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html
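
A minimal sketch of what the labels-as-values variant could look like, assuming GCC or Clang (`&&label` and `goto *ptr` are compiler extensions, not standard C). The names `pair_state_t`, `pair_next`, and the `YIELD`/`CAT` macros are hypothetical and chosen only for illustration; the coroutine yields from inside nested loops, which is the case called out above.

```c
#include <stddef.h>

/* Hypothetical coroutine state: the resume address plus the locals that
   must survive across yields. */
typedef struct pair_state {
  void *resume;   /* address of the label to resume at; NULL = start */
  int i, j;
} pair_state_t;

#define CAT_(a, b) a##b
#define CAT(a, b) CAT_(a, b)

/* Store the address of a fresh label, return, and resume at that label
   on the next call. Requires the "labels as values" extension. */
#define YIELD(val) do { \
    s->resume = &&CAT(resume_, __LINE__); \
    return (val); \
    CAT(resume_, __LINE__):; \
  } while (0)

/* Yields 10*i + j for each cell of a 2x3 grid, then -1 forever. */
int pair_next(pair_state_t *s) {
  if (s->resume) goto *s->resume;   /* computed goto instead of a switch */

  for (s->i = 0; s->i < 2; s->i++) {
    for (s->j = 0; s->j < 3; s->j++) {
      YIELD(10 * s->i + s->j);      /* yield from inside nested loops */
    }
  }

  for (;;) { YIELD(-1); }
}
```

Starting from a zero-initialized `pair_state_t`, each call to `pair_next` continues where the previous one left off. Because no `case` labels are involved, the same macro would also work inside a `switch` statement in the coroutine body.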

fjfaase 8 hours ago

I have used this approach myself, with a nearly identical define for YIELD.
If there is just one instance of a co-routine, which is often the case for embedded software, one could also make use of static variables inside the function. This also makes the code slightly faster.
You need some extra logic if, for example, two co-routines need to access a shared peripheral such as I2C. Then you might also need to implement a queue. Last year, I worked a bit on a tiny cooperative polling OS, including a transpiler. I did not finish the project, because it was considered too advanced for the project I wanted to use it for. Instead, old-fashioned state machines documented with flow charts were required, the argument being that everyone can read those. I feel that implementing state machines is error-prone, because it basically means implementing goto statements where the state acts as the label. In my experience, nasty bugs are easily introduced if you forget a break statement in the right place.
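
A minimal sketch of the single-instance variant with static variables inside the function. The name `blinky_tick` and the 2-ticks-on, 2-ticks-off pattern are assumptions made for illustration; instead of driving a real LED, the function returns the LED level for the current tick.

```c
#include <stdbool.h>

/* Save the resume point in the function-local static pc, return, and
   continue from here on the next call. */
#define YIELD(val) do { pc = __LINE__; return (val); case __LINE__:; } while (0)

/* Single-instance coroutine: all state lives in static locals, so no
   state struct or pointer is needed. */
bool blinky_tick(void) {
  static int pc = 0;   /* resume point */
  static int n;        /* loop counter that must survive across yields */

  switch (pc) {
    case 0:
    while (true) {
      for (n = 0; n < 2; n++) { YIELD(true); }   /* LED on for 2 ticks */
      for (n = 0; n < 2; n++) { YIELD(false); }  /* LED off for 2 ticks */
    }
  }
  return false; /* not reached */
}
#undef YIELD
```

The trade-off is that the function can only ever have one live instance, since every caller shares the same static state.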
- adinisom 8 hours ago
  
  Yes, 100%. State transitions are "goto" by another name. State machines have their place but tend to be write-only (hard to read and modify) so are ideally small and few. Worked at a place that drank Miro Samek's "Practical Statecharts in C/C++" kool-aid... caused lots of problems. So instead I use this pattern everywhere that I can linearize control flow. And if I need a state machine with this pattern I can just use goto.
  Agreed re: making the state a static variable inside the function. Great for simple coroutines. I made it a pointer in the example for two reasons:
  - Demonstrates access to the state variables with very little visual noise... "s->"
  - For sub-coroutines that can be called from multiple places such as "delay" you make the state variable the first argument. The caller's state contains the sub-coroutine's state and the caller passes it to the sub-coroutine. The top level coroutine's state ends up becoming "the stack" allocated at compile-time.
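
A minimal sketch of the sub-coroutine arrangement described above, under some assumptions: the `CO_BEGIN`/`CO_YIELD`/`CO_END` macros, the convention that a coroutine returns true while it is still running, and the `led` field standing in for real LED calls are all hypothetical. The caller's state embeds the sub-coroutine's state and passes it as the first argument.

```c
#include <stdbool.h>
#include <stdint.h>

/* Each coroutine returns true while still running and false once done. */
#define CO_BEGIN(s)  switch ((s)->pc) { case 0:
#define CO_YIELD(s)  do { (s)->pc = __LINE__; return true; case __LINE__:; } while (0)
#define CO_END(s)    } (s)->pc = 0; return false

typedef struct delay_state {
  int pc;
  uint64_t start;
} delay_state_t;

/* Reusable sub-coroutine: its state is the first argument. */
bool delay(delay_state_t *s, uint64_t now, uint64_t ticks) {
  CO_BEGIN(s);
  s->start = now;
  while (now - s->start < ticks) { CO_YIELD(s); }
  CO_END(s);
}

typedef struct blinky_state {
  int pc;
  bool led;             /* stand-in for the real LED */
  delay_state_t delay;  /* sub-coroutine state embedded in the caller */
} blinky_state_t;

bool blinky(blinky_state_t *s, uint64_t now) {
  CO_BEGIN(s);
  while (true) {
    s->led = true;
    while (delay(&s->delay, now, 1000)) { CO_YIELD(s); }
    s->led = false;
    while (delay(&s->delay, now, 1000)) { CO_YIELD(s); }
  }
  CO_END(s);
}
```

A single zero-initialized `blinky_state_t` then holds everything both coroutines need, so the whole "stack" is one object whose size is known at compile time.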
  - kazinator 5 hours ago
    
    Worked at a place that drank hierarchical state machines kool-aid. Yeah.
    https://en.wikipedia.org/wiki/UML_state_machine#Hierarchical...

syncurrent 5 hours ago

In `proto_activities` this blinking would look like this:
```
  pa_activity (Blinker, pa_ctx_tm(), uint32_t onMs, uint32_t offMs) {
    pa_repeat {
      turn_on_LED();
      pa_delay_ms (onMs);
  
      turn_off_LED();
      pa_delay_ms (offMs);
    }
  } pa_end
```
Here the activity definition automatically creates the structure to hold the pc, timer and other variables which would outlast a single tick.

csmantle 9 hours ago

Yeah. Protothreads (with PT_TIMER extensions) is one of the classic libraries, and I used it in my own early embedded days. Back then I was totally fascinated by how it turns ergonomic function-like macros into state machines.

codr7 12 minutes ago

Looks overly complicated to me.

This is an alternative I wrote for my C book:

https://github.com/codr7/hacktical-c/tree/main/task

astrobe_ 6 hours ago

> [State machines] lacked a linear flow

That's because you need a state machine when your control flow is not linear. They are represented by graphs, remember? This is actually a case where using gotos might be clearer. Although not drastically better because the main problem is that written source code is linear by nature. A graph described by a dedicated DSL such as GraphViz has the same problem, although at least you can visualize the result.
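
As a small illustration of the goto style, here is a sketch of a three-state recognizer where each label is a state and each goto is a transition. The function name `matches_ab` and the pattern it accepts (one or more 'a' followed by one or more 'b') are made up for this example.

```c
#include <stdbool.h>

/* Accepts strings of the form a+b+. Labels are states, gotos are
   transitions, so the code mirrors the edges of the state graph. */
bool matches_ab(const char *p) {
  /* start state: need at least one 'a' */
  if (*p == 'a') { p++; goto in_a; }
  return false;

in_a:
  if (*p == 'a') { p++; goto in_a; }
  if (*p == 'b') { p++; goto in_b; }
  return false;

in_b:
  if (*p == 'b') { p++; goto in_b; }
  return *p == '\0';   /* accept only at end of input */
}
```

There is no state variable and no break to forget, but the function runs to completion in one call, so it does not by itself address the concurrency side of the problem.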

But control flow is only one term of the equation, the other being concurrency. One typically has more than one state machine running; sometimes one uses state machines that are actually essentially linear because of that. Cooperative multitasking. I would question trying to solve these two problems, non-linearity and concurrency, at once. Sometimes when you try too hard to kill two birds with one stone you end up with one dead bird and a broken window.

One speaker at the conference announced earlier [1] also made the point that visualization helps a lot, and that reminded me of Pharo's inspection tools [2]. Seeing what's going on under the hood is more important than one usually thinks.

One issue with state machines is that they are hardly modular: adding a state or decomposing a state into multiple states is more work than one would like it to be. It is the inverse problem of visualization: what you draw is what you code. A good tool for that would let the user connect nodes with arrows and assign code to nodes and/or arrows; it would translate this into some textual intermediate language to play nice with Git, and a compiler would transform it to C code for integration in the build system.

[1] https://bettersoftwareconference.com/ [2] https://pharo.org/features

mikepurvis 12 hours ago

FreeRTOS can also be used with a cooperative scheduler: https://www.freertos.org/Why-FreeRTOS/Features-and-demos/RAM...

That said, if I was stuck rolling this myself, I think I’d prefer to try to do it with “real” codegen than macros. If nothing else it would give the ability to do things like blocks and correctness checks, and you’d get much more readable resulting source when it came to stepping through it with a debugger.

userbinator 10 hours ago

> Of course, the project didn’t allow us to use an RTOS.

That tends to just make the project eventually implement an approximation of one... as what appears to have happened here.

How I'd solve the given problem is by using the PWM peripheral (or timer interrupts if no PWM peripheral exists) and pin change interrupts, with the CPU halted nearly 100% of the time. I suspect that approach is even simpler than what's shown here.

user____name 3 hours ago

I've recently read a bunch of articles explaining these weird macro soup setups for emulating coroutines in C. This one is probably the most advanced writeup in implementing fibers/coroutines I came across. The focus is on a multithreaded context, which seems to complicate things a lot. Honestly I feel like you need language level support for them in that case, they seem more trouble than they're worth otherwise, at least in plain C.

https://graphitemaster.github.io/fibers/

syncurrent 7 hours ago

A similar approach, but rooted in the idea of synchronous languages like Esterel or Blech:

https://github.com/frameworklabs/proto_activities

Neywiny 12 hours ago

The intent here is nice. I have historically hated state machines for sequential execution. To me they make sense in FPGA/ASIC/circuits. In software, they just get so complicated. I've even seen state managers managing an abstracted state machine implementing a custom device to do what's ultimately very sequential work.

It's my same argument that there should be no maximum number of lines to a function. Sometimes, you just need to do a lot of work. I comment the code blocks, maybe with steps/parts, but there's no point in making a function that's only called in one place.

But anything is better than one person I met who somehow was programming without knowing how to define their own functions. Gross

duped 9 hours ago

> I comment the code blocks, maybe with steps/parts, but there's no point in making a function that's only called in one place.
I encourage junior developers that get into this habit (getting worse now, with LLMs) to convert the comment into a function name and add the block as a function, thinking pretty carefully about its function signature. If you have a `typedef struct state` that gets passed around, great.
The reason for splitting up this code is so that the person writing it doesn't fuck up, the input/output is expressed as types and validated before they push it. It's easy for me to review, because I can understand small chunks of code better than big chunks, and logically divides up the high level architecture from the actual implementation so I can avoid reviewing the latter if I find trouble with the former. It's also good as a workflow, where you can pair to write out the high level flow and then split off to work on implementation internally. And most importantly, it makes it possible to test the code.
I have had this discussion with many grumbly developers that think of the above as a "skill issue." I don't really want to work with those people, because their code sucks.
- Neywiny 3 hours ago
  
  I think for me, and it sounds like for this author, the context lost by that abstraction makes it harder to review. In my experience it's easier for me to understand a small block of code, but it's harder to understand how it impacts the system when it's out of context.
  For example:
  x++;
  A very easy piece of code to understand. But who wants x, and what values could they expect? Why do we ++ and under what conditions?
  Those effects (again, just for me; your mileage may vary) tend to get much harder to understand.

throwaway81523 10 hours ago

> there's no point in making a function that's only called in one place.
There's nothing wrong with doing that if it helps make your code clearer. The compiler's optimizer will inline it when appropriate so there's no runtime overhead either.
- munch117 6 hours ago
  
  Not only that, the compiler's optimizer might actually do a better job if you split up a big function. Because the smaller functions have less register pressure.

TechDebtDevin 11 hours ago

I actually write a lot of Go in state-machine-like patterns. My state types files would make you think I'm schizophrenic. I just finished up a project this week that was 10k lines of comments in 18k loc. No one else has to read it tho; they'd probably appreciate it if they did.

moconnor 6 hours ago

A colleague of mine did this much more elegantly by manually updating the stack and jmping. This was a couple of decades ago and afaik the code is still in use in supercomputing centres today.

joshlk 2 hours ago

Rust can be used in an embedded environment and also offers asynchronous execution built into the language

throwaway81523 12 hours ago

As the article acknowledges at the end, this is sort of like protothreads which has been around for ages. The article's CSS was so awful that I didn't read anything except the last paragraph, which seemed to tell me what I wanted to know.

mananaysiempre 12 hours ago

Right, this is more or less this blog author’s riff on (PuTTY author) Simon Tatham’s old page on coroutines using switch[1], which itself indicates that Tom Duff thought of this (which makes sense, it’s only a half-step away from Duff’s device) and described it as “revolting”. So in this sense the idea predates the standardization of C.
[1] https://www.chiark.greenend.org.uk/~sgtatham/coroutines.html
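
For readers who have not seen it, here is a sketch of Duff's device as a memory-to-memory copy (Duff's original wrote to a single output register, so this is a commonly shown variant, and the name `duff_copy` is made up). It relies on the same language rule as the coroutine trick: `case` labels may sit inside a loop nested in the `switch`.

```c
#include <stddef.h>

/* Copy count bytes with the loop unrolled 8x. The switch jumps into the
   middle of the do-while to handle the count % 8 leftover bytes first. */
void duff_copy(char *to, const char *from, size_t count) {
  if (count == 0) return;          /* the do-while below runs at least once */
  size_t n = (count + 7) / 8;      /* number of passes through the loop */
  switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
  }
}
```

The coroutine macros swap the fixed `case 0` to `case 7` labels for `case __LINE__`, and the remainder for a saved resume point.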

hecanjog 11 hours ago

> The article's CSS was so awful
Small text sizes? What is the problem for you?
- shakna 11 hours ago
  
  Whilst I wouldn't call it "awful", the spacing between the lines isn't helping any.

Nursie 8 hours ago

Cooperative multithreading via setjmp and longjmp has been around in C since the 80s at least.

I’m not sure this is so much hacking as an accepted technique from the old-old days which has somewhat fallen out of favour, especially as C is falling a little outside of the mainstream these days.

Perhaps it’s almost becoming lost knowledge :)

ajb 7 hours ago

This isn't using setjmp/longjmp
It's using Simon Tatham's method based on Duff's device (https://www.chiark.greenend.org.uk/~sgtatham/coroutines.html)
- Nursie 6 hours ago
  
  Sure, I guess I just wanted to point out that regardless of method, people have been building these sorts of facilities in C for a very long time.
  It doesn’t lessen the achievement of course, but it amuses me in an “everything old is new again” kinda way.
  - johnisgood 3 hours ago
    
    "everything old is new again" is so true. I see it across IT all the time, heh.

Agyemang 4 hours ago

Yes
