Have you ever wondered how async functions can stop at an `await`
point, and then continue from that exact point later?
Well, me too!
This is interesting, because it sounds impossible to achieve with synchronous code. The async/await
keywords compile down to plain synchronous instructions in the end...
If we want to stop a function, the only way out is to return
from it. But when we return, we lose the stack frame and all of the function's local state.
So, how do we stop a function at some point, and continue from where we left off? This is what async
functions are all about, right?
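One way to see that this is achievable with plain synchronous code: instead of a running function, keep an enum that records which "state" we stopped at, plus the locals we need to resume. The sketch below is only an illustration of the idea (the names `Freezable` and `unfreeze` mirror this repository's vocabulary, but this is not the actual library code):

```rust
// A function with two interruption points, rewritten by hand as a state machine.
// Each call to `unfreeze` runs until the next stopping point and then returns.
enum Freezable {
    // State 1: not started yet; holds the starting value.
    Chunk1 { begin: u32 },
    // State 2: stopped once; holds the value computed so far.
    Chunk2 { value: u32 },
    // Terminal state.
    Finished,
}

impl Freezable {
    fn unfreeze(&mut self) -> Option<u32> {
        match *self {
            Freezable::Chunk1 { begin } => {
                let value = begin + 1; // first chunk of the "function body"
                *self = Freezable::Chunk2 { value };
                Some(value) // freeze here, as an `await` point would
            }
            Freezable::Chunk2 { value } => {
                let value = value + 1; // second chunk of the body
                *self = Freezable::Finished;
                Some(value)
            }
            Freezable::Finished => None,
        }
    }
}

fn main() {
    let mut task = Freezable::Chunk1 { begin: 10 };
    assert_eq!(task.unfreeze(), Some(11)); // ran until the 1st freeze point
    assert_eq!(task.unfreeze(), Some(12)); // resumed, ran until the 2nd
    assert_eq!(task.unfreeze(), None);     // finished
}
```

The trick is that "returning" no longer loses the function's state: the state lives in the enum, not on the stack.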
This repository reveals the magic behind async
functions. It tries to uncover every single secret of the async
concept.
If you are thinking: "There are already async libraries out there that work pretty well. What is the purpose of this one?"
The answer is: this library is not a re-implementation of the async world. It shows what is going on
behind the scenes in the simplest way possible. This project is not a 3rd-party library; it is a working
demo of the async world. Since the goal is to demystify all the concepts
related to the async world, we won't be using any 3rd-party library, nor anything magical from the
standard library. There won't be any `tokio`, `mio`, `future`, `async`...
We are only allowed to use basic synchronous code in this project.
Click to expand!
The example below is a good demonstration that we can run interruptible functions in an asynchronous
context, and that we did it from scratch (without using any `async`, `yield`, `future`, or any other magical thing)!
You can supply your own function (which will be turned into its async version via the `freezable-macro`),
and use it in the main example if you want!
Example output:
--------------
STATE OF THE TASK #0: frozen in state: 1, with value: 10
STATE OF THE TASK #1: frozen in state: 1, with value: 20
STATE OF THE TASK #2: frozen in state: 1, with value: 30
---------
for the task #0, requesting the I/O resource: 8
for the task #1, requesting the I/O resource: 41
for the task #2, requesting the I/O resource: 70
---------
The I/O resource: 41, is now ready!
Calling unfreeze on task #1
STATE OF THE TASK #1: frozen in state: 2, with value: 21
---------
for the task #1, requesting the I/O resource: 74
---------
The I/O resource: 8, is now ready!
Calling unfreeze on task #0
STATE OF THE TASK #0: frozen in state: 2, with value: 11
---------
for the task #0, requesting the I/O resource: 26
---------
The I/O resource: 26, is now ready!
Calling unfreeze on task #0
STATE OF THE TASK #0: frozen in state: 3, with value: 12
---------
for the task #0, requesting the I/O resource: 124
---------
The I/O resource: 74, is now ready!
Calling unfreeze on task #1
STATE OF THE TASK #1: frozen in state: 3, with value: 22
---------
for the task #1, requesting the I/O resource: 74
---------
The I/O resource: 70, is now ready!
Calling unfreeze on task #2
STATE OF THE TASK #2: frozen in state: 2, with value: 31
---------
for the task #2, requesting the I/O resource: 101
---------
The I/O resource: 124, is now ready!
Calling unfreeze on task #0
STATE OF THE TASK #0: Finished!
---------
---------
The I/O resource: 74, is now ready!
Calling unfreeze on task #1
STATE OF THE TASK #1: Finished!
---------
---------
The I/O resource: 101, is now ready!
Calling unfreeze on task #2
STATE OF THE TASK #2: frozen in state: 3, with value: 32
---------
for the task #2, requesting the I/O resource: 144
---------
The I/O resource: 144, is now ready!
Calling unfreeze on task #2
STATE OF THE TASK #2: Finished!
---------
---------
all tasks are finished!
In the above example:
order in which the I/O resources were requested: [8, 41, 70, 74, 26, 124, 74, 101, 144]
order in which they became ready:                [41, 8, 26, 74, 70, 124, 74, 101, 144]
I prepared a short summary of what happens when you use the `async`
keyword, and how your function is
actually put to sleep and awakened again. Considering that I've watched 15+ hours of video and read countless pages
of articles about it, here is a very short summary (believe me, it is the shortest you can
find that will answer your questions about async) of what is actually happening, and what we should do to replicate it:
Click to expand!
The main problem: we have some I/O-bound tasks on our hands (network, file reads, etc.), and we don't want our code to sit idle while waiting for the I/O to finish. It should be able to continue doing other work while waiting for the result of the I/O task at hand.
There are 3 approaches to this (Carl Fredrik Samson explains it better than I can, so I'm quoting from his book):
Now one way of accomplishing this is letting the OS take care of everything for us. We do this by simply spawning a new OS thread for each task we want to accomplish and write code like we normally would.
Pros:
- Simple
- Easy to code
- Reasonably performant
- You get parallelism for free
Cons:
- OS level threads come with a rather large stack. If you have many tasks waiting simultaneously (like you would in a web-server under heavy load) you'll run out of memory pretty soon.
- There are a lot of syscalls involved. This can be pretty costly when the number of tasks is high.
- The OS has many things it needs to handle. It might not switch back to your thread as fast as you'd wish
- The OS doesn't know which tasks to prioritize, and you might want to give some tasks a higher priority than others.
Another common way of handling this is green threads. Languages like Go uses this to great success. In many ways this is similar to what the OS does but the runtime can be better adjusted and suited to your specific needs.
Pros:
- Simple to use for the user. The code will look like it does when using OS threads
- Reasonably performant
- Abundant memory usage is less of a problem
- You are in full control over how threads are scheduled and if you want you can prioritize them differently.
Cons:
- You need a runtime, and by having that you are duplicating part of the work the OS already does. The runtime will have a cost which in some cases can be substantial.
- Can be difficult to implement in a flexible way to handle a wide variety of tasks
The third way we're covering today is the one that most closely matches an ideal solution. In this solution we register an interest in an event, and then let the OS tell us when it's ready.
The way this works is that we tell the OS that we're interested in knowing when data is arriving for us on the network card. The network card issues an interrupt when something has happened, in which case the driver lets the OS know that the data is ready.
Now, we still need a way to "suspend" many tasks while waiting, and this is where Node's "runtime" or Rust's Futures come in to play.
Pros:
- Close to optimal resource utilization
- It's very efficient
- Gives us the maximum amount of flexibility to decide how to handle the events that occurs
Cons:
- Different operating systems have different ways of handling these kind of queues. Some of them are difficult to reconcile with each other. Some operating systems have limitations on what I/O operations support this method.
- Great flexibility comes with a good deal of complexity
- Difficult to write an ergonomic API with an abstraction layer that accounts for the differences between the operating systems without introducing unwanted costs.
- Only solves part of the problem—the programmer still needs a strategy for suspending tasks that are waiting.
Rust's async story is modeled around option 3, and one of the reasons it has taken a long time is related to the cons of this method and choosing a way to model how tasks should be suspended. Rust's Futures model a task as a State Machine where a suspension point represents a
state.
And here is a good comparison from the paper:
Coming back to our case:
- We need functions that are stoppable/freezable, which can be implemented as state machines.
- We are not using OS threads or green threads in this example. We are going with the 3rd option listed above.
- In other words: we should take our `async` functions and divide them into states, one for each waiting/blocking point. `freezable` is the simplest and most thorough example of this, imho. But here is another great example: https://cfsamson.github.io/books-futures-explained/4_generators_async_await.htm
- To achieve this, we can't expect our users to write their own states for each function, because we don't write state machines for `async` programming. In reality, our `async` code does indeed turn into state machines behind the scenes. Who does that? The compiler! I won't write a compiler for this project, but I wrote a macro which will turn your weakling function into a giga-chad state machine. The code for the macro itself is ugly due to the innate nature of the macro concept, but you don't have to understand it. Act like it is the compiler that is doing the work for you. If you are curious, here is the link to the macro.
- Bonus point: using this `freezable` concept, we can actually implement our own `generators` and `iterators`. We don't need the fancy and magical `yield` keyword. In fact, this is how the `yield` keyword is implemented ;)
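As a taste of that bonus point: a hand-rolled state machine can already act as a generator, and plugging it into Rust's `Iterator` trait takes only a few lines. A sketch with a hypothetical `Counter` type (not part of this repo):

```rust
// A "generator" that yields current, current+1, ... without the `yield` keyword:
// the struct fields are the frozen local variables of the imaginary function.
struct Counter {
    current: u32,
    remaining: u32,
}

impl Iterator for Counter {
    type Item = u32;

    // Each `next` call is one "unfreeze": run one chunk, save state, return.
    fn next(&mut self) -> Option<u32> {
        if self.remaining == 0 {
            return None; // the "function" has finished
        }
        let yielded = self.current;
        self.current += 1;
        self.remaining -= 1;
        Some(yielded)
    }
}

fn main() {
    let gen = Counter { current: 10, remaining: 3 };
    let values: Vec<u32> = gen.collect();
    assert_eq!(values, vec![10, 11, 12]);
}
```

Every `for` loop, `map`, or `collect` then works on it for free, which is exactly what `yield`-based generators give you in other languages.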
- These freezable functions will put themselves to sleep (`freeze` themselves) at some specific points (network requests, read/writes, input from the user, etc.). We need a mechanism to awaken (`unfreeze`) them. What are these specific points? They are I/O waits. Imagine an async function that calls another async function, which calls a `leaf-future`. Now imagine the compiled code: all these functions will be expanded and become a single piece of code. When shall this single piece of code stop/freeze? When the requested I/O resource is not ready. So, in the end, we will only stop for `leaf-futures`, or I/O resources (ok, also timers).
- If we tell the OS that we are interested in specific events, it can notify us: https://cfsamson.github.io/book-exploring-async-basics/4_interrupts_firmware_io.html
- Here is a very brief summary of what is going on: your I/O request is relayed to the related hardware component (for example, if you request the contents of a website, the request is relayed to your network card).
- So, calling a `fetch_website` function calls a more low-level function, which calls an even more low-level function, and inevitably a `leaf-future` that deals with sockets and the network.
- The network card has a micro-controller in it, so it probably polls to check whether any answer has arrived from the server. When there is one, it notifies the OS, and the OS then interrupts the CPU with a message: "this resource is now ready".
- As you probably guessed already, this OS part is another rabbit hole. If we want to implement our own functions that communicate with the OS, we will have to dive into the specific signals/flags that each OS uses.
- On top of that, each operating system has a different strategy for this notification system (for example: epoll for Linux, kqueue for BSD/macOS, and IOCP for Windows). Each will require a different implementation on our side.
- This is all too low-level, and implementing it would be a whole other project (`mio`). I don't think doing that would help demystify the `async` concept, and we would diverge from our main point. If you insist on learning the details of OS communication, read the link above and all the OS-related chapters there, and then you might dive into the `mio` project.
- Coming back to our topic: we need a mechanism for telling the OS that we are interested in some events,
- and another one for listening to the signals the OS sends us.
- Since we won't be using any 3rd-party libraries in this project, we also won't write our own `mio`. So what should we do? Let's recap: we should tell the OS that we are interested in a specific event, that event will happen at some point in the future, and the OS will later notify us about it. We won't have actual I/O resources in our case; we just want to show that we can stop a function and, with a signal from the OS, continue it. So we can do this instead:
  - spawn a thread that will simulate the OS,
  - send the subscribed I/O resource ID to this thread (the OS), in place of a real I/O resource subscription,
  - and the OS will just wait for some random time, and then notify us that the requested resource is ready.
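Those three steps can be sketched with nothing but `std::thread` and channels. To stay inside the standard library, the delay below is derived from the resource ID instead of being truly random; all names here are illustrative, not this repo's actual code:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Spawn the "OS": it receives subscribed I/O resource IDs, pretends each
// resource takes some time to become ready, then sends the ID back.
fn spawn_fake_os(subs: mpsc::Receiver<u8>, ready: mpsc::Sender<u8>) {
    thread::spawn(move || {
        for id in subs {
            // Pseudo-random wait derived from the ID (no external RNG crate).
            let millis = u64::from(id) % 7 * 10;
            thread::sleep(Duration::from_millis(millis));
            // Notify the subscriber: "this resource is now ready!"
            if ready.send(id).is_err() {
                break;
            }
        }
    });
}

fn main() {
    let (sub_tx, sub_rx) = mpsc::channel();
    let (ready_tx, ready_rx) = mpsc::channel();
    spawn_fake_os(sub_rx, ready_tx);

    // Subscribe to two "I/O resources"...
    sub_tx.send(8).unwrap();
    sub_tx.send(41).unwrap();

    // ...and wait for the readiness notifications.
    assert_eq!(ready_rx.recv().unwrap(), 8);
    assert_eq!(ready_rx.recv().unwrap(), 41);
}
```

One caveat: because this sketch handles one subscription at a time, readiness comes back in subscription order, whereas a real OS (as the example output above shows) can report resources ready out of order.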
- We covered the communication part with the OS. But how will the rest work? Who will awaken (`unfreeze`) the tasks? One way to implement this is:
  - have a thread (the `executor`) that will run these async tasks. Btw, this does not have to be an extra thread; it can be the main thread as well,
  - and have another thread that will listen to the signals sent by the OS and somehow notify the executor, so that the executor knows which task can make further progress.
- This is a great short summary, I believe (quoting):

  In Rust a library like mio will be what drives the Reactor part. In Rust we have `Futures` which pass on a `Waker` to the Reactor. Instead of communicating directly with the Executor through a channel, the Reactor will call `Waker.wake()` on the relevant `Waker` once an event is ready.
- Quoting from the "What does the 'wake' phase require?" part of this article: https://boats.gitlab.io/blog/post/wakers-i/

  The final phase is the phase in which the wakers really do all of their work. Once the event we're waiting on has happened, the event source calls the wake method. The wake method is implemented by each executor, and contains the logic for setting up this future to be polled again by the executor. It turns out there are several ways to implement this, and we'd like to be able to support all of them.
  - Using an `&'static AtomicBool`: In this implementation, the executor can only run one task at a time. When it is time to wake that task, a global flag is tripped, and then the task will be polled again via a side-channel. This implementation does not make sense for most use cases, but it is actually being used by some users on embedded platforms with extremely minimal resources. The waker is represented as a reference to the global flag.
  - Using Task IDs: In this implementation, the executor stores a global set of tasks that it is currently polling in some sort of associative map. When it is time to wake a task, the executor is given the ID for that task in order to tell it which task is ready to be polled. The waker is represented as this task ID (in effect, the waker's data is a `usize`).
  - Using reference counting: In this implementation (which has become the predominant implementation), the executor is just one or more queues of tasks that it will fetch from and poll as soon as they're ready. The waker is itself a reference-counted pointer to a particular task, and when it is time to wake, it puts itself onto the executor's queue. In this case, the waker is represented as a reference-counted pointer.
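The "task ID" strategy from the quote above is easy to picture in code: the waker's entire data is a `usize`, and waking just pushes that ID onto the executor's ready queue. A minimal sketch with assumed names (no real `Waker` vtable involved):

```rust
use std::collections::VecDeque;

// The executor's ready queue: task IDs waiting to be polled again.
struct ReadyQueue {
    ids: VecDeque<usize>,
}

// A "waker" whose entire state is the ID of the task it belongs to.
struct IdWaker {
    task_id: usize,
}

impl IdWaker {
    // `wake` = "tell the executor that task `task_id` can be polled again".
    fn wake(&self, queue: &mut ReadyQueue) {
        queue.ids.push_back(self.task_id);
    }
}

fn main() {
    let mut queue = ReadyQueue { ids: VecDeque::new() };
    let waker_for_task_3 = IdWaker { task_id: 3 };

    // An event source would call this when task 3's I/O is ready:
    waker_for_task_3.wake(&mut queue);

    // The executor then fetches the next ready task ID and polls it.
    assert_eq!(queue.ids.pop_front(), Some(3));
}
```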
- Since this is all too complicated, we will use a simpler approach:
  - our `Executor` will send the names/IDs of the I/O resources we are waiting on to our `Reactor`,
  - our `Reactor` will listen to the OS notifications for these I/O resources,
  - if any notification arrives, our `Reactor` will notify the `Executor` about these events.
  - Remember, this is an oversimplification!
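The simplified approach above can be wired together with two `std::sync::mpsc` channels. In this sketch the Reactor marks every resource ready immediately (the actual OS-notification wait is elided), and all names are illustrative:

```rust
use std::sync::mpsc;
use std::thread;

// The Executor tells the Reactor which I/O resource each task waits on;
// the Reactor answers with "task N is ready" events.
fn spawn_reactor(
    interests: mpsc::Receiver<(usize, u8)>, // (task ID, resource ID)
    ready_tasks: mpsc::Sender<usize>,
) {
    thread::spawn(move || {
        for (task_id, _resource_id) in interests {
            // In the real thing we would wait for the OS notification for
            // `_resource_id` here; in this sketch every resource is ready at once.
            if ready_tasks.send(task_id).is_err() {
                break;
            }
        }
    });
}

fn main() {
    let (interest_tx, interest_rx) = mpsc::channel();
    let (ready_tx, ready_rx) = mpsc::channel();
    spawn_reactor(interest_rx, ready_tx);

    // Executor side: tasks 0 and 1 block on resources 8 and 41.
    interest_tx.send((0, 8)).unwrap();
    interest_tx.send((1, 41)).unwrap();

    // The executor's loop: whichever task the Reactor reports, unfreeze it.
    assert_eq!(ready_rx.recv().unwrap(), 0);
    assert_eq!(ready_rx.recv().unwrap(), 1);
}
```

The executor never blocks on I/O itself; it only blocks on the `ready_rx` channel, which is what lets one thread drive many frozen tasks.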
- We will not use any 3rd-party library; we will implement every magical thing from scratch, including:
  - the types/traits: `Futures`, `Generators`, `Iterators`,
  - the keywords: `async`, `yield`, `await`,
  - the concepts: `Executor`, `Reactor`, `Waker`.
- Implement `Freezable`:
  - The most important ability of the `async` world is being able to stop a function and continue from where we left off. Async functions are able to do this via `Generators`, and `Generators` are able to do this via the `yield` keyword. So we should implement a library that demonstrates how we can `freeze` a function and continue it whenever we feel like it.
  - We will see that we can write a `freezable` function by hand. But in order to turn another person's function into a `freezable` function, we will have to write code that writes code for us. So the next step is: write a macro that will turn a given function into a `freezable` function.
- Implement a `Reactor` and an `Executor`:
  - Now our `freezable` functions are ready to use, but in the async world we do not continue our functions whenever we please. They are frozen when the I/O resource is not ready, and awakened when that resource is ready.
  - Simulate the OS with an external thread, since dealing with OSes and I/O is a whole other project on its own (`mio`; details are discussed above).
  - Implement a `Reactor` that will listen to the signals of the OS and relay to the `Executor` which task should be polled next.
  - Implement an `Executor` that will call `unfreeze()` on the tasks indicated by the `Reactor`, and drive many `freezable` tasks in a single thread.