Being a core developer of Python has made me want to understand how the language generally works. I realize there will always be obscure corners where I don't know every intricate detail, but to be able to help with issues and the general design of Python I feel like I should try and understand its core semantics and how things work under the hood.
But until recently I didn't understand how async/await worked in Python 3.5. I knew that yield from in Python 3.3 combined with asyncio in Python 3.4 had led to this new syntax. But since I hadn't done a lot of networking -- which asyncio is not limited to but does focus on -- I hadn't really paid much attention to all of this async/await stuff. I mean I knew that yield from iterator was (essentially) equivalent to:
for x in iterator:
    yield x
And I knew that asyncio was an event loop framework which allowed for asynchronous programming, and I knew what those words (basically) meant on their own. But having never dived into the async/await syntax to understand how all of this came together, I felt I didn't understand asynchronous programming in Python, which bothered me. So I decided to take the time to figure out how the heck all of it worked. And since I have heard from various people that they too didn't understand how this new world of asynchronous programming worked, I decided to write this essay (yes, this post has taken so long to write and is so long in words that my wife has labeled it an essay).
Now because I wanted a proper understanding of how the syntax worked, this essay has some low-level technical detail about how CPython does things. It's totally okay if it's more detail than you want or if you don't fully understand it, as I don't explain every nuance of CPython internals in order to keep this from turning into a book (e.g., if you don't know that code objects have flags, let alone what a code object is, that's okay; you don't need to care in order to get something from this essay). I have tried to provide a more accessible summary at the end of every section so that you can skim the details if they turn out to be more than you want to deal with.
A history lesson about coroutines in Python
According to Wikipedia, "Coroutines are computer program components that generalize subroutines for nonpreemptive multitasking, by allowing multiple entry points for suspending and resuming execution at certain locations". That's a rather technical way of saying, "coroutines are functions whose execution you can pause". And if you are saying to yourself, "that sounds like generators", you would be right.
Back in Python 2.2, generators were first introduced by PEP 255 (they are also called generator iterators since generators implement the iterator protocol). Primarily inspired by the Icon programming language, generators allowed for a dead-simple way to create an iterator that didn't waste memory when calculating the next value in the iteration (you could always create a class that implemented __iter__() and __next__() and didn't store every value of the iterator, but that's a lot more effort). For instance, if you wanted to create your own version of range(), you could do it in an eager fashion by creating a list of integers:
def eager_range(up_to):
    """Create a list of integers, from 0 to up_to, exclusive."""
    sequence = []
    index = 0
    while index < up_to:
        sequence.append(index)
        index += 1
    return sequence
The problem with this, though, is that if you want a large sequence like the integers from 0 to 1,000,000, you have to create a list long enough to hold 1,000,000 integers. But when generators were added to the language, you could suddenly, with zero effort, create an iterator that didn't need to create the whole sequence upfront. Instead, all you had to do was have enough memory for one integer at a time.
def lazy_range(up_to):
    """Generator to return the sequence of integers from 0 to up_to, exclusive."""
    index = 0
    while index < up_to:
        yield index
        index += 1
Having a function pause what it is doing whenever it hits a yield expression -- although it was a statement until Python 2.5 -- and then be able to resume later is very useful in terms of using less memory, allowing for the idea of infinite sequences, etc.
But as you may have noticed, generators are all about iterators. Now having a better way to create iterators is obviously great (and this is shown when you define an __iter__() method on an object as a generator), but people knew that if we took the "pausing" part of generators and added a "send stuff back in" aspect to them, Python would suddenly have the concept of coroutines (but until I say otherwise, consider this all just a concept; concrete coroutines in Python are discussed later on). And that exact feature of sending stuff into a paused generator was added in Python 2.5 thanks to PEP 342. Among other things, PEP 342 introduced the send() method on generators. This allowed one to not only pause generators, but to send a value back into a generator where it paused. Taking our range() example further, you could make it so the sequence jumped forward or backward by some amount:
def jumping_range(up_to):
    """Generator for the sequence of integers from 0 to up_to, exclusive.

    Sending a value into the generator will shift the sequence by that amount.
    """
    index = 0
    while index < up_to:
        jump = yield index
        if jump is None:
            jump = 1
        index += jump


if __name__ == '__main__':
    iterator = jumping_range(5)
    print(next(iterator))     # 0
    print(iterator.send(2))   # 2
    print(next(iterator))     # 3
    print(iterator.send(-1))  # 2
    for x in iterator:
        print(x)  # 3, 4
Generators were not mucked with again until Python 3.3, when PEP 380 added yield from. Strictly speaking, the feature empowers you to refactor generators in a clean way by making it easy to yield every value from an iterator (which a generator conveniently happens to be).
def lazy_range(up_to):
    """Generator to return the sequence of integers from 0 to up_to, exclusive."""
    index = 0
    def gratuitous_refactor():
        nonlocal index
        while index < up_to:
            yield index
            index += 1
    yield from gratuitous_refactor()
By virtue of making refactoring easier, yield from also lets you chain generators together so that values bubble up and down the call stack without code having to do anything special.
def bottom():
    # Returning the yield lets the value that goes up the call stack
    # come right back down.
    return (yield 42)

def middle():
    return (yield from bottom())

def top():
    return (yield from middle())

# Get the generator.
gen = top()
value = next(gen)
print(value)  # Prints '42'.
try:
    value = gen.send(value * 2)
except StopIteration as exc:
    value = exc.value
print(value)  # Prints '84'.
Summary
Generators in Python 2.2 let the execution of code be paused. Once the ability to send values back into paused generators was introduced in Python 2.5, the concept of coroutines in Python became possible. And the addition of yield from in Python 3.3 made it easier to refactor generators as well as chain them together.
What is an event loop?
It's important to understand what an event loop is and how it makes asynchronous programming possible if you're going to care about async/await. If you have done GUI programming before -- including web front-end work -- then you have worked with an event loop. But since having the concept of asynchronous programming as a language construct is new in Python, it's okay if you don't happen to know what an event loop is.
Going back to Wikipedia, an event loop "is a programming construct that waits for and dispatches events or messages in a program". Basically an event loop lets you go, "when A happens, do B". Probably the easiest example to explain this is the JavaScript event loop that's in every browser. Whenever you click something ("when A happens"), the click is given to the JavaScript event loop, which checks if any onclick callback was registered to handle that click ("do B"). If any callbacks were registered then they are called with the details of the click. The event loop is considered a loop because it is constantly collecting events and looping over them to find what to do with each one.
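To make that "when A happens, do B" idea concrete, here is a minimal sketch of a callback-style event loop. It is purely illustrative; the EventLoop class and its register()/fire() methods are names I made up and not any real library's API:
import collections


class EventLoop:
    """A toy callback-based event loop, purely for illustration."""

    def __init__(self):
        self._callbacks = collections.defaultdict(list)  # event -> "do B"
        self._events = collections.deque()               # pending "A happened"

    def register(self, event_name, callback):
        # "When A happens, do B": remember B for event A.
        self._callbacks[event_name].append(callback)

    def fire(self, event_name, details):
        # Record that A happened; a real loop would get this from the OS/GUI.
        self._events.append((event_name, details))

    def run(self):
        # The "loop" part: keep pulling events and dispatching callbacks.
        while self._events:
            event_name, details = self._events.popleft()
            for callback in self._callbacks[event_name]:
                callback(details)


loop = EventLoop()
loop.register('click', lambda details: print('clicked at', details))
loop.fire('click', (10, 20))
loop.run()  # Prints: clicked at (10, 20)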
In Python's case, asyncio was added to the standard library to provide an event loop. There's a focus on networking in asyncio, which in the case of the event loop makes the "when A happens" be when I/O from a socket is ready for reading and/or writing (via the selectors module). Other than GUIs and I/O, event loops are also often used for executing code in another thread or subprocess, with the event loop acting as the scheduler (i.e., cooperative multitasking). If you happen to understand Python's GIL, event loops are useful in cases where releasing the GIL is possible and useful.
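For the curious, the heart of that socket-readiness case looks something like the following sketch with the selectors module (a rough illustration of the idea only, not how asyncio is actually structured; the port number and the accept callback are arbitrary):
import selectors
import socket

selector = selectors.DefaultSelector()
server = socket.socket()
server.bind(('localhost', 12345))
server.listen()
server.setblocking(False)
# "When A happens" == the server socket is ready for reading;
# "do B" == call the callback stored alongside it.
selector.register(server, selectors.EVENT_READ, data=lambda sock: sock.accept())

while True:  # The event loop proper; this sketch never exits.
    for key, events in selector.select():  # Blocks until some I/O is ready.
        callback = key.data
        callback(key.fileobj)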
Summary
Event loops provide a loop which lets you say, "when A happens then do B". Basically an event loop watches out for when something occurs, and when something the event loop cares about happens, it then calls any code that cares about what happened. Python gained an event loop in the standard library in the form of asyncio in Python 3.4.
How async and await work
The way it was in Python 3.4
Between the generators found in Python 3.3 and an event loop in the form of asyncio, Python 3.4 had enough to support asynchronous programming in the form of concurrent programming. Asynchronous programming is basically programming where execution order is not known ahead of time (hence asynchronous instead of synchronous). Concurrent programming is writing code to execute independently of other parts, even if it all executes in a single thread (concurrency is not parallelism). For example, the following is Python 3.4 code to count down every second in two asynchronous, concurrent function calls.
import asyncio

# Borrowed from http://curio.readthedocs.org/en/latest/tutorial.html.
@asyncio.coroutine
def countdown(number, n):
    while n > 0:
        print('T-minus', n, '({})'.format(number))
        yield from asyncio.sleep(1)
        n -= 1

loop = asyncio.get_event_loop()
tasks = [
    asyncio.ensure_future(countdown("A", 2)),
    asyncio.ensure_future(countdown("B", 3))]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
In Python 3.4, the asyncio.coroutine decorator was used to label a function as acting as a coroutine that was meant for use with asyncio and its event loop. This gave Python its first concrete definition of a coroutine: an object that implemented the methods added to generators in PEP 342 and is represented by the collections.abc.Coroutine abstract base class. This meant that suddenly all generators implemented the coroutine interface even if they weren't meant to be used in that fashion. To fix this, asyncio required that all generators meant to be used as coroutines be decorated with asyncio.coroutine.
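You can see why every generator looked like a coroutine under this definition with a quick check of the PEP 342 interface (a throwaway snippet; plain_gen is just an example name):
def plain_gen():
    yield

gen = plain_gen()
# Every generator grows the full PEP 342 interface, whether or not it
# was ever meant to be driven by an event loop.
print(hasattr(gen, 'send'), hasattr(gen, 'throw'), hasattr(gen, 'close'))
# Prints: True True True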
With this concrete definition of a coroutine (which matched an API that generators provided), you then used yield from on any asyncio.Future object to pass it down to the event loop, pausing execution of the coroutine while you waited for something to happen (being a future object is an implementation detail of asyncio and not important). Once the future object reached the event loop it was monitored there until it was done doing whatever it needed to do. Once the future was done, the event loop noticed, and the coroutine that was paused waiting for its result started again with that result sent back in using the coroutine's send() method.
Take our example above. The event loop starts each of the countdown() coroutine calls, executing until one of them hits yield from and the asyncio.sleep() function. That returns an asyncio.Future object which gets passed down to the event loop and pauses execution of the coroutine. There the event loop watches the future object until the one second is over (as well as checking on other stuff it's watching, like the other coroutine). Once the one second is up, the event loop takes the paused countdown() coroutine that gave it the future object, sends the future's result back into that coroutine, and the coroutine starts running again. This keeps going until all of the countdown() coroutines are finished running and the event loop has nothing left to watch. I'll actually show you a complete example of how exactly all of this coroutine/event loop stuff works later, but first I want to explain how async and await work.
Going from yield from to await in Python 3.5
In Python 3.4, a function that was flagged as a coroutine for the purposes of asynchronous programming looked like:
# This also works in Python 3.5.
@asyncio.coroutine
def py34_coro():
    yield from stuff()
In Python 3.5, the types.coroutine decorator has been added to also flag a generator as a coroutine like asyncio.coroutine does. You can also use async def to syntactically define a function as being a coroutine, although it cannot contain any form of yield expression; only return and await are allowed for returning a value from the coroutine.
async def py35_coro():
    await stuff()
A key thing async and types.coroutine do, though, is tighten the definition of what a coroutine is. They take coroutines from simply being an interface to an actual type, making the distinction between any generator and a generator that is meant to be a coroutine much more stringent (and the inspect.iscoroutine() function is even stricter by saying async has to be used).
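A quick sketch of that stricter check (the function names here are just for illustration):
import inspect
import types

@types.coroutine
def gen_coro():
    yield

async def native_coro():
    pass

print(inspect.iscoroutine(gen_coro()))  # False: still a generator object.
coro = native_coro()
print(inspect.iscoroutine(coro))        # True: created by `async def`.
coro.close()  # Avoid a "coroutine was never awaited" warning.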
You will also notice that beyond just async, the Python 3.5 example introduces await expressions (which are only valid within an async def). While await operates much like yield from, the objects that are acceptable to an await expression are different. Coroutines are definitely allowed in an await expression since the concept of coroutines is fundamental in all of this. But when you call await on an object, it technically needs to be an awaitable object: an object that defines an __await__() method which returns an iterator that is not itself a coroutine. Coroutines themselves are also considered awaitable objects (hence why collections.abc.Coroutine inherits from collections.abc.Awaitable). This definition follows a Python tradition of making most syntax constructs translate into a method call under the hood, much like a + b is a.__add__(b) or b.__radd__(a) underneath it all.
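To make the awaitable-object definition concrete, here is a minimal sketch of a class that defines __await__() (the Pause class and the surrounding names are made up for illustration):
class Pause:
    """A hand-rolled awaitable object."""

    def __await__(self):
        # __await__() must return an iterator; a generator is the easy way,
        # and the `yield` is what actually pauses the awaiting coroutine.
        result = yield 'paused!'
        return result

async def uses_awaitable():
    return await Pause()

coro = uses_awaitable()
print(coro.send(None))    # Prints 'paused!'.
try:
    coro.send('resumed')  # Send a value back into the paused coroutine.
except StopIteration as exc:
    print(exc.value)      # Prints 'resumed'.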
How does the difference between yield from and await play out at a low level (i.e., a generator with types.coroutine vs. one with async def)? Let's look at the bytecode of the two examples above in Python 3.5 to get at the nitty-gritty details. The bytecode for py34_coro() is:
>>> dis.dis(py34_coro)
  2           0 LOAD_GLOBAL              0 (stuff)
              3 CALL_FUNCTION            0 (0 positional, 0 keyword pair)
              6 GET_YIELD_FROM_ITER
              7 LOAD_CONST               0 (None)
             10 YIELD_FROM
             11 POP_TOP
             12 LOAD_CONST               0 (None)
             15 RETURN_VALUE
The bytecode for py35_coro() is:
>>> dis.dis(py35_coro)
  1           0 LOAD_GLOBAL              0 (stuff)
              3 CALL_FUNCTION            0 (0 positional, 0 keyword pair)
              6 GET_AWAITABLE
              7 LOAD_CONST               0 (None)
             10 YIELD_FROM
             11 POP_TOP
             12 LOAD_CONST               0 (None)
             15 RETURN_VALUE
Ignoring the difference in line number due to py34_coro() having the asyncio.coroutine decorator, the only visible difference between them is the GET_YIELD_FROM_ITER opcode versus the GET_AWAITABLE opcode. Both functions are properly flagged as being coroutines, so there's no difference there. In the case of GET_YIELD_FROM_ITER, it simply checks if its argument is a generator or coroutine, otherwise it calls iter() on its argument (the acceptance of a coroutine object by the opcode for yield from is only allowed when the opcode is used from within a coroutine itself, which is true in this case thanks to the types.coroutine decorator flagging the generator as such at the C level with the CO_ITERABLE_COROUTINE flag on the code object).
But GET_AWAITABLE does something different. While the bytecode will accept a coroutine just like GET_YIELD_FROM_ITER, it will not accept a generator that has not been flagged as a coroutine. Beyond just coroutines, though, the bytecode will accept an awaitable object as discussed earlier. This makes yield from expressions and await expressions both accept coroutines while differing on whether they accept plain generators or awaitable objects, respectively.
You may be wondering why there is a difference between what an async-based coroutine and a generator-based coroutine will accept in their respective pausing expressions. The key reason is to make sure you don't mess up and accidentally mix and match objects that just happen to have the same API, to the best of Python's abilities. Since generators inherently implement the API for coroutines, it would be easy to accidentally use a generator when you actually expected to be using a coroutine. And since not all generators are written to be used in a coroutine-based control flow, you need to avoid accidentally using a generator incorrectly. But since Python is not statically compiled, the best the language can offer is runtime checks when using a generator-defined coroutine. This means that when types.coroutine is used, Python's compiler can't tell whether a generator is going to be used as a coroutine or just as a plain generator (remember, just because the syntax says types.coroutine doesn't mean someone hasn't earlier done types = spam), and thus different opcodes with different restrictions are emitted by the compiler based on the knowledge it has at the time.
One very key point I want to make about the difference between a generator-based coroutine and an async one is that only generator-based coroutines can actually pause execution and force something to be sent down to the event loop. You typically don't see this very important detail because you usually call event loop-specific functions like asyncio.sleep(); since event loops implement their own APIs, these are the kind of functions that have to worry about this little detail. The vast majority of us will work with event loops rather than write them, and thus will only write async coroutines and never really need to care about this. But if you're like me and were wondering why you couldn't write something like asyncio.sleep() using only async coroutines, this can be quite the "aha!" moment.
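A tiny demonstration of this asymmetry (the names are made up for illustration; a full event loop built on the same idea appears later in this essay):
import types

@types.coroutine
def pause():
    # Only a generator-based coroutine can yield a value out to whatever
    # is driving it (e.g., an event loop); this is what a function like
    # asyncio.sleep() ultimately relies on.
    yield 'something for the event loop'

async def user_code():
    await pause()  # Legal: `await` delegates down to the `yield` above.
    # A bare `yield` here would be a SyntaxError in Python 3.5.

coro = user_code()
print(coro.send(None))  # Prints 'something for the event loop'.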
Summary
Let's summarize all of this into simpler terms. Defining a method with async def makes it a coroutine. The other ways to make a coroutine are to flag a generator with types.coroutine -- technically the flag is the CO_ITERABLE_COROUTINE flag on a code object -- or to subclass collections.abc.Coroutine. You can only make a coroutine call chain pause with a generator-based coroutine.
An awaitable object is either a coroutine or an object that defines __await__() -- technically collections.abc.Awaitable -- which returns an iterator that is not a coroutine. An await expression is basically yield from but with the restriction of only working with awaitable objects (plain generators will not work with an await expression). An async function is a coroutine that either has return statements -- including the implicit return None at the end of every function in Python -- and/or await expressions (yield expressions are not allowed). The restrictions on async functions are to make sure you don't accidentally mix and match generator-based coroutines with other generators, since the expected uses of the two types of generators are rather different.
Think of async/await as an API for asynchronous programming
A key thing that I want to point out is actually something I didn't really think deeply about until I watched David Beazley's Python Brasil 2015 keynote. In that talk, David pointed out that async/await is really an API for asynchronous programming (which he reiterated to me on Twitter). What David means by this is that people shouldn't think of async/await as synonymous with asyncio, but instead think of asyncio as a framework that can utilize the async/await API for asynchronous programming.
David believes in this idea of async/await being an asynchronous programming API so strongly that he has created the curio project to implement his own event loop. This has helped make it clear to me that async/await allows Python to provide the building blocks for asynchronous programming without tying you to a specific event loop or other low-level details (unlike other programming languages which integrate the event loop into the language directly). This allows for projects like curio to not only operate differently at a lower level (e.g., asyncio uses future objects as the API for talking to its event loop while curio uses tuples), but to also have different focuses and performance characteristics (e.g., asyncio has an entire framework for implementing transport and protocol layers which makes it extensible, while curio is simpler, expecting the user to worry about that kind of thing, but also allowing it to run faster).
Based on the (short) history of asynchronous programming in Python, it's understandable that people might think that async/await == asyncio. I mean asyncio was what helped make asynchronous programming possible in Python 3.4 and was a motivating factor for adding async/await in Python 3.5. But the design of async/await is purposefully flexible enough to not require asyncio or contort any critical design decision just for that framework. In other words, async/await continues Python's tradition of designing things to be as flexible as possible while still being pragmatic to use (and implement).
An example
At this point your head might be awash with new terms and concepts, making it a little hard to fully grasp how all of this is supposed to work to provide you with asynchronous programming. To help make it all much more concrete, here is a complete (if contrived) asynchronous programming example, end-to-end from event loop and associated functions to user code. The example has coroutines which represent individual rocket launch countdowns but that appear to be counting down simultaneously. This is asynchronous programming through concurrency: three separate coroutines will be running independently, and yet it will all be done in a single thread.
import datetime
import heapq
import types
import time


class Task:
    """Represent how long a coroutine should wait before starting again.

    Comparison operators are implemented for use by heapq. Two-item
    tuples unfortunately don't work because when the datetime.datetime
    instances are equal, comparison falls to the coroutine and they don't
    implement comparison methods, triggering an exception.

    Think of this as being like asyncio.Task/curio.Task.
    """

    def __init__(self, wait_until, coro):
        self.coro = coro
        self.waiting_until = wait_until

    def __eq__(self, other):
        return self.waiting_until == other.waiting_until

    def __lt__(self, other):
        return self.waiting_until < other.waiting_until


class SleepingLoop:
    """An event loop focused on delaying execution of coroutines.

    Think of this as being like asyncio.BaseEventLoop/curio.Kernel.
    """

    def __init__(self, *coros):
        self._new = coros
        self._waiting = []

    def run_until_complete(self):
        # Start all the coroutines.
        for coro in self._new:
            wait_for = coro.send(None)
            heapq.heappush(self._waiting, Task(wait_for, coro))
        # Keep running until there is no more work to do.
        while self._waiting:
            now = datetime.datetime.now()
            # Get the coroutine with the soonest resumption time.
            task = heapq.heappop(self._waiting)
            if now < task.waiting_until:
                # We're ahead of schedule; wait until it's time to resume.
                delta = task.waiting_until - now
                time.sleep(delta.total_seconds())
                now = datetime.datetime.now()
            try:
                # It's time to resume the coroutine.
                wait_until = task.coro.send(now)
                heapq.heappush(self._waiting, Task(wait_until, task.coro))
            except StopIteration:
                # The coroutine is done.
                pass


@types.coroutine
def sleep(seconds):
    """Pause a coroutine for the specified number of seconds.

    Think of this as being like asyncio.sleep()/curio.sleep().
    """
    now = datetime.datetime.now()
    wait_until = now + datetime.timedelta(seconds=seconds)
    # Make all coroutines on the call stack pause; the need to use `yield`
    # necessitates this be generator-based and not an async-based coroutine.
    actual = yield wait_until
    # Resume the execution stack, sending back how long we actually waited.
    return actual - now


async def countdown(label, length, *, delay=0):
    """Countdown a launch for `length` seconds, waiting `delay` seconds.

    This is what a user would typically write.
    """
    print(label, 'waiting', delay, 'seconds before starting countdown')
    delta = await sleep(delay)
    print(label, 'starting after waiting', delta)
    while length:
        print(label, 'T-minus', length)
        waited = await sleep(1)
        length -= 1
    print(label, 'lift-off!')


def main():
    """Start the event loop, counting down 3 separate launches.

    This is what a user would typically write.
    """
    loop = SleepingLoop(countdown('A', 5), countdown('B', 3, delay=2),
                        countdown('C', 4, delay=1))
    start = datetime.datetime.now()
    loop.run_until_complete()
    print('Total elapsed time is', datetime.datetime.now() - start)


if __name__ == '__main__':
    main()
As I said, it's contrived, but if you run this in Python 3.5 you will notice that all three coroutines run independently in a single thread and yet the total amount of time taken to run is about 5 seconds. You can consider Task, SleepingLoop, and sleep() as what an event loop provider like asyncio or curio would give you. For a normal user, only the code in countdown() and main() is of importance. As you can see, there is no magic to async, await, or this whole asynchronous programming deal; it's just an API that Python provides to help make this sort of thing easier.
My hopes and dreams for the future
Now that I understand how this asynchronous programming works in Python, I want to use it all the time! It's such an awesome concept that's so much better than something you would have used threads for previously. The problem is that Python 3.5 is so new that async/await is also very new. That means there are not a lot of libraries out there supporting asynchronous programming like this. For instance, to do HTTP requests you either have to construct the HTTP request yourself by hand (yuck), use a project like the aiohttp framework which adds HTTP on top of another event loop (in this case, asyncio), or hope more projects like the hyper library continue to spring up to provide an abstraction for things like HTTP which allows you to use whatever I/O library you want (although unfortunately hyper only supports HTTP/2 at the moment).
Personally, I hope projects like hyper take off so that we have a clear separation between getting binary data from I/O and how we interpret that binary data. This kind of abstraction is important because most I/O libraries in Python are rather tightly coupled to how they do I/O and how they handle data coming from I/O. This is a problem with the http package in Python's standard library, as it doesn't have an HTTP parser but a connection object which does all the I/O for you. And if you were hoping requests would support asynchronous programming, your hopes have already been dashed because the synchronous I/O that requests uses is baked into its design. This shift in the ability to do asynchronous programming gives the Python community a chance to fix its problem of not having abstractions at the various layers of the network stack. And we have the perk of it not being hard to make asynchronous code run as if it's synchronous, so tools filling the void for asynchronous programming can work in both worlds.
I also hope that Python gains some form of support for yield in async coroutines. Maybe this will require yet another keyword (maybe something like anticipate?), but the fact that you actually can't implement an event loop system with just async coroutines bothers me. Luckily, it turns out I'm not the only one who thinks this, and since the author of PEP 492 agrees with me, I think there's a chance of getting this quirk removed.
Conclusion
Basically async and await are fancy generators that we call coroutines, and there is some extra support for things called awaitable objects as well as for turning plain generators into coroutines. All of this comes together to support concurrency so that we have better support for asynchronous programming in Python. It's awesome and much easier to use than comparable approaches like threads -- I wrote an end-to-end example of asynchronous programming in under 100 lines of commented Python code -- while still being quite flexible and fast (the curio FAQ says that it runs faster than twisted by 30-40% but slower than gevent by 10-15%, all while being implemented in pure Python; remember that Python 2 + Twisted can use less memory and is easier to debug than Go, so just imagine what you could do with this!). I'm very happy that this landed in Python 3 and I look forward to the community embracing it and helping to flesh out its support in libraries and frameworks so we can all benefit from asynchronous programming in Python.