Tokio

Tokio is the most popular async/await runtime in Rust for building web services. While Tokio itself is fast, in truth it's only as good as the code you run on top of it.

Tokio has a multi-threaded runtime and a single-threaded runtime. The multi-threaded runtime has a global task queue and work stealing shared among its worker threads. Each worker thread runs its own local task queue and event loop.

[Diagrams: the multi-threaded runtime above its worker threads; the single-threaded runtime]

The multi-threaded runtime separates its workers from its global task queue and its IO and time drivers. Workers have a local task queue. They periodically check the time and IO drivers, as well as the global task queue.

The single-threaded, or "current thread," runtime bundles everything into itself. We'll come back to this another time.
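
If you build these runtimes explicitly rather than through the #[tokio::main] macro, both flavors come from tokio::runtime::Builder. A minimal sketch (the worker count is an arbitrary example):

rust
use tokio::runtime::Builder;

fn main() -> std::io::Result<()> {
    // Multi-threaded flavor: a global task queue, work stealing, and N workers.
    let multi_thread = Builder::new_multi_thread()
        .worker_threads(2) // arbitrary example count
        .enable_all() // IO and time drivers
        .build()?;

    // Current-thread flavor: everything bundled onto the thread calling block_on.
    let current_thread = Builder::new_current_thread()
        .enable_all()
        .build()?;

    multi_thread.block_on(async { /* ... */ });
    current_thread.block_on(async { /* ... */ });
    Ok(())
}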

Blocking in Tokio

When you're running on a thread and you need to wait to continue, what's better than the venerable thread::sleep()? It is the grandparent of all blocking functions, the first (and last) resort of the developer chasing a concurrency bug, and it is ubiquitous.

When you are running a cooperative multitasking system like Tokio on Rust, this is dangerous. It is usually not permissible in a web service with limited threads. See How many threads should I use with Tokio? for a rationale for using very few threads with Tokio.

In Rust, a Future's mechanism for moving forward is poll(). It's a plain synchronous function, just like thread::sleep(). You can call thread::sleep() in a Future's poll(), but we ought not do that. Note that calling thread::sleep() inside of poll() is exactly the same thing as calling it somewhere within an async fn's call stack. But why ought we not call thread::sleep() in an async fn? Why should we avoid blocking in Tokio?
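
To make that equivalence concrete, here is a sketch of both kinds of sleep inside an async fn; the function names are made up for illustration:

rust
use std::time::Duration;

// Everything in these bodies runs inside the compiler-generated Future's poll().
async fn blocking_sleep() {
    // Blocks the worker thread: no other task can run on it for a full second.
    std::thread::sleep(Duration::from_secs(1));
}

async fn yielding_sleep() {
    // Returns Pending and registers a wake with the time driver: the worker
    // thread is free to poll other tasks in the meantime.
    tokio::time::sleep(Duration::from_secs(1)).await;
}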

Blocking, in the abstract

Consider 2 possible implementations for the same task.

  • (Task 1) Block briefly, but overall complete more quickly.
  • (Task 2) Yield to the runtime sooner, but wait longer before completing.

Each of these tasks gets poll()ed 2 times. Once before waiting for a result, then once to return the result.

The Task 1 approach with 2 cores and 4 tasks to complete looks something like this over time:

[Diagram: CPU activity over time with the Task 1 approach]

If you squint, you can see all of the task completions are delayed past their natural completion time. The CPU is 100% subscribed, and you can't do any more work in this time window.

top or htop would show lower CPU utilization, because your threads are sleeping, or parked. But your runtime can't squeeze in any more work, because your threads are occupied.

If you instead use the Task 2 approach, here is the same window:

[Diagram: CPU activity over time with the Task 2 approach]

Even though Task 2 takes longer overall to complete, it is "well behaved" from a Tokio perspective. It yields control back to the runtime quickly, and the thread is able to interleave other work while the task waits to become ready to complete.

top or htop would show similarly low CPU utilization here, because your threads are parking in the runtime between tasks. However, your runtime is available to take on more work in both cores. You can nearly squeeze in 2 more tasks in this same time window. By adding more tasks, you could drive your CPU much closer to 100% utilization doing useful work. Your throughput is much higher with the Task 2 approach!
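
If you want to see the effect on your own machine, here is a small simulation of the two approaches; the durations and counts are invented for illustration:

rust
use std::time::{Duration, Instant};

#[tokio::main(flavor = "multi_thread", worker_threads = 2)]
async fn main() {
    for (name, blocks) in [("Task 1 (blocks)", true), ("Task 2 (yields)", false)] {
        let start = Instant::now();
        let tasks: Vec<_> = (0..8)
            .map(|_| {
                tokio::spawn(async move {
                    if blocks {
                        // Finishes sooner in isolation, but occupies the worker.
                        std::thread::sleep(Duration::from_millis(2));
                    } else {
                        // Takes longer in isolation, but lets the worker interleave.
                        tokio::time::sleep(Duration::from_millis(3)).await;
                    }
                })
            })
            .collect();
        for task in tasks {
            task.await.unwrap();
        }
        println!("{name}: 8 tasks finished in {:?}", start.elapsed());
    }
}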

A closer look at Future

The whole notion of async in Rust revolves around the Future trait and its single, humble function called poll(). Tokio calls poll() on each task in its queue to move it forward. Any async runtime in Rust does this.

A Future in Rust is a state machine. When poll() is called, it should try to move forward. It can return one of two possible values.

  • Poll::Ready carries the value of the Future. The Future is complete - this is the return value of .await.
  • Poll::Pending is a solemn promise that the Context's waker is set up to be .wake()'d when the Future can make progress again.
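
For reference, here is the shape of the trait as std::future defines it:

rust
use std::pin::Pin;
use std::task::{Context, Poll};

pub trait Future {
    type Output;

    // Called by the runtime to drive the state machine forward.
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}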

Implementing a Future

Sometimes you need to implement a Future yourself. It's not that frequent, but there are reasons. Even if you aren't implementing one yourself, it is important to understand how a Future works in Rust to get the most out of async fn.

Let's look at how we could make this loop yield to an asynchronous runtime after every addition. Maybe we're running on a computer where addition is expensive? I don't know, but we need something simple to do, so it's clear what is happening.

rust
let mut sum = 0;
for i in 0..=3 {
    sum += i;
}

or just

rust
let sum: u32 = (0..=3).sum();

Here is what a Future for that might look like:

rust
#[derive(Debug)]
enum MyFuture {
    Initial { step: usize },
    Processing { step: usize, sum: usize },
    Done,
}

It's just a struct or enum. It's convenient to make it an enum, but sometimes you want a struct. It's not really a Future, though, until you implement Future for it.

rust
impl Future for MyFuture {
    type Output = usize;

    fn poll(mut self: Pin<&mut Self>, context: &mut Context<'_>) -> Poll<usize> {
        match *self {
            Self::Initial { step } => todo!(),
            Self::Processing {
                ref mut step,
                ref mut sum,
            } => todo!(),
            _ => todo!(),
        }
    }
}

poll() can be called at any time, so your Future has to be able to determine if it is time to continue. To do the work, you fill in the match, based on whatever state you're in when poll()ed.

rust
match *self {
    Self::Initial { step } => {
        *self = Self::Processing { step, sum: 0 };
    }
    Self::Processing {
        ref mut step,
        ref mut sum,
    } => {
        if *step == 0 {
            // This is the end condition.
            let result = Poll::Ready(*sum);
            *self = Self::Done;
            return result;
        } else {
            // This is the loop arm. We haven't reached the end yet.
            *sum += *step;
            *step -= 1;
        }
    }
    // If we're Done, we shouldn't be polled anymore, but we shouldn't blow up if we are.
    _ => (),
}

You can see it's a little involved managing Future state yourself, but it's not too bad. You get &mut Self though, so you can do some very interesting things with Futures. Usually you just want to write async fn foo() instead and let the compiler manage all this, but we're looking at how it works; stick with it!

This match is missing some details if you use it as-is. That subtle Poll::Pending promise needs to be upheld! The simplest thing you can do to keep correctness is to just wake your context directly. That makes your task a "busy loop" and while it's better than missing wakes, it opts out of all optimizations around waiting task lists and burns CPU time. We don't have a waiting task list in this simple example though, so let's just put the wakes where they belong so you can see.

rust
impl Future for MyFuture {
    type Output = usize;
    fn poll(mut self: Pin<&mut Self>, context: &mut Context<'_>) -> Poll<usize> {
        match *self {
            Self::Initial { step } => {
                *self = Self::Processing { step, sum: 0 };
                context.waker().wake_by_ref();
            }
            Self::Processing {
                ref mut step,
                ref mut sum,
            } => {
                if *step == 0 {
                    let result = Poll::Ready(*sum);
                    *self = Self::Done;
                    return result;
                } else {
                    *sum += *step;
                    *step -= 1;
                    context.waker().wake_by_ref();
                }
            }
            _ => (),
        }
        Poll::Pending
    }
}

This is our complete Future implementation for MyFuture. It adds self.step to self.sum, then yields back to the runtime with a Poll::Pending. The promise that the context will be woken is upheld, because whenever we are in a valid, non-finished state we ensure the context is woken explicitly. Again, this makes this task a busy loop, but it does make it correct.

Okay, we need a runtime to run this. Let's create our own!

rust
fn run_poll_like_a_runtime(future: &mut MyFuture) -> Poll<usize> {
    pin!(future).poll(&mut Context::from_waker(&futures::task::noop_waker()))
}

Okay, I lied. Let's just make a function for conveniently advancing the state of a MyFuture just like a real runtime.

Complete source
rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::task::{Context, Poll};

fn main() {
    let mut future = MyFuture::Initial { step: 3 };
    println!("{future:?}");

    let status = run_poll_like_a_runtime(&mut future);
    println!("{future:?} -> {status:?}");

    let status = run_poll_like_a_runtime(&mut future);
    println!("{future:?} -> {status:?}");

    let status = run_poll_like_a_runtime(&mut future);
    println!("{future:?} -> {status:?}");

    let status = run_poll_like_a_runtime(&mut future);
    println!("{future:?} -> {status:?}");

    let status = run_poll_like_a_runtime(&mut future);
    println!("{future:?} -> {status:?}");
}

fn run_poll_like_a_runtime(future: &mut MyFuture) -> Poll<usize> {
    pin!(future).poll(&mut Context::from_waker(&futures::task::noop_waker()))
}

#[derive(Debug)]
enum MyFuture {
    Initial { step: usize },
    Processing { step: usize, sum: usize },
    Done,
}

impl Future for MyFuture {
    type Output = usize;
    fn poll(mut self: Pin<&mut Self>, context: &mut Context<'_>) -> Poll<usize> {
        match *self {
            Self::Initial { step } => {
                *self = Self::Processing { step, sum: 0 };
                context.waker().wake_by_ref();
            }
            Self::Processing {
                ref mut step,
                ref mut sum,
            } => {
                if *step == 0 {
                    let result = Poll::Ready(*sum);
                    *self = Self::Done;
                    return result;
                } else {
                    *sum += *step;
                    *step -= 1;
                    context.waker().wake_by_ref();
                }
            }
            _ => (),
        }
        Poll::Pending
    }
}
Playground - Run this Future and edit it in your browser

There are some subtleties to implementing a Rust Future. You need to figure out what state you're in and whether you can progress, and if you're going to return Pending, you have to be certain that your Context will be woken when there's more you can do. Normally you don't implement that part yourself; instead you call things like tcp_stream.poll_read_ready(context) and rely on the TcpStream to uphold that Pending promise. When the stream is Pending, you can simply return Pending, because the stream promised it would wake your task when there's something to read.
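
As a sketch of that delegation, here is an illustrative future (the ReadReady type is made up; poll_read_ready is Tokio's) that simply forwards to the stream and inherits its wake-up promise:

rust
use std::future::Future;
use std::io;
use std::pin::Pin;
use std::task::{Context, Poll};
use tokio::net::TcpStream;

// Illustrative future that resolves when the stream is ready to read.
struct ReadReady<'a> {
    stream: &'a TcpStream,
}

impl Future for ReadReady<'_> {
    type Output = io::Result<()>;

    fn poll(self: Pin<&mut Self>, context: &mut Context<'_>) -> Poll<io::Result<()>> {
        // poll_read_ready registers the context's waker with Tokio's IO driver,
        // so returning its Pending keeps the promise with no work on our part.
        self.stream.poll_read_ready(context)
    }
}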

Rust Futures mix the simplicity of polling with the efficiency of readiness. Neat, huh?

Blocking Stars and Celebrities: What Do They Know? Do They Know Things? Let's Find Out!

Tokio

Tokio has some level of opinion about how long a poll() should run. At the time of writing, their prescribed default threshold for calling a poll() slow is 50µs. Once in a while, a slow poll doesn't really cause harm. But frequent slow polls wreck both your throughput and your tail latency. I usually hold 40µs as slow, but their default is fine.
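
That threshold shows up in task instrumentation. As a sketch based on my reading of the tokio-metrics crate's API at the time of writing, counting slow polls for a task might look like this:

rust
use tokio_metrics::TaskMonitor;

#[tokio::main]
async fn main() {
    let monitor = TaskMonitor::new();

    // Wrap the task so its polls are timed; polls over the slow-poll
    // threshold (50µs by default) are counted separately.
    tokio::spawn(monitor.instrument(async {
        // ... your workload ...
    }))
    .await
    .unwrap();

    println!("slow polls: {}", monitor.cumulative().total_slow_poll_count);
}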

Experts

Mara Bos has written a book on the practice of blocking.

Alice Ryhl has written about blocking, and has had a great influence on how I think about blocking in Tokio.

All of the perspectives and opinions on this blog are my own; I'm only mentioning these resources so you can learn more. Neither of these inspirational developers is responsible for any of my errors here.

Me

So, before getting too deep into practical territory, you should check those expert resources for further explanation and for correcting any unwitting lies I might tell you. You should not do what I say, but you might want to consider understanding it and factoring it into what you do. I have paid months for some of these opinions, and I would rather you get the benefit without paying the time I paid.

I've been working with Rust for a few years, writing public-facing web services that run in the tens of thousands of RPCs per second per CPU core on top of Tokio. I've had to implement my own synchronization primitives a few times now, and my libraries power these web services. My side projects, like rmemstore, are unburdened with Very Enterprise Concerns and run at around 1MHz - a million operations per second - per CPU core on top of Tokio. This is not a brag; I am just trying to establish that I know how to use Tokio in web services from both a practical perspective and a best-case perspective.

Talking about blocking

As Alice asks, "if you block [in poll()] for a short time, is it really blocking?" I humbly submit that it depends very much on what you're doing, and on what "short" means. For me, my employer, my servers, my OS, and my projects, the answer is "almost always, yes."

Let's think about std::sync::Mutex. It's awesome, and incredibly quick. If you haven't read it, the implementation is not too hard to follow. It first does an atomic compare-and-swap. If that fails, it spins a few times, trying to grab the lock via that simple & cheap atomic. If it still can't, it falls into lock_contended(), which on Linux does a futex conditional park-and-wake dance. When the futex parks the thread, it takes at least 100-200µs for the operating system to get the thread running again. That's true on the Linux aarch64 servers I typically operate, anyway.
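
To make that concrete, here is a simplified, hypothetical stand-in for that locking strategy; the real implementation lives in std's platform-specific sync code, and this sketch substitutes yield_now() where Linux would do the futex wait:

rust
use std::hint;
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

const UNLOCKED: u32 = 0;
const LOCKED: u32 = 1;

struct SketchMutex {
    state: AtomicU32,
}

impl SketchMutex {
    fn lock(&self) {
        // Fast path: one cheap compare-and-swap.
        if self.try_grab() {
            return;
        }
        // Contended: spin a few times, hoping the holder is about to release.
        for _ in 0..100 {
            hint::spin_loop();
            if self.try_grab() {
                return;
            }
        }
        // The real lock_contended() parks on a futex here; waking the parked
        // thread costs 100-200µs at a minimum. We merely yield in this sketch.
        while !self.try_grab() {
            thread::yield_now();
        }
    }

    fn try_grab(&self) -> bool {
        self.state
            .compare_exchange(UNLOCKED, LOCKED, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
    }

    fn unlock(&self) {
        self.state.store(UNLOCKED, Ordering::Release);
    }
}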

If you pretty much never have parallel callers to std::sync::Mutex, then you should not think twice about using it in your async functions.

But if you have 200,000 tasks every second spread across 14 cores trying to insert into or clone values out of a Mutex<HashMap<_, _>>, you will hit lock_contended with regularity. When you do, your task experiences a "slow poll" and falls into "Task 1" territory. As you add threads, the contention gets worse, though you get to "buffer" more work at the lock_contended boundary. Adding threads while you're experiencing contention is a mess, so really avoid doing it.

Sometimes people reach for tokio::sync::Mutex when they first see lock_contended and realize what it entails. That can be the right thing to do in some cases, but it's probably rarer than you think. At really high task rates, Tokio's mutex increases the number of concurrent tasks Tokio has to schedule. That's not a problem until you start to bog the runtime down, and when you do, the Tokio mutex makes it bewilderingly difficult to tell that it is the reason your throughput is low.
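
For concreteness, the swap looks something like this sketch, reusing the map shape from the example above:

rust
use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::Mutex;

// Sketch: the same shared map, guarded by Tokio's mutex instead of std's.
async fn insert(map: Arc<Mutex<HashMap<String, String>>>, key: String, value: String) {
    // .lock().await suspends the task instead of parking the thread. Every
    // contended lock becomes one more waiting task for Tokio to schedule,
    // which is where really high task rates start to bog the runtime down.
    map.lock().await.insert(key, value);
}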

When trying to get more throughput out of a server using h2 and lots of concurrent streams, I found that its locking strategy, at the time of writing, was too expensive. std::sync::Mutex is used in h2 to synchronize several steps in a typical gRPC call, each of which results in a new http2 stream. I wrote a Mutex that matched the std API so it could be a drop-in import swap, optimized to spin much harder and wake much more aggressively, under the theory that if we're going to block a runtime thread on a server, we should release it as soon as physically possible. I forked h2, swapped the Mutex imports, and got another 20% throughput, but the threads were still contending even with this server-running-Tokio-optimized Mutex. I've resorted to more extreme measures since.

Increasingly, I reach for the most purely non-blocking approaches possible in my hot paths. Not the approach with the most indirection, but the approach that has as little synchronization as possible (but no less). After all, no Mutex is faster than no Mutex. It bears mentioning that this is not a rationale for unsafe code; it is a rationale for thinking harder and writing faster, safe code with better approaches.

I'll write more about that another time, but I buried the lede above: you get &mut Self when you write a Future. That is a wonderful opportunity for safe, non-blocking code that runs as fast as your CPU... if you're ready to roll up your sleeves and write Futures.

2025-01-30