The AtariVectorEnv interface supports asynchronous send/recv (see the ALE-Py vector environment docs). A blocking step() call typically sends the action and then waits for the new observation.

A blocking step() leaves the accelerator idle while the emulator advances

[Figure: timeline over ticks 0 to 7 showing accelerator and CPU activity. Blocks: Choose action, blocking .step(), Update, Emulator, Logging, idle. Arrows: obs, action, new obs. Annotation: the accelerator stalls while the emulator is doing work.]

When the CPU is busy, the accelerator is idle, and vice versa.

Asynchronous communication

We don’t need a blocking step: we can do work while the emulator is stepping. env.send() and env.recv() let both the accelerator and the CPU keep working instead of waiting on each other.
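The difference between the two loops can be sketched with a toy environment. This is a minimal illustration, not the ALE-Py API: ToyAsyncEnv, its busy() helper, and run() are hypothetical stand-ins, and the "emulator" is just a sleep on a worker thread. The counter records how many times the point where an update would run falls while the emulator is still stepping.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class ToyAsyncEnv:
    """Hypothetical stand-in for a send/recv environment (not ALE-Py)."""

    def __init__(self, step_seconds=0.01):
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = None
        self._state = 0
        self._step_seconds = step_seconds

    def _advance(self, action):
        time.sleep(self._step_seconds)  # pretend the emulator is running
        self._state += action
        return self._state

    def send(self, action):
        # Hand the action to the worker thread and return immediately.
        self._pending = self._pool.submit(self._advance, action)

    def recv(self):
        # Wait for the step started by the last send() to finish.
        return self._pending.result()

    def step(self, action):
        # Blocking step: send, then wait right away.
        self.send(action)
        return self.recv()

    def busy(self):
        # True while a sent step has not finished yet.
        return self._pending is not None and not self._pending.done()

    def close(self):
        self._pool.shutdown()


def run(env, n_steps, use_async):
    """Return the last observation and how many updates overlapped a step."""
    overlapped = 0
    obs = 0
    for _ in range(n_steps):
        action = 1  # placeholder for "choose action"
        if use_async:
            env.send(action)
            overlapped += env.busy()  # an update here runs during the step
            obs = env.recv()
        else:
            obs = env.step(action)
            overlapped += env.busy()  # an update here runs after the step
    return obs, overlapped
```

In the blocking loop the update point is always reached after the emulator has finished, so nothing overlaps. In the send/recv loop the update point sits between send() and recv(), which is where the accelerator work would go.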

Async send/recv lets the accelerator work through the emulator step

[Figure: timeline over ticks 0 to 7 showing accelerator and CPU activity. Blocks: Choose action, Update, Emulator, Logging, idle. Arrows: obs, action, new obs (sent, used in next update). Annotation: the accelerator keeps working while the emulator steps.]

How much does this help?

The benefit depends on how much accelerator work there is to overlap with the emulator. DQN updates every few steps on a replay batch, so send/recv hides that work behind the emulator and runs around 1.1 to 1.2 times faster than a blocking step(). PPO batches its rollout into a single update, a workload that completes much faster than DQN’s, so PPO stays bound by the emulator and the two interfaces perform the same.
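One rough way to reason about these numbers is an idealized overlap model. It assumes perfect overlap and no communication overhead, which real runs will not achieve. The function name and the example timings below are illustrative assumptions, not measurements from these experiments.

```python
def ideal_speedup(emulator_time, accelerator_time):
    """Idealized speedup of async send/recv over a blocking step().

    Assumption: blocking runs the two phases back to back, while async
    overlaps them perfectly, so the slower phase sets the pace.
    """
    blocking = emulator_time + accelerator_time
    overlapped = max(emulator_time, accelerator_time)
    return blocking / overlapped


# Illustrative values only: accelerator work at 15% of the emulator time.
dqn_like = ideal_speedup(emulator_time=1.0, accelerator_time=0.15)

# Illustrative values only: accelerator work that is negligible per step.
ppo_like = ideal_speedup(emulator_time=1.0, accelerator_time=0.01)
```

Under this model a 1.1 to 1.2 times speedup corresponds to accelerator work of roughly 10 to 20 percent of the emulator time, and the speedup approaches 1 as the accelerator share shrinks, which fits the emulator-bound behaviour described for PPO.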

Latency vs throughput as we increase the number of environments (Pong, frameskip 5)

[Figure legend: blocking step() vs. async send/recv. Panels: DQN, PPO.]