Multi-ROM, Option execution, and interface for the Torch compiler

The Gymnasium vector wrapper for the Arcade Learning Environment (ALE) spawns a separate Python process per game. Running a full benchmark sweep (multiple ROMs with multiple seeds) was inefficient because each process kept its own copy of the ROM in CPU cache, which reduced the number of ROMs that could run in parallel. Two projects, EnvPool and Nvidia CuLE, emerged as alternatives to the Gymnasium vector interface to speed up experimentation on the ALE.

ALE-Py recently added a native C++ vector implementation that supports standard preprocessing, asynchronous send/recv between the agent and the environment, and spawning multiple instances of the same ROM. See their docs here and the example below:

from ale_py.vector_env import AtariVectorEnv

# Create a vector environment with 4 parallel instances of Breakout
envs = AtariVectorEnv(
    game="breakout",  # The ROM id, not the env name: snake_case (e.g. "space_invaders"), unlike the CamelCase names used by `gymnasium.make`
    num_envs=4,
)

This blog post details my PR to support three new features.

Support for multiple ROMs

AtariVectorEnv now accepts a list of game names, and num_envs is inferred from the list length. Each ROM keeps its own episode state and autoresets independently, so ROMs terminate or truncate independently of one another. Here is how to set up multiple ROMs at once:

import numpy as np

runs_per_rom = 2
games = np.repeat(["pong", "breakout", "space_invaders"], runs_per_rom).tolist()

# games == ["pong", "pong", "breakout", "breakout", "space_invaders", "space_invaders"]
envs = AtariVectorEnv(games=games)
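Since several environments share a ROM, it can help to keep a map from game id to environment indices, for example to average results over seeds. The sketch below uses only the standard library and NumPy; `group_envs_by_game` is a hypothetical helper (not part of ALE-Py) and the returns are placeholder numbers.

```python
from collections import defaultdict

import numpy as np


def group_envs_by_game(games):
    """Map each game id to the indices of the environments running it."""
    groups = defaultdict(list)
    for env_index, game in enumerate(games):
        groups[game].append(env_index)
    return {game: np.array(indices) for game, indices in groups.items()}


runs_per_rom = 2
games = np.repeat(["pong", "breakout", "space_invaders"], runs_per_rom).tolist()
groups = group_envs_by_game(games)

# Placeholder per-environment episode returns, one entry per environment.
episode_returns = np.array([1.0, 3.0, 10.0, 20.0, 100.0, 300.0])
mean_return = {
    game: float(episode_returns[indices].mean()) for game, indices in groups.items()
}
print(mean_return)
```

Keeping the grouping outside the environment means the same index arrays can slice observations, rewards, or any other per-environment batch.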

When full_action_space is left at its default of False in the AtariVectorEnv constructor, each ROM uses its own minimal action set, so the number of actions differs per ROM. When running different ROMs, single_action_space is None and action_space is a MultiDiscrete with per-ROM action counts; see the Gymnasium spaces documentation here.

import gymnasium as gym

assert envs.single_action_space is None
assert isinstance(envs.action_space, gym.spaces.MultiDiscrete)
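# One entry per environment; for the games list above (pong, breakout, space_invaders, two runs each)
print(envs.action_space.nvec)  # [6, 6, 4, 4, 6, 6] - action space size per environment

# alternatively
print(envs.num_actions)  # [6, 6, 4, 4, 6, 6]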
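With per-ROM action counts, an agent has to avoid actions that a given ROM does not have. One common approach, sketched here with NumPy only, is to size a shared policy head for the largest action set and mask the rest. The `nvec` values, both helper functions, and the random logits are illustrative assumptions, not ALE-Py API.

```python
import numpy as np


def sample_random_actions(nvec, rng):
    """Draw one valid action per environment: action i lies in [0, nvec[i])."""
    return rng.integers(low=0, high=np.asarray(nvec))


def mask_logits(logits, nvec):
    """Set the logits of actions a ROM does not have to -inf."""
    nvec = np.asarray(nvec)
    action_ids = np.arange(logits.shape[1])
    valid = action_ids[None, :] < nvec[:, None]  # shape (num_envs, max_actions)
    return np.where(valid, logits, -np.inf)


nvec = np.array([6, 6, 4, 4, 6, 6])  # stand-in for envs.action_space.nvec
rng = np.random.default_rng(0)

# Uniform random policy that respects each ROM's action count.
random_actions = sample_random_actions(nvec, rng)

# Greedy policy over a shared head with max(nvec) outputs (placeholder logits).
logits = rng.normal(size=(len(nvec), int(nvec.max())))
greedy_actions = mask_logits(logits, nvec).argmax(axis=1)
```

Masking before the argmax (or before a softmax) keeps a single network usable across ROMs without ever emitting an out-of-range action.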

The chart below shows throughput versus latency using the multi-ROM feature on a 12-core AMD Ryzen 9. A single ROM steps at ~3.7k steps/sec; packing all runs into one environment pushes that past 50k steps/sec.

Figure: latency vs throughput as we increase the number of ROMs (series: ALE-Py AtariVectorEnv, Gymnasium AsyncVectorEnv).

PyTorch custom ops

envs.torch() returns a set of custom ops that write observations directly into pre-allocated GPU tensors. No Python-side allocation occurs in the hot path, and the ops are compatible with torch.compile. The split send/recv form lets you overlap agent computation with environment stepping.

handle_id, ale_send, ale_step, ale_recv, get_last_info, unregister = envs.torch()

ale_send(handle_id, actions)
# run agent inference here while envs step
obs, reward, term, trunc, steps_taken = ale_recv(handle_id)
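To illustrate the overlap pattern on its own, here is a sketch that does not use ALE-Py or PyTorch at all. `FakeAsyncEnv` is a hypothetical stand-in whose `send` starts a step on a background thread and whose `recv` waits for the result; `agent_work` is a placeholder for inference or a learner update.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class FakeAsyncEnv:
    """Hypothetical stand-in: send() starts a step in the background, recv() waits for it."""

    def __init__(self, step_seconds=0.02):
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._step_seconds = step_seconds
        self._pending = None

    def _step(self, actions):
        time.sleep(self._step_seconds)   # pretend emulation work
        return [a + 1 for a in actions]  # placeholder "observations"

    def send(self, actions):
        self._pending = self._pool.submit(self._step, actions)

    def recv(self):
        return self._pending.result()

    def close(self):
        self._pool.shutdown()


def agent_work(seconds=0.02):
    """Placeholder for agent computation that runs while the envs step."""
    time.sleep(seconds)
    return "done"


env = FakeAsyncEnv()
actions = [0, 1, 2]
results = []
for _ in range(3):
    env.send(actions)        # returns immediately, stepping happens in the background
    status = agent_work()    # overlaps with the environment step
    obs = env.recv()         # blocks only for whatever stepping time remains
    results.append((status, obs))
    actions = obs
env.close()
```

When both halves take similar time, each iteration costs roughly the longer of the two rather than their sum, which is the benefit the split form is after.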

Multi-step action sequences

Passing a list of arrays to step or send dispatches a sequence of primitive actions to each ROM in one call, avoiding per-step Python-to-C++ round trips. Sequence lengths can differ across ROMs. The gamma argument discounts the rewards accumulated across each sequence, so a single reward is returned per ROM. An empty array skips a ROM entirely, returning its last observation with zero reward.

sequences = [
    np.array([0, 2, 1, 0]),  # pong: 4 actions
    np.array([3, 1]),         # breakout: 2 actions
]
obs, reward, term, trunc, info = envs.step(sequences, gamma=0.99)
print(info["steps_taken"])   # actions executed before termination
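For intuition about the returned reward, the accumulation can be sketched in plain Python. This assumes the first reward in a sequence is undiscounted, each later reward is scaled by one more factor of gamma, and execution stops at the first terminal step; the post does not spell out the exact convention, and `discounted_sequence_return` is a hypothetical helper rather than ALE-Py code.

```python
import numpy as np


def discounted_sequence_return(rewards, terminated, gamma):
    """Accumulate sum_k gamma**k * r_k, stopping after the first terminal step."""
    total, discount, steps_taken = 0.0, 1.0, 0
    for reward, done in zip(rewards, terminated):
        total += discount * reward
        discount *= gamma
        steps_taken += 1
        if done:
            break
    return total, steps_taken


# Placeholder per-step rewards and termination flags for one 4-action sequence.
rewards = np.array([1.0, 0.0, 2.0, 5.0])
terminated = np.array([False, False, True, False])

ret, steps = discounted_sequence_return(rewards, terminated, gamma=0.5)
print(ret, steps)  # 1.5 3
```

Here the fourth action never runs because the episode ends on the third, so the return is 1 + 0.5 * 0 + 0.25 * 2 = 1.5 with three steps taken.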

This supports macro-actions (repeated actions) or open-loop options (arbitrary sequences of actions): the agent dispatches a sequence of primitive actions and receives a single discounted return.
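Both kinds of sequence are just integer arrays, so they can be built with small helpers. A minimal sketch with NumPy; `repeat_action` and `skip` are hypothetical names introduced for illustration, and the action ids are arbitrary.

```python
import numpy as np


def repeat_action(action, repeats):
    """Macro-action: the same primitive action repeated `repeats` times."""
    return np.full(repeats, action, dtype=np.int64)


def skip():
    """Empty sequence, which per the description above skips that ROM for this call."""
    return np.array([], dtype=np.int64)


sequences = [
    repeat_action(2, repeats=4),            # macro-action: action 2 four times
    np.array([3, 1, 1, 0], dtype=np.int64), # open-loop option: arbitrary sequence
    skip(),                                 # leave this ROM untouched
]
print([seq.tolist() for seq in sequences])  # [[2, 2, 2, 2], [3, 1, 1, 0], []]
```

A list built this way has one entry per environment and would be passed where the `sequences` list is used above.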