The NetHack Learning Environment to advance deep reinforcement learning

June 25, 2020

What the research is:

The NetHack Learning Environment is a novel research environment for testing the robustness and systematic generalization of reinforcement learning (RL) agents. The environment is based on NetHack, one of the oldest and most popular procedurally generated roguelike games. Existing RL environments are either sufficiently complex or based on fast simulation, but they are rarely both. In contrast, the NetHack Learning Environment combines lightning-fast simulation with very complex game dynamics that are difficult even for humans to master. This allows our agents to experience billions of steps in the environment in a reasonable time frame while still challenging the limits of what current methods can achieve, driving long-term research on topics such as exploration, planning, skill acquisition, and language-conditioned RL.

The complexity of NetHack is evident in the hundreds of items and monster types and the rich interactions between these, the player, and the environment. Yet the agent has a clearly defined goal: to descend more than 50 deadly dungeon levels to retrieve an amulet before ascending to demigodhood. Since levels in NetHack are procedurally generated, every game is different, testing the generalization limits of current state-of-the-art approaches. In order to master the game, even human players often have to consult external resources such as the NetHack Wiki to identify critical strategies or discover new paths forward. We believe this makes NetHack an exciting environment for research beyond tabula rasa RL.

Annotated example of an agent in NetHack.

All this complexity is captured in a turn-based grid world presented in ASCII art by a game engine written primarily in the C programming language. This extremely lightweight simulation forgoes all but the simplest physics and renders symbols instead of pixels. As a result, our models can learn very quickly in this environment without wasting computational resources on simulating game dynamics or rendering observations, neither of which is critical to challenging the fundamental skills of a learning agent.

How it works:

The environment consists of three components: a Python interface to NetHack using the popular OpenAI Gym API; a suite of benchmark tasks; and a distributed deep RL baseline agent based on TorchBeast, a PyTorch implementation of IMPALA.
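
To make the Gym-style interface concrete, here is a minimal sketch of the classic reset/step loop. It runs against a toy stand-in rather than NetHack itself: `ToyDungeonEnv`, its two actions, and its `chars` observation key are all hypothetical and exist only for illustration. With the real package installed, the same loop would be driven by an environment created through `gym.make`; the ID `NetHackScore-v0` in the comment is an assumption taken from the project repository, not something stated in this post.

```python
import random
from typing import Dict, Tuple


class ToyDungeonEnv:
    """Hypothetical stand-in for a Gym-style NetHack environment.

    The agent '@' walks along a one-row ASCII corridor and must reach
    the down staircase '>' at the right end.
    """

    ACTIONS = ("west", "east")

    def __init__(self, width: int = 8, max_steps: int = 200):
        self.width = width
        self.max_steps = max_steps
        self.pos = 0
        self.steps = 0

    def _observe(self) -> Dict[str, str]:
        # Symbolic observation: a string of glyphs instead of pixels.
        row = ["."] * self.width
        row[-1] = ">"
        row[self.pos] = "@"
        return {"chars": "".join(row)}

    def reset(self) -> Dict[str, str]:
        self.pos = 0
        self.steps = 0
        return self._observe()

    def step(self, action: int) -> Tuple[Dict[str, str], float, bool, dict]:
        self.steps += 1
        delta = 1 if self.ACTIONS[action] == "east" else -1
        self.pos = min(max(self.pos + delta, 0), self.width - 1)
        reached = self.pos == self.width - 1
        done = reached or self.steps >= self.max_steps
        reward = 1.0 if reached else 0.0
        return self._observe(), reward, done, {"steps": self.steps}


def run_random_episode(env, seed: int = 0) -> Tuple[float, int, str]:
    """Play one episode with uniformly random actions."""
    rng = random.Random(seed)
    # With NLE installed, `env` could instead come from something like
    # gym.make("NetHackScore-v0") after `import nle` (assumed ID).
    obs = env.reset()
    total, done, info = 0.0, False, {"steps": 0}
    while not done:
        action = rng.randrange(len(env.ACTIONS))
        obs, reward, done, info = env.step(action)
        total += reward
    return total, info["steps"], obs["chars"]


if __name__ == "__main__":
    print(run_random_episode(ToyDungeonEnv()))
```

The point of the sketch is the contract, not the game: an agent only ever calls `reset` and `step`, so any policy written against this loop can be pointed at a richer environment without changes to the loop itself.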

Since we believe that the overall goal of solving NetHack is still out of reach for the foreseeable future, we defined seven benchmark tasks to allow us to measure progress:

  • staircase: descend to lower levels of the dungeon

  • pet: take care of your pet (keep it alive and take it with you deeper into the dungeon)

  • eat: find sources of nonpoisonous food and eat it, to avoid starving

  • gold: collect gold throughout the dungeon

  • scout: see as much of the dungeon as you can

  • score: achieve high in-game score (e.g., killing monsters, descending, collecting gold)

  • oracle: reach an important landmark, the Oracle (appears 4–9 levels into the dungeon)

These are only proxy tasks that we use to measure the current abilities of our models, and many more can be defined easily.
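
One way to see why further tasks are easy to define is to treat each task as a reward function over consecutive game states. The sketch below is illustrative only: the `GameState` fields and the four reward functions are hypothetical simplifications, not the reward definitions shipped with the environment.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Sequence, Tuple


@dataclass(frozen=True)
class GameState:
    """Hypothetical summary of the game at one time step."""
    depth: int                                   # current dungeon level
    gold: int                                    # gold carried
    score: int                                   # in-game score
    seen_tiles: FrozenSet[Tuple[int, int, int]]  # (depth, x, y) seen so far


RewardFn = Callable[[GameState, GameState], float]


def staircase_reward(prev: GameState, cur: GameState) -> float:
    # Reward each newly reached lower level.
    return float(max(cur.depth - prev.depth, 0))


def gold_reward(prev: GameState, cur: GameState) -> float:
    # Reward gold picked up since the previous step.
    return float(max(cur.gold - prev.gold, 0))


def scout_reward(prev: GameState, cur: GameState) -> float:
    # Reward every tile seen for the first time.
    return float(len(cur.seen_tiles - prev.seen_tiles))


def score_reward(prev: GameState, cur: GameState) -> float:
    # Reward the change in in-game score.
    return float(cur.score - prev.score)


TASKS: Dict[str, RewardFn] = {
    "staircase": staircase_reward,
    "gold": gold_reward,
    "scout": scout_reward,
    "score": score_reward,
}


def episode_return(task: str, states: Sequence[GameState]) -> float:
    """Sum a task's reward over consecutive pairs of states."""
    reward_fn = TASKS[task]
    return sum(reward_fn(a, b) for a, b in zip(states, states[1:]))
```

Under this framing, adding a task amounts to writing one more small function and registering it in `TASKS`, while the game and the agent stay untouched.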

Overview of the baseline model released with the NetHack Learning Environment.

Using a single high-end GPU, one can train agents for hundreds of millions of environment steps a day using the TorchBeast framework, which supports further scaling by adding additional GPUs or machines. This provides the agent with plenty of experience to learn from so that we as researchers can spend more time testing new ideas instead of waiting for results to come in. In addition, we believe it democratizes access for researchers in more resource-constrained labs without sacrificing the difficulty and richness of the environment.

Our baseline agent implementation is based on a recurrent policy that encodes various parts of the observation space in NetHack. Beyond the game itself, NetHack is surrounded by a large body of external resources that could be used in future research to improve the performance of agents. For example, large repositories of replay data from human players exist, and a model could learn directly from them. There are also many resources that human players consult in order to improve their gameplay, including the NetHack Guidebook, released with the game; the NetHack Wiki, maintained by the player community; and a host of online videos and forum discussions. How to use any of these resources effectively remains an open question for RL researchers looking to improve the performance and sample efficiency of their models using external knowledge sources.

A reinforcement learning agent explores the Gnomish Mines in NetHack.

Why it matters:

NetHack presents a challenge that’s on the frontier of current methods, without the computational costs of other challenging simulation environments. Standard deep RL agents currently explore only a fraction of the overall game. Progress in this challenging new environment will require RL agents to move beyond tabula rasa learning, for example, by investigating synergies with natural language understanding to utilize information on the NetHack Wiki. We believe that the NetHack Learning Environment will inspire further research on robust exploration strategies in RL, planning with long-term horizons, and transferring commonsense knowledge from resources outside of the simulation.

Read the full paper:

The NetHack Learning Environment