
MiniHack: A new sandbox for open-ended reinforcement learning

September 29, 2021

Reinforcement learning (RL) has become a valuable tool for solving problems of sequential decision-making, with research ranging from robotics to personalizing content to improving MRI scans.

Progress in RL is generally driven by simulation benchmarks, but established benchmarks (such as the Arcade Learning Environment and MuJoCo) are starting to saturate as researchers develop algorithms that perform near-optimally on these tasks. New benchmarks, such as ProcGen, Minecraft, and NetHack, will help the RL research community build powerful new algorithms, but it's difficult to disentangle exactly what kinds of problems are being tested in these complex and rich environments. Being composed of entire games, these testbeds are not explicitly designed for evaluating specific capabilities of RL agents, such as exploration, memory, and credit assignment. Ideally, practitioners should be able to define a vast universe of well-controlled tasks for specific research questions and easily adjust them by increasing their complexity and richness, without any arduous engineering.

To fill this gap, we’ve built MiniHack, an environment creation framework and accompanying suite of tasks that is based on NetHack, one of the hardest games in the world. With this tool, engineers can easily create a universe of tasks that challenge modern RL methods and are targeted at specific problems within RL.

MiniHack is now open source and available on GitHub. Researchers can use our detailed documentation to learn how to use MiniHack and get more details on the project in this NeurIPS 2021 paper.

Creating complicated problem-solving tasks with ease

MiniHack builds on the NetHack Learning Environment (NLE), letting environment designers tap into the richness of the game for complex RL tasks. This new sandbox comes with a large set of preexisting assets from the game, such as more than 500 monsters and 450 items, including weapons, wands, tools, and spellbooks, all of which feature unique characteristics and complex environment dynamics. This framework allows RL practitioners to go beyond simple grid-world-style navigation tasks with limited action spaces and instead take on more complicated skill-acquisition and problem-solving tasks.

To do this, MiniHack leverages the description files that NetHack uses to define its dungeons. These files are written in a human-readable domain-specific language (DSL) that resembles a probabilistic programming language. With just a few lines of code, designers can generate a large variety of environments and control every detail, from the location and types of monsters to the traps, objects, and terrain of the level, all while introducing randomness that challenges the generalization capabilities of RL agents.

The description files allow building diverse MiniHack environments within just a few lines of code.

The DSL has first-class support for underspecifying parts of the environment and for using random generation functions. This means that each time the environment is reset and the agent starts a new episode, the level the agent appears in could differ significantly. This procedural content generation allows MiniHack to assess how well RL agents generalize to previously unseen situations, which supports training agents that are more robust and general purpose.
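As a rough illustration of what such a description file can look like, the sketch below holds one in a Python string. The directives (MAZE, MAP, REGION, MONSTER, OBJECT, TRAP) follow NetHack's level-description syntax, but the level itself is made up for this example, and the environment ID and `des_file` argument in `make_env` are assumptions about MiniHack's interface that should be checked against its documentation.

```python
# Illustrative description file, held as a Python string.
# Assumption: the directives follow NetHack's des-file syntax; the level
# name, map, and contents are invented for this example.
DES_FILE = """
MAZE: "mylevel", ' '
FLAGS: premapped
GEOMETRY: center, center
MAP
.....
.....
.....
.....
.....
ENDMAP
REGION: (0,0,4,4), lit, "ordinary"
MONSTER: ('F', "lichen"), random
OBJECT: ('%', "apple"), random
TRAP: "hole", random
"""


def make_env(des_file: str = DES_FILE):
    """Build a MiniHack environment from a description-file string.

    Requires the `gym` and `minihack` packages. The environment ID and the
    `des_file` keyword are assumptions about MiniHack's interface.
    """
    import gym
    import minihack  # noqa: F401  (importing registers the MiniHack-* IDs)

    return gym.make("MiniHack-Navigation-Custom-v0", des_file=des_file)
```

Because the monster, the object, and the trap are all placed with `random`, each reset of an environment built this way could put them on different squares of the same 5x5 room.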

For researchers who don’t have time to learn the specifics of description files, we also provide a convenient interface to describe the entire environment in Python.
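A minimal sketch of that Python route might look like the following. It assumes MiniHack exposes a `LevelGenerator` class with `add_object`, `add_monster`, and `get_des` methods; the level size and contents are hypothetical and chosen only for illustration.

```python
# Hypothetical level contents for this sketch; the names are NetHack
# object and monster names, the symbols are NetHack object classes.
LEVEL_SIZE = (7, 7)
OBJECTS = [("apple", "%"), ("dagger", ")")]
MONSTERS = ["goblin"]


def build_des_file() -> str:
    """Describe a small level in Python and return its description-file text.

    Requires the `minihack` package. The LevelGenerator calls below are
    assumptions about its interface, not a definitive reference.
    """
    from minihack import LevelGenerator

    width, height = LEVEL_SIZE
    lvl_gen = LevelGenerator(w=width, h=height)
    for name, symbol in OBJECTS:
        # No position is given, so placement is left to the generator.
        lvl_gen.add_object(name, symbol)
    for name in MONSTERS:
        lvl_gen.add_monster(name=name)
    return lvl_gen.get_des()
```

The returned string is an ordinary description file, so it can be inspected, saved, or passed to an environment in the same way as one written by hand.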

MiniHack environments

Screenshots of various MiniHack tasks.

Everything about MiniHack environments, which use the popular Gym interface, is highly customizable. Users can easily select what kinds of observations the agent receives, for instance pixel-based, symbolic, or textual, and what actions it can perform. In addition, we provide a convenient interface to specify the desired custom reward function that will guide the learning of the agent.
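The sketch below shows how these choices might be expressed when creating an environment. It assumes that `gym.make` accepts `observation_keys` and `reward_manager` arguments for MiniHack environments and that MiniHack provides a `RewardManager` class with `add_eat_event` and `add_location_event` methods; the specific events and reward values are hypothetical.

```python
# Observation types requested from the environment (assumed key names):
# "glyphs", "chars", and "colors" are symbolic, "pixel" is image-based,
# and "message" is the in-game text line.
OBSERVATION_KEYS = ("glyphs", "chars", "colors", "pixel", "message")


def make_custom_env(des_file: str):
    """Create an environment with chosen observations and a custom reward.

    Requires the `gym` and `minihack` packages. The keyword arguments and
    RewardManager methods are assumptions about MiniHack's interface.
    """
    import gym
    import minihack  # noqa: F401  (importing registers the MiniHack-* IDs)
    from minihack import RewardManager

    reward_manager = RewardManager()
    # Hypothetical shaping: reward eating an apple, penalize standing on a sink.
    reward_manager.add_eat_event("apple", reward=1.0)
    reward_manager.add_location_event(
        "sink", reward=-1.0, terminal_required=False
    )

    return gym.make(
        "MiniHack-Skill-Custom-v0",
        des_file=des_file,
        observation_keys=OBSERVATION_KEYS,
        reward_manager=reward_manager,
    )
```

Requesting only the observation keys an agent actually consumes keeps each step's observation dictionary small, which can matter when pixel observations are included.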

We also built a suite of RL tasks with MiniHack for testing the core capabilities of RL agents, and we are releasing it as part of MiniHack. This suite can be used just like any other RL benchmark, and its tasks can serve as building blocks for researchers who wish to develop new ones.
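Because the tasks follow the Gym interface, a generic rollout loop is enough to try one out. The helper below is a sketch rather than part of MiniHack: `run_random_episode` is a hypothetical name, and it only assumes an environment with `reset()`, `step()`, and a discrete `action_space.n`. It accepts both the older four-value and the newer five-value return from `step()`.

```python
import random


def run_random_episode(env, max_steps=100, seed=0):
    """Play one episode with uniformly random actions; return total reward.

    Works with any environment that exposes Gym-style reset()/step() and a
    discrete action space with an `n` attribute.
    """
    rng = random.Random(seed)
    env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = rng.randrange(env.action_space.n)
        result = env.step(action)
        reward = result[1]
        # Older Gym returns (obs, reward, done, info); newer versions return
        # (obs, reward, terminated, truncated, info).
        done = result[2] if len(result) == 4 else (result[2] or result[3])
        total_reward += reward
        if done:
            break
    return total_reward
```

With `gym` and `minihack` installed, a call such as `run_random_episode(gym.make("MiniHack-River-v0"))` would run the loop on one of the released tasks, assuming that environment ID is registered in the installed version.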

Pixel, symbolic, and textual observations in MiniHack.

MiniHack also brings existing grid-based benchmarks under one roof. We show how prior testbeds such as MiniGrid and Boxoban can be ported to MiniHack. Thanks to MiniHack’s flexibility and richness, these ports can be made more challenging by adding entities, environment features, and randomness.

Putting MiniHack to work

Creating rich and complex environments for investigating specific research questions in deep RL has never been easier.

MiniHack is designed for testing specific capabilities of AI agents in isolation, including exploration, memory, and language-assisted RL. The framework can be used for the NetHack Challenge competition, which FAIR is co-organizing at NeurIPS 2021.

To help researchers get started, we are providing a variety of baselines built with frameworks such as TorchBeast and RLlib. We also demonstrate how MiniHack can be used to design environments in an unsupervised fashion, using the recently proposed PAIRED algorithm as an example, and we provide baseline learning curves in Weights & Biases for our experiments. Overall, we believe MiniHack will enable researchers to iterate quickly on their ideas and to systematically increase the difficulty of benchmark tasks. For a hands-on introduction, check out MiniHack’s tutorials.

We’d like to acknowledge the contributions of Robert Kirk, a PhD student at University College London.
