September 29, 2021
Progress in RL is generally driven by simulation benchmarks, but established benchmarks (such as the Arcade Learning Environment and MuJoCo) are starting to saturate as researchers develop algorithms that perform near-optimally on these tasks. New benchmarks, such as ProcGen, Minecraft, and NetHack, will help the RL research community build powerful new algorithms, but it's difficult to disentangle exactly what kinds of problems are being tested in these complex and rich environments. Being composed of entire games, these testbeds are not explicitly designed for evaluating specific capabilities of RL agents, such as exploration, memory, and credit assignment. Ideally, practitioners should be able to define a vast universe of well-controlled tasks for specific research questions and easily adjust them by increasing their complexity and richness, without any arduous engineering.
To fill this gap, we’ve built MiniHack, an environment creation framework and accompanying suite of tasks based on NetHack, one of the hardest games in the world. With this tool, researchers and engineers can easily create a universe of tasks that both challenge modern RL methods and target specific problems within RL.
MiniHack uses the NetHack Learning Environment (NLE) to provide a means for environment designers to easily tap into the richness of the game for complex RL tasks. This new sandbox comes with a large set of preexisting assets from the game, such as more than 500 monsters and 450 items, including weapons, wands, tools, and spellbooks, all of which feature unique characteristics and complex environment dynamics. This framework allows RL practitioners to go beyond simple grid-world-style navigation tasks with limited action spaces and instead take on more complicated skill-acquisition and problem-solving tasks.
To do this, MiniHack leverages the so-called description files that are used to describe the dungeons in NetHack. The description files are written using a human-readable probabilistic-programming-like domain-specific language (DSL). With just a few lines of code, people can generate a large variety of environments, controlling every detail, from the location and types of monsters to the traps, objects, and terrain of the level, all while introducing randomness that challenges the generalization capabilities of RL agents.
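As a flavor of what this looks like, here is a small sketch in the des-file style: a one-room level with a randomly placed monster, an item, and a downward staircase. (The exact directives and their arguments follow NetHack's dungeon-compiler format; consult the MiniHack documentation for the precise syntax supported.)

```
MAZE: "mylevel", ' '
GEOMETRY: center, center
MAP
-------
|.....|
|.....|
-------
ENDMAP
MONSTER: ('F', "lichen"), random
OBJECT: ('%', "apple"), random
STAIR: random, down
```

Because placements are declared as `random`, every reset of the environment can produce a different configuration of the same level.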
The DSL has first-class support for underspecifying parts of the environment and using random generation functions. This means that each time the environment is reset and the agent starts a new episode, the level the agent appears in could differ significantly. This procedural content generation allows MiniHack to assess the generalization capabilities of RL agents in previously unseen situations, thus enabling the training of agents that are more robust and general purpose in nature.
For researchers who don’t have time to learn the specifics of description files, we also provide a convenient interface to describe the entire environment in Python.
Everything about MiniHack environments, which use the popular Gym interface, is highly customizable. Users can easily select what kinds of observations the agent receives, for instance, pixel-based, symbolic, or textual, and what actions it can perform. In addition, we provide a convenient interface for specifying a custom reward function to guide the agent's learning.
We also built a suite of RL tasks for testing the core capabilities of RL agents and are releasing it as part of MiniHack. This suite can be used just like any other RL benchmark, and its tasks can serve as building blocks for researchers wishing to develop new ones.
MiniHack also enables the porting of existing grid-based benchmarks under one roof. We show how prior testbeds such as MiniGrid and Boxoban can be ported to MiniHack. Due to MiniHack’s flexibility and richness, these can be made more challenging by adding additional entities, environment features, and randomness.
Creating rich and complex environments for investigating specific research questions in deep RL has never been easier.
MiniHack is targeted toward testing specific capabilities of AI agents in isolation, including exploration, memory, and language-assisted RL. The framework can be used for the NetHack Challenge competition, which FAIR is co-organizing at NeurIPS 2021.
To get started with MiniHack, we are providing a variety of baselines using frameworks such as TorchBeast and RLlib. Furthermore, we demonstrate how MiniHack can be used to design environments in an unsupervised fashion, using the recently proposed PAIRED algorithm as an example. We also provide baseline learning curves in Weights & Biases for our experiments. Overall, we believe MiniHack will enable researchers to iterate quickly on their ideas and to systematically increase the difficulty of benchmark tasks. To get started, check out MiniHack’s tutorials.
MiniHack is open source and available on GitHub.
Learn how to use MiniHack using our detailed documentation.
Check out the MiniHack NeurIPS 2021 paper.
We’d like to acknowledge the contributions of Robert Kirk, PhD student at University College London.