Teaching AI to plan using language in a new open-source strategy game

September 06, 2019

Written by Michael Lewis, Denis Yarats and Hengyuan Hu

When humans face a complex challenge, we create a plan composed of individual, related steps. Often, these plans are formed as natural language sentences. This approach enables us to achieve our goal and also adapt to new challenges, because we can leverage elements of previous plans to tackle new tasks, rather than starting from scratch each time.

Facebook AI has developed a new method of teaching AI to plan effectively, using natural language to break down complex problems into high-level plans and lower-level actions. Our system innovates by using two AI models — one that gives instructions in natural language and one that interprets and executes them — and it takes advantage of the structure in natural language in order to address unfamiliar tasks and situations. We’ve tested our approach using a new real-time strategy game called MiniRTSv2, and found it outperforms AI systems that simply try to directly imitate human gameplay.

We’re now sharing our results, which will be presented at NeurIPS 2019 later this year, and open-sourcing MiniRTSv2 so other researchers can use it to build and test their own imitation and reinforcement learning systems.

Previously, the AI research community has found it challenging to bring this hierarchical decision-making process to AI systems. Doing so has meant that researchers had to manually specify how to break down a problem into macro-actions, which is difficult to scale and requires expertise. Alternatively, if the AI system has been trained to focus on the end task, it is likely to learn how to achieve success through a single composite action rather than a hierarchy of steps. Our work with MiniRTSv2 shows that a different natural language-based method can make progress against these challenges.

While this is foundational research, it suggests that by using language to represent plans, these systems can more efficiently generalize to a variety of tasks and adapt to new circumstances. We believe this can bring us closer to our long-term goal of building AI that can adapt and generalize in real-world settings.

Building MiniRTSv2, an open-source, NLP-ready game environment

MiniRTSv2 is a streamlined strategy game designed specifically for AI research. In the game, a player commands archers, dragons, and other units in order to defeat an opponent.

In this sample MiniRTSv2 gameplay — recorded directly from the tool’s interface — all the instructions that appear below the map field are generated by an instructor model, while the corresponding in-game actions, such as building and attacking units, are carried out by a separate executor model.

Though MiniRTSv2 is intentionally simpler and easier to learn than commercial games such as DOTA 2 and StarCraft, it still allows for complex strategies that must account for large state and action spaces, imperfect information (areas of the map are hidden when friendly units aren’t nearby), and the need to adapt strategies to the opponent’s actions. Used as a training tool for AI, the game can help agents learn effective planning skills, whether through NLP-based techniques or other kinds of training, such as reinforcement and imitation learning.

Using language to generate high-level plans and assign low-level instructions

We used MiniRTSv2 to train AI agents to first express a high-level strategic plan as natural language instructions and then to act on that plan with the appropriate sequence of low-level actions in the game environment. This approach leverages natural language’s built-in benefits for learning to generalize to new tasks. Those include the expressive nature of language — different combinations of words can represent virtually any concept or action — as well as its compositional structure, which allows people to combine and rearrange words to create new sentences that others can then understand. We applied these features to the entire process of planning and execution, from the generation of strategy and instructions to the interface that bridges the different parts of the system’s hierarchical structure.

Our AI agent plays a real-time strategy game using two models. The instructor creates plans based on continually observing the game state and issues instructions in natural language to the executor. The executor grounds these instructions as actions, based on the current state of the game.

The AI agent that we built to test this approach consists of a two-level hierarchy — an instructor model that decides on a course of action and issues commands, and an executor model that carries out those instructions. We trained both models using a data set collected from human participants playing MiniRTSv2.
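
The two-level hierarchy described above can be pictured as a simple control loop. The sketch below is only an illustration under stated assumptions: the `GameState` fields, the rule-based `instructor` and `executor` functions, and the `apply_action` dynamics are all hypothetical stand-ins invented for this example. They are not the MiniRTSv2 API, and the real models are trained networks rather than hand-written rules.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GameState:
    """Hypothetical toy state; a real game observation is far richer."""
    minerals: int = 0
    dragons: int = 0
    idle_peasants: int = 1


def instructor(state: GameState) -> str:
    """Decide on a plan and express it as a natural language instruction."""
    if state.minerals < 100:
        return "send idle peasant to mine mineral"
    return "build 1 dragon"


def executor(state: GameState, instruction: str) -> List[str]:
    """Ground the instruction as low-level actions, given the current state."""
    if "mine" in instruction and state.idle_peasants > 0:
        return ["gather"]
    if "dragon" in instruction and state.minerals >= 100:
        return ["train_dragon"]
    return ["idle"]


def apply_action(state: GameState, action: str) -> None:
    """Toy environment dynamics, invented for this sketch."""
    if action == "gather":
        state.minerals += 50
    elif action == "train_dragon":
        state.minerals -= 100
        state.dragons += 1


def play(steps: int) -> List[Tuple[str, List[str]]]:
    """Run the instructor/executor loop and keep a readable transcript."""
    state = GameState()
    transcript = []
    for _ in range(steps):
        instruction = instructor(state)
        actions = executor(state, instruction)
        for action in actions:
            apply_action(state, action)
        transcript.append((instruction, actions))
    return transcript


transcript = play(3)
for instruction, actions in transcript:
    print(instruction, "->", actions)
```

Because every action is preceded by a sentence, the transcript doubles as a plain-language record of why the agent did what it did.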

Those participants worked in instructor-executor pairs, with designated instructors issuing orders in the form of written text, and executors accessing the game’s controls to carry those orders out. The commands ranged from clear-cut directives, such as “build 1 dragon,” to general instructions, such as “attack.” We used these natural language interactions between players to generate a data set of 76,000 pairs of instructions and executions across 5,392 games.

Leveraging the versatility of natural language to learn more generalized plans

Though MiniRTSv2 isn’t designed solely for NLP-related work, the game environment’s text interface allows us to explore ambiguous and context-dependent linguistic features that are relevant to building more versatile AI. For example, given the instruction “make two more cavalry and send them over with the other ones,” the executor model has to grasp that “the other ones” are existing cavalry, an inference that’s simple for most humans, but potentially challenging for AI. The agent also has to account for the kind of potentially confusing nuances that are common in natural language. The specific command “send idle peasant to mine mineral” should lead to the same action as the comparatively vague “back to mine,” which doesn’t specify which units should be moved.

At each time step within a given MiniRTSv2 game, our system relies on three encoders to turn inputs into feature vectors that the model can use. The observation encoder focuses on spatial inputs (where game objects appear on the map) and nonspatial inputs (such as the type of unit or building that a given game object represents); the instruction encoder generates vectors from a recent list of natural language instructions; and the auxiliary encoder learns vectors for the remaining global game attributes (such as the total amount of resources a player has).
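
To make the division of labor among the three encoders concrete, here is a minimal sketch in plain Python. It rests on several assumptions: the unit types, the tiny vocabulary, the bag-of-words instruction encoding, the resource scaling, and the final concatenation are all simplifications chosen for illustration. The actual encoders are learned, and the article does not specify how their outputs are combined.

```python
from typing import Dict, List

UNIT_TYPES = ["peasant", "archer", "dragon"]  # hypothetical subset of unit types
VOCAB = {"build": 0, "attack": 1, "dragon": 2, "mine": 3}  # hypothetical vocabulary


def encode_observation(objects: List[Dict], map_size: int) -> List[float]:
    """Spatial and nonspatial inputs: per-cell counts of each unit type, flattened."""
    grid = [0.0] * (map_size * map_size * len(UNIT_TYPES))
    for obj in objects:
        cell = obj["y"] * map_size + obj["x"]
        grid[cell * len(UNIT_TYPES) + UNIT_TYPES.index(obj["type"])] += 1.0
    return grid


def encode_instructions(recent: List[str]) -> List[float]:
    """Bag-of-words over recent instructions (a stand-in for a learned encoder)."""
    vec = [0.0] * len(VOCAB)
    for sentence in recent:
        for word in sentence.lower().split():
            if word in VOCAB:
                vec[VOCAB[word]] += 1.0
    return vec


def encode_auxiliary(resources: int) -> List[float]:
    """Global game attributes; here only the resource total, scaled down."""
    return [resources / 1000.0]


def encode_step(objects: List[Dict], recent: List[str],
                resources: int, map_size: int = 2) -> List[float]:
    """Concatenate the three feature vectors (concatenation is an assumption)."""
    return (encode_observation(objects, map_size)
            + encode_instructions(recent)
            + encode_auxiliary(resources))


features = encode_step(
    objects=[{"x": 1, "y": 0, "type": "dragon"}],
    recent=["build 1 dragon", "attack"],
    resources=250,
)
print(len(features))  # 12 observation + 4 instruction + 1 auxiliary = 17
```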

But rather than clarifying phrasing or eliminating redundant permutations of the same order, we intentionally leave the human instruction examples (and corresponding executor actions) as they were delivered. The instructor model can’t formulate original sentences and has to select from examples from human play-throughs. This forces the agent to develop pragmatic inference, learning how to plan and execute based on natural language as humans actually use it, even when that usage is imprecise.
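
Selecting from a fixed pool of human-written instructions, rather than generating new sentences, can be sketched as scoring each candidate against the current state and taking the best one. In the example below, the candidate strings come from this article, but the two-element feature vector, the per-candidate weights, and the dot-product scoring are hypothetical and hand-picked for illustration. A trained instructor would learn its scoring function from the human data.

```python
from typing import Dict, List

# Candidate pool: instruction text mapped to invented scoring weights.
# Features are assumed to be [idle_peasant_fraction, resource_level].
CANDIDATES: Dict[str, List[float]] = {
    "build 1 dragon": [0.0, 1.0],
    "send idle peasant to mine mineral": [1.0, -1.0],
    "back to mine": [0.9, -1.0],
    "attack": [-1.0, 0.5],
}


def score(weights: List[float], features: List[float]) -> float:
    """Dot product between a candidate's weights and the state features."""
    return sum(w * f for w, f in zip(weights, features))


def select_instruction(features: List[float]) -> str:
    """Return the highest-scoring instruction from the fixed candidate pool."""
    return max(CANDIDATES, key=lambda text: score(CANDIDATES[text], features))


# Many idle peasants, few resources: a mining instruction scores highest.
early_game = select_instruction([1.0, 0.1])
# No idle peasants, plenty of resources: building scores highest.
late_game = select_instruction([0.0, 1.0])
print(early_game)
print(late_game)
```

Note that "back to mine" and "send idle peasant to mine mineral" stay in the pool as separate entries, mirroring the decision to keep redundant human phrasings rather than merge them.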

Training our system to not only generate latent language commands but also understand the context of those instructions resulted in a significant boost in performance over more traditional agents. Using MiniRTSv2, we pitted a number of different agents against an AI opponent that was trained to directly imitate human actions, without taking language into account. The results from these experiments showed that language consistently improved agents’ win rates. For example, our most sophisticated NLP-based agent, which uses a recurrent neural network (RNN) encoder to help differentiate similar orders, beat the non-language-based AI opponent 57.9 percent of the time. That’s substantially better than the imitation-based agent’s 41.2 percent win rate.
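
For a sense of scale, the gap between the two reported win rates can be expressed both in percentage points and as a relative improvement. The snippet below uses only the two figures quoted above; the variable names are ours.

```python
language_agent_win_rate = 57.9   # percent, as reported above
imitation_agent_win_rate = 41.2  # percent, as reported above

# Absolute gap in percentage points.
absolute_gain = language_agent_win_rate - imitation_agent_win_rate
# Gap relative to the imitation-based agent's win rate.
relative_gain = absolute_gain / imitation_agent_win_rate

print(f"{absolute_gain:.1f} percentage points")     # 16.7 percentage points
print(f"{relative_gain:.1%} relative improvement")  # 40.5% relative improvement
```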

This is the first model to show improvements in planning by generating and executing latent natural language instructions. And though we employed a video game to evaluate our agents, the implications of this work go far beyond boosting the skills of game-playing AI bots, suggesting the long-term potential of employing language to improve generalization. Our evaluations showed that performance gains for NLP-based agents increased with larger instruction sets, as the models were able to use the compositional structure within language to better generalize across a wide range of examples.

In addition to improving generalization, this approach has a significant side benefit: it demonstrates how decision-making AI systems can be high-performing, versatile, and more interpretable at the same time. If an agent’s planning process is based on natural language, with sentences mapped directly to actions, understanding how a system arrived at a given action could be as simple as reading its internal transcript. The ability to quickly vet an AI’s behavior could be particularly useful for AI assistants, potentially allowing a user to fine-tune the system’s future actions.

Building language-based AI assistants through open science and collaboration

While our results have focused on using language as an aid for hierarchical decision-making, improving the ability of AI systems to utilize and understand natural language could pave the way for an even wider range of potential long-term benefits, such as assistants that are better at adapting to unfamiliar tasks and surroundings. Progress in this area might also yield systems that respond better to spoken or written commands, making devices and platforms more accessible to people who aren’t able to operate a touchscreen or mouse.

Our results are promising, but the experimental task that we’re presenting, the NLP-based data set that we’ve created, and the MiniRTSv2 environment that we’ve updated are all new contributions to the field, and exploring their full potential will require a substantial collective effort. That is why we’re inviting the wider AI community to use them. These resources also aren’t limited to one task. For example, since the MiniRTSv2 interface makes it easy to isolate the language activity from the recorded games, our data set of sample commands could be valuable for researchers training NLP systems, even if their work is unrelated to game performance or hierarchical decision-making. We look forward to seeing the results and insights that other researchers generate using these tools, as we continue to advance the application of language to improve the quality, versatility, and transparency of AI decision-making.

Written by

Mike Lewis

Research Scientist

Denis Yarats

Research Engineer

Hengyuan Hu

Research Engineer