We have developed a novel approach to cooperative multi-agent reinforcement learning (RL) that assigns tasks to individual agents within a group, thereby improving the entire group’s ability to collaborate. We tested this method in the real-time strategy game StarCraft®: Brood War®, and found that our RL-trained model significantly outperformed computer-controlled players that relied on carefully tuned rule-based baselines. Perhaps most importantly, these gains carried over to matches with significantly larger armies than those in our training scenarios. We’re releasing the source code for this approach on our TorchCraftAI GitHub repository, and detailing our results, which indicate that treating collaborative multi-agent RL as a dynamic assignment problem can lead to groups of agents that are better at generalizing to more complex situations.
Our approach focuses on multi-agent collaborative (MAC) problems where agents have to carry out multiple intermediate tasks in order to accomplish a larger one. As the number of agents and tasks in these kinds of MAC problems increases, the complexity grows exponentially, which prevents systems from learning directly from large-scale scenarios. Systems must instead generalize from smaller scenarios and tackle tasks that were not part of their RL-based trial-and-error training runs. Since RL-trained systems often struggle with this exact type of generalization, our approach breaks MAC policies down into high- and low-level policies. The high-level policies determine which agents should be assigned specific tasks. To encourage collaboration between agents, we employ a quadratic cost function that optimizes for long-term performance and adjusts those high-level assignments to follow the most efficient coordination patterns. Once agents have been assigned, they execute their tasks based on a fixed, low-level policy, which determines the specific actions needed to carry them out.
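To make the idea of a high-level assignment with a quadratic term more concrete, here is a minimal sketch in Python. It is not the released TorchCraftAI implementation: the function names, the hand-written score tables, and the simple local-search optimizer are all assumptions chosen for illustration. In the approach described above the scores would be learned, and this post does not specify how the assignment objective is optimized. The sketch only shows how a pairwise (quadratic) term can change which assignment is best compared with each agent choosing independently.

```python
def assignment_score(assignment, linear, pairwise):
    """Score a joint assignment (hypothetical objective, for illustration).

    assignment[i] is the task index given to agent i.
    linear[i][t] is the stand-alone value of agent i doing task t.
    pairwise[t][u] is the extra value (or penalty, if negative) incurred
    whenever two different agents are assigned tasks t and u.
    """
    total = sum(linear[i][t] for i, t in enumerate(assignment))
    n = len(assignment)
    for i in range(n):
        for j in range(i + 1, n):
            total += pairwise[assignment[i]][assignment[j]]
    return total


def greedy_assignment(linear):
    """Each agent independently picks its best task, ignoring the others."""
    n_tasks = len(linear[0])
    return [max(range(n_tasks), key=lambda t: row[t]) for row in linear]


def local_search(linear, pairwise, max_sweeps=20):
    """Improve the greedy assignment one agent at a time.

    This is a simple stand-in optimizer (an assumption, not the authors'
    method); it finds a local optimum of the quadratic objective.
    """
    n_tasks = len(linear[0])
    assignment = greedy_assignment(linear)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(assignment)):
            best_task = assignment[i]
            best_score = assignment_score(assignment, linear, pairwise)
            for t in range(n_tasks):
                candidate = assignment[:i] + [t] + assignment[i + 1:]
                score = assignment_score(candidate, linear, pairwise)
                if score > best_score:
                    best_score, best_task = score, t
            if best_task != assignment[i]:
                assignment[i] = best_task
                changed = True
        if not changed:
            break
    return assignment


# Toy example: three agents, two targets. Every agent slightly prefers
# target 0, but sending two agents to the same target is penalized.
linear = [[1.0, 0.9], [1.0, 0.9], [1.0, 0.9]]
pairwise = [[-0.5, 0.0], [0.0, -0.5]]

greedy = greedy_assignment(linear)
improved = local_search(linear, pairwise)
print("greedy:  ", greedy, assignment_score(greedy, linear, pairwise))
print("improved:", improved, assignment_score(improved, linear, pairwise))
```

In this toy setup the independent choice sends every agent to the same target, while the search that accounts for the pairwise penalty moves one agent to the other target and reaches a higher joint score. Flipping the sign of the diagonal entries in `pairwise` would instead reward concentrating agents on one target.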
We tested this method by addressing the problem of target selection in StarCraft®: Brood War®. When two groups of units fight, each unit must choose which member of the enemy group to attack. Depending on the situation, the agents learn to either focus their attacks on a handful of units or spread damage across a larger number of weak targets. Our agents also learn to exploit movement patterns, including delaying their attacks until opposing units are close, effectively splitting the enemy’s forces while the agents maintain a more tightly knit and effective formation. But in addition to performing well (in some configurations beating the rule-based opponents roughly 99 percent of the time), these RL-based systems demonstrated impressive generalization, succeeding in battles that involved five times more units than they had encountered during training.
Building agents that can collaborate effectively is important for a wide variety of problems. For example, a team of robots might need to work together to explore an unfamiliar space. In addition to helping solve these challenges, our method addresses the broader challenge within RL of developing training techniques that can adapt to new circumstances. This approach demonstrates that the lessons learned from small problems can be applied to significantly larger ones, which could have implications for a wide range of training techniques for AI systems.
The source code for this work is available on GitHub. And those attending NeurIPS 2019 can learn more about this research at a spotlight talk and a poster session (poster #194) — starting at 4:35 p.m. and 5:30 p.m. local time, respectively — on Tuesday, December 10.