February 13, 2019
Written by Yuandong Tian, Jerry Ma, Qucheng Gong, Shubho Sengupta, Zhuoyuan Chen, James Pinkerton, Larry Zitnick
Since we released ELF OpenGo last year, AI researchers have used the game-playing bot to better understand how AI systems learn, and Go enthusiasts have tested their skills against it as a new state-of-the-art artificial sparring partner. The open source bot has performed extremely well against humans — including a 20-0 record against top professional Go players — and has been widely adopted by the AI research community to run their own Go experiments and reproduce others' results. ELF OpenGo has faced off against multiple modified versions of itself in AI-based Go tournaments. It has also played alongside humans, including as part of a U.S. Go Congress exhibition featuring mixed pairs — each with one person and one ELF OpenGo system working together against another AI-human team.
The Facebook AI Research (FAIR) team is now announcing new features and research results related to ELF OpenGo, including an updated model that was retrained from scratch. We're also releasing a Windows executable version of the bot, making it easier for Go players to use the system as a training aid, as well as a unique archive that shows ELF OpenGo's analysis of 87,000 professional Go games. Present-day players can see how our system ranks the best pro players dating back to the 18th century, assessing their performance in detail, down to individual moves in specific games. We're excited that our development of this versatile platform is helping researchers better understand AI, and we're gratified to see players in the Go community use it to hone their skills and study the game.
"I can definitely say that the ELF OpenGo project has brought a huge impact on the Korean Go community,” said Beomgeun Cho, Assistant Director of PR, Korea Baduk Association. “Since it came out, almost every competitive professional player in Korea has been using the ELF Go program to analyze their own and other players’ games. And because of that, not only has the level of Korean Go improved, but the level of the whole world has been improved significantly.”
When DeepMind published the results of its AlphaGo Zero bot in 2017, it demonstrated how useful the 4,000-year-old game of Go could be as a test bed for research related to deep reinforcement learning (RL). Because of the game's high branching factor, convoluted interactions, and complicated patterns, effective Go bots must generalize to unseen and complicated situations, exploring and discovering new strategies. Go provides an environment with millions of potential move combinations but no hidden or chance-based game mechanics (such as rolling dice or shuffling playing cards). But while AlphaGo Zero and its successor, AlphaZero, have proved that AI systems can be trained to consistently beat human Go players, they function more as an aspirational example of deep RL than as a tool for the wider AI research community.
As part of our commitment to open science, we released a reimplementation of AlphaZero last year, enabling other research labs to gain greater insight into how these approaches work. The open-sourcing of our models also provides an essential benchmark for future research. We recognize that most researchers will not be able to reproduce our results even with the open-sourced code, because of the significant computational resources required. That's why we're sharing our insights based on retraining ELF OpenGo from scratch in a new paper. This work sheds new light on why AI is so formidable against human players, and it also clarifies the technology's limitations, which could help researchers better understand the underlying mechanism and apply it to other situations.
For the research community, our newly updated model and code represent the best version of ELF OpenGo yet. By releasing our data set of 20 million self-play games and the 1,500 intermediate models used to generate them, we're further reducing the need for compute resources (self-play being the most hardware-intensive component of the training process). And for researchers who want to dig deeper into how RL-based Go bots learn and play, our paper details the results of extensive ablation experiments, which modify individual features during evaluation to better understand the properties of these kinds of algorithms.
The key to ELF OpenGo's strong performance is that it doesn't learn like humans do. The trial-and-error nature of deep RL (where systems explore different moves, encounter both failures and successes, and learn from them to take actions that lead to success) might resemble human learning in a general sense, but the specific mechanics are very different. For example, ELF OpenGo learns only from the knowledge of whether it won or lost a game that it played against itself. It doesn't know which particular moves had the greatest impact on the outcome. Unlike human players, ELF OpenGo does not get advice from more skilled players pointing out good or bad moves, nor does it have the opportunity to play against players better than itself. Our final model is the result of playing 20 million self-play games.
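To make the "win or lose only" signal concrete, here is a minimal sketch of how a finished self-play game might be turned into training labels. It assumes Black moves first, players strictly alternate, and every position simply inherits the final result from the point of view of the player who moved. The function name and data layout are hypothetical illustrations, not ELF OpenGo's actual code.

```python
from typing import List, Tuple


def label_positions(num_moves: int, winner: str) -> List[Tuple[int, str, float]]:
    """Give every move in a finished game the same outcome-based label.

    Assumption for illustration: Black plays even-indexed moves (0, 2, ...),
    White plays odd-indexed moves. The label is +1.0 if the player who made
    the move went on to win the game and -1.0 otherwise. No move receives
    any individual credit or blame.
    """
    if winner not in ("black", "white"):
        raise ValueError("winner must be 'black' or 'white'")
    labeled = []
    for move_index in range(num_moves):
        player = "black" if move_index % 2 == 0 else "white"
        target = 1.0 if player == winner else -1.0
        labeled.append((move_index, player, target))
    return labeled


# A four-move toy game that White won: both of White's moves get +1.0,
# both of Black's moves get -1.0, regardless of which move decided the game.
labels = label_positions(4, "white")
```

Because every move by the winner is labeled identically, the system has to play a very large number of games before the statistics separate good moves from bad ones.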
When we used our model to analyze games played by professional human players, its ability to predict their moves plateaued very early in its learning process, after less than 10 percent of the total training time. But as the model continued to train, its skill level kept increasing, ultimately beating our earlier, prototype ELF OpenGo model 60 percent of the time. That prototype system was already outperforming human experts, having achieved a 20-0 record against four Go professionals ranked among the top 30 players in the world. ELF OpenGo further verifies AlphaZero's previous finding that many human-played moves — even among the best professionals — are suboptimal.
But just as it's a mistake to overstate superhuman AI performance in other domains, our exploration of ELF OpenGo's learning process reveals important limitations that are specific to deep RL. Like AlphaZero, our system never fully masters the concept of “ladders,” a common technique, typically understood even by beginners, in which one player traps the other's stones in a long formation that stretches diagonally across the board (with the captured stones resembling a ladder's rungs). This kind of sequence relies more on anticipation than many others do. While looking 30 or more moves into the future is commonplace for human Go players, DeepMind noted that these moves were learned late in the training process.
In this graphic, Black tries to execute a “ladder,” but White is able to escape. Human players quickly learn the ladder pattern, but bots learn it much more slowly and are unable to generalize from individual examples of ladders.
To further investigate this weakness, we curated a data set of 100 ladder scenarios and evaluated ELF OpenGo's performance on them. It is likely that under the current model design, ladders are learned through brute force (i.e., each extra length of ladder requires extra training) rather than as a pattern that the system can generalize to unseen situations. ELF OpenGo relies on a technique called Monte Carlo Tree Search (MCTS) to look ahead at future moves. Humans quickly understand that ladders lead to a very specific sequence of moves and can quickly analyze the final result. MCTS, by contrast, is a probabilistic approach, meaning that even if each individual correct move has high probability, the probability of picking all the correct moves in a long sequence is low.
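A quick back-of-the-envelope calculation shows why long forced sequences are hard for a probabilistic search. The sketch below assumes, purely for illustration, that each correct move is chosen independently with the same fixed probability. Real MCTS does not behave this simply, and the 95 percent figure is a made-up example rather than a measured value.

```python
def sequence_success_probability(per_move_probability: float, sequence_length: int) -> float:
    """Probability of choosing every correct move in a forced sequence.

    Simplifying assumption: each move is picked correctly with the same
    probability, independently of the others, so the probabilities multiply.
    """
    if not 0.0 <= per_move_probability <= 1.0:
        raise ValueError("per_move_probability must be between 0 and 1")
    if sequence_length < 0:
        raise ValueError("sequence_length must be non-negative")
    return per_move_probability ** sequence_length


# Hypothetical numbers: a 95% chance per move over a 30-move ladder
# leaves only about a 21% chance of following the ladder to its end.
p_full_ladder = sequence_success_probability(0.95, 30)
```

Under these assumptions, every additional rung of the ladder multiplies the success probability by another factor below one, which is consistent with the observation that longer ladders require extra training.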
More broadly, ELF OpenGo enables other AI researchers to get firsthand experience with how these systems work. This can help the community improve its theoretical understanding of the training procedure, discover new weaknesses in these algorithms, and eventually achieve better performance with less computational power.
Interestingly, ELF OpenGo learns in the opposite direction of human players, with its RL-based approach focusing more on the later stages of games than on the opening or middle sections. By incentivizing moves that lead to victory, RL pushes ELF OpenGo to learn more about how games end than how they begin. People, meanwhile, tend to assess their situation from the present, focusing on near-term and local rewards while still extrapolating forward. Though our findings are related specifically to Go, this indicates a broader limitation with RL, which could lead to performance that is impressive overall but that could fail — or be exploited — if it's too focused on final outcomes over near-term success.
In the process of retraining and implementing ELF OpenGo, we realized that it could function not only as a present-day AI player but also as a window into the past four centuries of competitive Go games. Why not reveal ELF OpenGo's specific analysis of those games and players?
This Go game board shows the “ear-reddening” move made by Honinbo Shusaku, a 19th-century professional Go player in Japan. Shusaku's famous move is shown as “a”, while ELF OpenGo concludes with high confidence that “b!” is the best play. Stone 126 is the most recent stone played.
The result of this realization is an interactive tool based on ELF OpenGo's analysis of 87,000 games played by humans. That data set spans 1700 to 2018, with our system evaluating the quality of individual moves based on the agreement between the moves predicted by the bot and those played by the human players. Though the tool encourages deep dives into specific matches, it also highlights important trends in Go. In analyzing games played over that period of more than 300 years, the bot found that the average strength of play has improved fairly steadily. Other measures, such as the worst move made during a game (meaning the move associated with the largest drop in the probability of winning), repeatedly improve and worsen through history, according to ELF OpenGo, with the late 1800s and 2000s showing the best performance.
We can also analyze individual players. Honinbo Shusaku, perhaps the most famous Go player in history, shows different trends in comparison with ELF OpenGo, depending on the stage of gameplay. Over time, his early-game play diverged from ELF OpenGo's recommendations, while his midgame play became more consistent with them. We can also analyze Honinbo Shusaku's famous “ear-reddening” move, played when he was 17 against Gennan Inseki, a much more established Go player. It turns out that ELF OpenGo preferred another move instead.
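The two measures described above, move agreement and the worst move of a game, can be sketched in a few lines. This is an illustrative sketch only: the function names, the use of coordinate strings for moves, and the (before, after) pairs of win probabilities are assumptions made here, not the format used by the ELF OpenGo analysis tool.

```python
from typing import Optional, Sequence, Tuple


def match_rate(
    human_moves: Sequence[str],
    bot_moves: Sequence[str],
    first_move: int = 1,
    last_move: Optional[int] = None,
) -> float:
    """Percentage of human moves in a window that equal the bot's top choice.

    Move numbers are 1-based and inclusive (a convention assumed here).
    """
    if len(human_moves) != len(bot_moves):
        raise ValueError("need exactly one bot recommendation per human move")
    if first_move < 1:
        raise ValueError("first_move is 1-based and must be at least 1")
    last = len(human_moves) if last_move is None else min(last_move, len(human_moves))
    indices = range(first_move - 1, last)
    if len(indices) == 0:
        return 0.0
    matches = sum(1 for i in indices if human_moves[i] == bot_moves[i])
    return 100.0 * matches / len(indices)


def largest_win_probability_drop(evaluations: Sequence[Tuple[float, float]]) -> float:
    """Size of the worst single-move drop in the mover's estimated win probability.

    Each pair holds the estimate before and after one move, from the point of
    view of the player who made it. Returns 0.0 if no move lowered the estimate.
    """
    drops = [before - after for before, after in evaluations]
    return max(drops + [0.0])


# Toy data: three of four moves agree with the bot, and the second move
# loses 0.3 of winning probability, the largest drop in this short record.
human = ["D4", "Q16", "C3", "R4"]
bot = ["D4", "Q4", "C3", "R4"]
agreement = match_rate(human, bot)
worst = largest_win_probability_drop([(0.50, 0.48), (0.60, 0.30), (0.40, 0.45)])
```

Restricting the window, for example with `first_move=60` and `last_move=120`, would give a midgame-only agreement figure for a single game. Averaging such per-game values by year is one plausible way to produce historical trend lines.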
Percentage of midgame moves (moves 60 to 120) played by professional Go players that agree with (or “match”) those recommended by ELF OpenGo for games played from 1700 to 2018.
The largest drop in winning probability in the game, from the worst move made by professional Go players (averaged over games from 1700 to 2018; lower is better).
Percentage of early game moves played by Honinbo Shusaku, a professional Go player in 19th-century Japan, that agree with those recommended by ELF OpenGo.
Percentage of midgame moves played by Honinbo Shusaku that agree with those recommended by ELF OpenGo.
ELF OpenGo also highlights the apparent impact of AI on the game. For example, the rate at which professionals' moves agree with ELF OpenGo's recommendations tends to increase over time, suggesting that the general quality of play has steadily improved.
Our system's evaluation of specific players also tends to increase over time, indicating their own improvement as their careers progress. These observations might seem obvious in hindsight, but ELF OpenGo quantifies these progressions and pinpoints individual games and years where changes in performance are noticeable. The sudden overall increase in agreement in 2016 also reinforces the belief that the introduction of powerful AI opponents has boosted the skills of professional players. That apparent correlation isn't conclusive (it's possible that humans have gotten markedly better for some other reason), but it's an example of how a system trained to carry out a given task can also provide wide-ranging analysis of a larger domain, both in the present and from a historical perspective.
Though ELF OpenGo is already being used by research teams and players around the world, we're excited to expand last year's release into a broader suite of open source resources. For Go enthusiasts, our system's analysis of professional games functions as a new kind of training aid, offering a superhuman AI player's take on the changing state of the game. We've also increased access to the bot itself for training purposes, with a Windows executable version that Go players can download and play against.
But there's more work to be done, both with ELF OpenGo and in the larger project of developing AI that can learn as efficiently as humans can. Our system is able to beat human experts but only after playing millions of games against itself. How do people learn from a fraction of that many examples while also picking up concepts such as ladders more quickly and ultimately better? By making our tools and analysis fully available, we hope to accelerate the AI community's pursuit of answers to these questions.