Self-Taught AlphaGo Zero 'Strongest Go Player in History'


"A few days in, it rediscovers known best plays, and in the final days goes beyond those plays to find something even better", Hassabis says. Unlike earlier versions of AlphaGo, which learnt how to play using thousands of human amateur and professional games, AlphaGo Zero learnt the game of Go simply by playing against itself, starting from completely random play. "We're quite excited because we think this is now good enough to make some real progress on some real problems".

In March 2016, AlphaGo had beaten human champion Lee Se-dol - a feat previously thought impossible for computer systems, given the complexity of Go compared with the more computer-friendly game of chess. That machine learned from some 30 million moves in games played by human experts, to the point where it could anticipate the opponent's move 57 per cent of the time. That the team could build an algorithm that surpassed previous versions using less training time and computing power "is nothing short of amazing", he adds.

Go is a complex ancient East Asian strategy game, played on a 19-by-19 grid.

To accomplish this, the company used a machine learning technique called "reinforcement learning" to push Zero to optimize its gameplay. For each turn, AlphaGo Zero drew on its past experience to predict the most likely ways the rest of the game could play out, judge which player would win in each scenario and choose its move accordingly.
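The full system uses Monte Carlo tree search guided by a neural network, but the core idea of "predict how positions play out, then pick the move accordingly" can be sketched with a toy one-step lookahead. Everything here is hypothetical stand-in code, not DeepMind's implementation: `value_fn` plays the role of the learned value network and `simulate_fn` the role of the game rules.

```python
def select_move(state, legal_moves, value_fn, simulate_fn):
    """Pick the move whose resulting position the value
    estimate scores best for the current player.

    `value_fn` and `simulate_fn` are hypothetical stand-ins
    for AlphaGo Zero's learned network and the rules of Go.
    """
    best_move, best_value = None, float("-inf")
    for move in legal_moves:
        next_state = simulate_fn(state, move)
        v = value_fn(next_state)  # predicted desirability of the outcome
        if v > best_value:
            best_move, best_value = move, v
    return best_move

# Toy demo: the state is a number, moves add to it, and the
# "network" prefers states closest to 10.
choice = select_move(4, [1, 3, 5],
                     value_fn=lambda s: -abs(10 - s),
                     simulate_fn=lambda s, m: s + m)
print(choice)  # 5, since 4 + 5 = 9 is closest to 10
```

The real system evaluates far deeper than one move ahead, but the selection principle - score imagined futures, act on the best one - is the same.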

After just 72 hours of self-play, AlphaGo Zero could already beat the version of AlphaGo that defeated Lee Se-dol by 100 games to none.

By the 40th day, AlphaGo Zero had done the unthinkable, winning roughly 90pc of its games against AlphaGo Master, the strongest previous iteration.

The program hasn't mastered Go without human knowledge, he says, because "actually prior knowledge has gone into the construction of the algorithm itself". While AlphaGo's effectiveness in human games and against itself has shown that there's room for AI to surpass our capacity in tasks that we think are far too hard, the robot overlords aren't here yet.


The latest iteration, however, differs from its predecessors: AlphaGo Zero abandons all hand-engineered features, runs a single neural network (versus the two found in earlier models) and relies exclusively on its own knowledge to evaluate positions. What's novel about AlphaGo Zero is that instead of just running the tree search and making a move, it remembers the outcome of the tree search - and eventually of the game - and learns from it. Training the earlier versions required many machines over several months and used 48 specialised neural-network chips, called TPUs. And instead of exploring possible outcomes from each position with random playouts, Zero simply asks its network to predict a victor.
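Remembering the game's outcome and folding it back into the evaluator is the heart of the self-play loop. A toy tabular sketch of that update, under stated assumptions (a dictionary of value estimates stands in for the real value network, and `update_value_table` is a hypothetical name, not DeepMind's API):

```python
def update_value_table(values, game_states, winner, lr=0.1):
    """Nudge each visited state's value estimate toward the
    game's final outcome (+1 for the winner's positions,
    -1 otherwise). A tabular stand-in for the gradient step
    AlphaGo Zero applies to its value network after self-play.
    """
    for state, player in game_states:
        target = 1.0 if player == winner else -1.0
        old = values.get(state, 0.0)
        values[state] = old + lr * (target - old)
    return values

values = {}
# One self-play game as (state, player-to-move) pairs; player 1 won.
history = [("s0", 1), ("s1", 2), ("s2", 1)]
update_value_table(values, history, winner=1)
print(round(values["s0"], 2))  # 0.1 - nudged toward the win
```

Repeating this over millions of self-played games moves the evaluator's predictions toward positions that actually lead to wins, which is why no human game data is needed.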

It has been reported that AlphaGo Zero won 100 out of 100 matches against AlphaGo Lee after self-learning for 72 hours. "We'd much rather trust the predictions of that one strong expert". This makes the algorithm both stronger and more efficient: AlphaGo Zero runs on a single machine with four TPUs, while its predecessors used more than ten times that number.

Prof David Silver, of London-based DeepMind, said: "What we are most excited about is how far it can go in the real world".

He said: "A lot of the AlphaGo team are now moving on to other projects to try to apply this technology to other domains". One promising area, he suggested, is understanding how proteins fold, an essential tool for drug discovery. The protein-folding puzzle shares some key features with Go.

In their study, the researchers describe the program using a term that is well-known to students of philosophy: Tabula rasa, which is Latin for "blank slate". In the longer term, such algorithms might be applied to similar tasks in quantum chemistry, materials design and robotics.