Techmate: How AI rewrote the rules of chess

Chess strategy has evolved considerably from the days of its first official champion, Wilhelm Steinitz, in 1886, to its latest, the Norwegian grandmaster Magnus Carlsen. But throughout there has been a constant: the number and value of a player’s pieces, known as “material”, have been key. That war-of-attrition thinking has been underlined since computers, with their ability to churn through millions of options to find a chink in an opponent’s defences, took over from humans as the best chess players almost two decades ago.

But late last year in a battle masterminded from an airy new office block in London’s King’s Cross, a chess program with a highly unconventional view of the game turned the tables.

The breakthrough came 21 moves into one of the 100 games between the two machines: Stockfish 8, one of the world’s leading computer chess engines; and AlphaZero, an upstart program developed by DeepMind, Google’s artificial intelligence research arm. Stockfish, playing black, had a strong defensive position, and had a clear advantage in terms of “material”. It was at this point that white made what looked like a wild sally. It sent a bishop deep into enemy territory, right into the clutches of a black pawn.

To Jonathan Schaeffer this makes no sense at all. A professor of computer science at the University of Alberta, he has spent much of his career designing game-playing machines. He became a chess master himself and wrote what was briefly the top chess program in the 1980s. He also went on to write the first draughts program to beat a human world champion.

Sacrificing a high-value piece without an obvious gain looks like a classic mistake. “By and large, games are decided by counting up material,” he says. “If I’m a pawn ahead, I’m going to win.”

But something unexpected happened after the white bishop charged recklessly into combat. It drew black out, and the board began to open up. The white move resulted, as Schaeffer puts it, in “enormous piece activity, and black is tied up in knots”.

No top chess player would take such a big risk, he says. But this computer seems to have “such control over the board, it’s almost as though it has an intuition something good will happen”. His verdict on its overall game-playing ability: “It’s incredible. It’s hard for me to get my head around it.”

The professor is not alone. Stuart Russell, an expert in AI at the University of California, Berkeley, calls the new computer’s performance “very impressive indeed. It’s certainly going to upset the computer chess world.”

But the result of this battle of silicon brains goes well beyond its impact on chess and its insight into how AI could be more practical for tackling everyday problems — it may even mark the moment when a truly flexible computer was born.

These days, it feels as though we are always on the brink of another AI breakthrough. Each one is heralded as the next step in the chipping away of human exceptionalism, inexorably reducing the gulf between man and machine. Despite the hype, most are of limited value. Computers — even smart ones — are good at doing the one thing they’re programmed for. If they had to apply their one clever trick to a different type of problem, they would fall flat on their face. Just occasionally, though, the machine does the unexpected.

My personal involvement with chess began and ended very early in life. But even I can see the apparent folly of sacrificing a bishop for a pawn when it doesn’t lead to an obvious chance either to capture a piece of at least equal value or mount a crushing attack. What makes this all the more startling is that the computer playing white was a newcomer to the game. AlphaZero is the latest in a series of game-playing systems built by DeepMind; its predecessor, AlphaGo, beat the world’s top Go player two years ago.

AlphaZero taught itself chess from scratch in just four hours, playing games against itself, learning and rejecting openings and endgames that humans have developed over the course of centuries. Starting with only the basic rules of chess, the system was free to make up its own strategy, unconstrained by anything that came before. The software uses a technique called reinforcement learning to understand which moves are most likely to be successful. It makes a move and then plays out all the possible combinations of moves that follow.

It’s like Pavlov’s dog, says Schaeffer in a slight stretch of the famous analogy: if the move was more likely than not to lead to a good outcome, it gets a reward. To me it feels very much like the computer in the 1983 movie WarGames, which taught itself the futility of nuclear war after playing itself at tic-tac-toe and discovering there was no way to win. Of course, with only nine squares, it doesn’t take long to work out all the possible permutations on the tic-tac-toe grid; in chess, there are many billions of possible combinations.
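The tic-tac-toe point can be checked by brute force. What follows is a minimal sketch in Python, not anything taken from WarGames or from DeepMind: it plays out every continuation from the empty grid and reports the best result the first player can force. The function names (`winner`, `value`) and the board encoding (a nine-character string) are choices made for this illustration.

```python
from functools import lru_cache

# The eight lines (rows, columns, diagonals) that win the game.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that side has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def value(board, player):
    """Best result the side to move can force: +1 win, 0 draw, -1 loss."""
    if winner(board) is not None:
        # Only the previous mover can have just completed a line,
        # so the side to move has already lost.
        return -1
    if " " not in board:
        return 0  # grid full, nobody won
    other = "O" if player == "X" else "X"
    # Try every empty square; the opponent's best result is our worst.
    return max(-value(board[:i] + player + board[i + 1:], other)
               for i, cell in enumerate(board) if cell == " ")


result = value(" " * 9, "X")
print(result)  # 0: with best play on both sides, nobody wins
```

The same exhaustive approach is hopeless for chess, which is why chess programs have to be selective about what they examine.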

Strikingly, AlphaZero wasn’t designed to be a pure chess computer. Using a deep neural network — an approach modelled on a theory about how the human brain filters information to find patterns — it also taught itself to play Go in eight hours and Shogi, a Japanese board game, in two, beating the top software in those fields as well. Has its lack of knowledge about human chess history — the very thing that is used to fine-tune other chess programs, including Stockfish — enabled AlphaZero to see the game in a fresh way?

“There’s a lot of human bias built into every other chess program,” says Schaeffer. The DeepMind machine was not constrained in the same way. How else to account for its superiority? “I’m absolutely convinced it’s because it hasn’t learned from humans,” he says.

So does that mean that something akin to an alien intelligence has emerged, and do mere mortals need to rethink everything they thought they knew about chess? Not quite. Human champions have played this way before. But a defensive game has taken hold in recent years, particularly since computer chess took over as the focus of the most advanced play.

In chess circles, AlphaZero has been compared to Paul Morphy, an American chess prodigy from the 19th century, when a more swashbuckling style of play was in vogue. Schaeffer also draws a comparison with Tigran Petrosian, a Soviet champion from the 1960s who played an attack-on-all-fronts game: “He was like a python — he would slowly squeeze you.”

Another is Garry Kasparov. When I catch up with the Russian grandmaster to seek his view on the self-taught chess prodigy, he seems to recognise a kindred spirit. “AlphaZero, to my joy, sacrifices,” he says, equating its style of play to his own. But rather than the chess, he seems more eager to talk about the AI at work behind the scenes. This is the first example of a new class of computer, he says: “It is the first step in creating real AI.”

Kasparov knows a thing or two about AI. He was, after all, the first world champion chess player to lose to a computer — and that personal brush with a superior silicon intellect still seems to sting. Kasparov was never one to bow to convention. After becoming the youngest world chess champion in 1985, at the age of 22, he went on to dominate the game for two decades. He was known for a highly dynamic style of play, particularly his aggressive openings designed to throw competitors off-balance from the start. But in one of the defining contests in the history of man against machine, he was beaten by IBM’s Deep Blue two decades ago.

Maybe it is tactless to bring the subject up. And indeed, when I mention Deep Blue, Kasparov gets defensive. He is quick to point out that he defeated the machine in their first encounter, and also took the first game the second time they met, the following year. But he ended up losing by two games to one, with three draws. That turned out to be game over for humanity in a field so long seen as a marker of our strategic genius. Computer chess programs have continued to widen the gap ever since.

Besides being pleasantly struck by the similarities he sees between AlphaZero’s game and his own, Kasparov suggests there have been some surprises from watching the software play. It’s well known, for instance, that the person who plays white, and who moves first, has an edge. But Kasparov says that AlphaZero’s victory over Stockfish has shown that the scale of that starting advantage is actually far greater than anyone had realised. It won 50 per cent of the games when it played white, compared to only 6 per cent when it played black. (The rest of the games were draws.)

Kasparov is cautious about predicting that AlphaZero has significant new chess lessons to teach, although he concedes it might encourage some players to try “a more dynamic game”. But if he seems only mildly interested in the quality of the chess, he is more forthright in his admiration for the technology. Kasparov has studied AI and written a book on it. AlphaZero, he says, is “the prototype of a flexible machine”, the kind that was dreamed of at the dawn of the computer age by two of the field’s visionaries, Alan Turing and Claude Shannon.

All computers before this, as he describes it, worked by brute force, using the intellectual equivalent of a steamroller to crack a nut. People don’t operate that way: “Humans are flexible because we know that sometimes we have to depart from the rules,” he says. In AlphaZero, he thinks he has seen the first computer in history to learn that very human trick.

At this point we should take a step back. No, we aren’t at the point where computers are about to achieve a level of general intelligence to match — and then overtake — humanity. Predictions about the imminent rise of the machines have always turned out to be wildly over-optimistic. Herbert Simon, one of the pioneers of AI, forecast in 1965 that computers would be able to do any work a human was capable of within 20 years. When today’s experts in the field were asked when that moment would come, only half picked a time within the next 30 years.

But sometimes, individual steps on the way to this still-distant future come sooner than expected. Schaeffer, for one, says he didn’t think that the computing problem AlphaZero appears to have solved would be cracked for another decade.

All computers use large-scale number-crunching to calculate their results. What they lack in the kind of intuition that seems to guide human intelligence, they make up for in raw processing power. But when presented with a choice that could lead to billions of potential outcomes — like looking at possible moves on a chess board — how do they decide which are the most likely to yield a correct result, and should be tested first? And, perhaps even more importantly, how do they know when they have come up with a good enough answer and it’s time to stop calculating endless alternative outcomes?

Stuart Russell, who has studied problems similar to the one tackled by AlphaZero, says that all chess programs since the 1940s have used the same basic technique, and that the DeepMind software is no different. As they try to anticipate the outcome of a particular move, the paths the game could follow open out before them, like the branches and twigs of a tree. The human brain doesn’t work that way. A human player, sizing up the same situation on the board, works from the desired result backwards. Russell says the thought process runs something like this: “I bet I can trap his queen, let me think of a move to do it.”

AlphaZero may be limited, like all chess computers, to looking at the problem the other way around. But it has developed its own form of intuition to help refine the process. To beat Stockfish, it learnt how to narrow down the number of promising moves to examine. Rather than look at the whole tree, it worked out which branch to focus on. That may not come close to matching the feat the human brain manages when faced with the same problem. But examining only what seem to be the most promising options is “arguably a more ‘human-like’ approach to search” than the way machines normally process problems, says Russell.

This is borne out by the lower number of calculations AlphaZero carried out compared to Stockfish. Each time it studied a move, it searched through 80,000 positions a second, far fewer than the 70m positions a second searched by the rival — losing — program. When transferred to the real world, however, the gulf between AI and the human brain looms large again. Chess, says Russell, has “known rules and short horizons”, and it is “fully observable, discrete, deterministic, static”. The real world, by contrast, “shares exactly none of these characteristics”.
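The gap in search speed quoted above is easier to grasp as a ratio. A short Python calculation, using only the two figures given in the text (the variable names are illustrative):

```python
alphazero_per_second = 80_000      # positions a second, as quoted
stockfish_per_second = 70_000_000  # "70m" positions a second, as quoted

ratio = stockfish_per_second / alphazero_per_second
print(ratio)  # 875.0
```

On these figures, Stockfish examined roughly 875 positions for every one that AlphaZero looked at.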

There is, for instance, the question of when the software knows it has come up with the best answer, without first examining every possible result available, no matter how irrelevant.

As Kasparov puts it: “Machines aren’t capable of stopping when they get to the point of diminishing returns.” At the AI conferences he has attended, he says, there is a perennial topic that draws some of the biggest audiences: how to get the machines to stop.

Another real-world conundrum is how to use this method to teach a machine about the implications of its decisions. In many cases, there simply isn’t enough data available in a form for computers to learn from.

“You’re constrained in the amount of learning you can do in the physical world,” says Schaeffer. To teach an autonomous car to drive across the country, for instance, it would be useful to have millions of examples of similar drives that have been done before, but those “training sets” of data don’t exist.

And the more complex the situation, the less clear the link between an action and its result. Reinforcement learning only works when one gets an instant reward for doing the right thing.

If you set a machine a complex real-world task, it might never even work out where to start, says Russell. Searching for a first, useful action, it could “try all kinds of things: scrambling eggs, stacking wooden blocks, chewing wires, poking its fingers into electrical outlets”. Nothing would produce a strong enough feedback loop to convince the computer it was on the right track and lead it to the next necessary action.

This kind of software would never learn by itself how to complete a complex task in the real world, he says, “if we waited through the lifetimes of a billion universes and used a computing engine the size of the galaxy”.

There are many human decisions, though, that may be far more like board games than this suggests. In those situations, a more flexible machine, with a way of channelling its computational power towards solving the most promising problems first, may have a chance.

Those who believe they have seen something new in AlphaZero say it is premature to draw boundaries around the software’s potential. It is like waiting for a child to grow up. The technology has shown a surprising precociousness: now it’s a question of seeing if it can live up to the promise.

“The biggest question,” says Kasparov, “is will it adapt? Will it learn?” He, for one, cautions against the view that, like most other AI systems, it will turn out to be a one-trick wonder.

Eventually, no doubt, it will go down as just one more step on the long road to a true artificial intelligence. As Russell says: it will take entirely new and undreamt-of computing techniques for the machines to operate freely in the kind of open-ended, real-world situations that are second nature for humans.

But for now, by showing it can teach itself to solve diverse problems with no human intervention, it has done something no AI has done before. Checkmate to the computers.

Man v machine moments

As America industrialised in the 19th century, a star railroad worker named John Henry is said to have started a rock-drilling contest with one of the railroad’s new steam-powered drills, writes Kitty Grady. Henry won — but died shortly afterwards from heart failure, becoming a folkloric legend.

In November 2006, Scrabble champion David Boys took on computer program Quackle. The program won 482–465, using words including “qadi” and “anuria”. Boys was philosophical: “It’s still better to be a human than to be a computer.”

In February 2011, world bowling champion Chris Barnes played against EARL (the Enhanced Automated Robotic Launcher). Despite being able to perform the same perfect shot repeatedly, EARL lost 259–209, after failing to adjust to new conditions when its ball wore down the oil on the lane.

In the same year, IBM supercomputer Watson appeared on the game show Jeopardy!, competing against two of its most successful players. Drawing on a 15-terabyte data bank and able to perform 80tn operations a second, Watson won, taking $1m in prize money.

In January 2017, four of the world’s highest-ranking poker players competed against AI program Libratus in a 20-day tournament of no-limit Texas Hold’em. Libratus won outright, amassing almost $2m in chips.


