(These are excerpts from my book "Intelligence is not Artificial")
The Truth about AlphaZero
AlphaGo of 2016 was trained (over several months) by supervised learning from human expert moves and by reinforcement learning from self-play. In 2017 DeepMind's new system, AlphaGo Zero, learned to play weiqi/go from scratch, given only the rules, by simply playing against itself ("Mastering the Game of Go without Human Knowledge", 2017): in about three days of self-play it reached the level of the AlphaGo that had beaten Lee Sedol, and after 40 days of training it became able to beat any version of the data-trained AlphaGo. Also impressive was the fact that AlphaGo Zero of 2017 consumed less power than AlphaGo of 2016: it ran on a single machine with 4 Tensor Processing Units (TPUs), whereas AlphaGo of 2016 was distributed over many processors for a total of 48 TPUs. Whereas previous versions of AlphaGo used two separate neural networks for value and policy, AlphaGo Zero combined policy and value into a single neural network, a stack of residual blocks with ReLU non-linearities. This neural network knew nothing about go. The real "intelligence" was embedded in the search algorithm, which was the real self-playing actor: yet another variant of Monte-Carlo tree search, used inside a policy-iteration scheme of the kind introduced by Ronald Howard in 1960. The neural network acted as this algorithm's memory of what it had learned. More importantly, it looked like AlphaGo Zero had in a few days not only mastered weiqi/go but also independently rediscovered thousands of years of human knowledge about playing weiqi/go, as well as discovered new knowledge (creative novel strategies that had never been used by human masters).
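For the curious, the dual-headed architecture is easy to sketch. The following is my own illustration in Keras (the library mentioned later in this chapter), with made-up, scaled-down hyperparameters; it is a sketch of the idea, not DeepMind's code:

```python
# A minimal sketch of the AlphaGo Zero idea: one residual tower
# feeding two heads, a policy head and a value head.
# Block count and filter sizes here are illustrative, far smaller
# than the ones reported in the paper.
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64):
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    return layers.ReLU()(layers.Add()([x, y]))  # skip connection + ReLU

def dual_head_network(board=19, planes=17, blocks=4, filters=64):
    inp = layers.Input(shape=(board, board, planes))   # raw board state
    x = layers.ReLU()(layers.BatchNormalization()(
        layers.Conv2D(filters, 3, padding="same")(inp)))
    for _ in range(blocks):                            # residual tower
        x = residual_block(x, filters)
    # policy head: a probability for each board point, plus "pass"
    p = layers.Flatten()(layers.Conv2D(2, 1)(x))
    policy = layers.Dense(board * board + 1, activation="softmax")(p)
    # value head: a single score in [-1, 1] for the current position
    v = layers.Flatten()(layers.Conv2D(1, 1)(x))
    value = layers.Dense(1, activation="tanh")(
        layers.Dense(64, activation="relu")(v))
    return tf.keras.Model(inp, [policy, value])
```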
It worked because weiqi/go is a fully deterministic world, because each player has complete information about the state of the game (of the world), because the number of possible moves (of possible actions in the world) is finite, and because the effects on the world of any action can be predicted exactly. There is a huge dataset of games played by human masters, each game is relatively short (about 200 moves), and, last but not least, it is a case in which we can afford to lose thousands of times before the system starts doing the right thing (winning games). All of these conditions are quite rare in the real world. When it has to lift an object, a robot works in a world that is not deterministic, where the number of possible moves is infinite, where the full effect on the world of an action is unpredictable, where a simple torque may require thousands of adjustments, and where a mistake can cost millions of dollars. We can build a huge dataset of how humans grasp the object, but we have to do it for each object and probably each person uses a slightly different movement of the arm, the hand and the fingers.
The game of weiqi/go is a case in which the Markov assumption holds: the current state is all you need to know in order to determine the best next move. You don't care how you got to the current state. However, the real world is mostly a very different kind of game, in which the current state is influenced by all sorts of factors. A game like football is much harder than weiqi/go, and almost any transaction in society is much harder than football. AlphaGo is a toy, just a toy. It will become a serious challenger to human intelligence only if we turn our own world into a toy-like world, by removing all the complexity and leaving only Markov conditions. If, in our obsessive determination to structure our world, we manage to turn our daily lives into the equivalent of a weiqi/go game, then AlphaGo will surpass human intelligence (except that i am not sure i would still call it "human" intelligence).
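For the record, the Markov assumption can be written as P(s_{t+1} | s_t, a_t) = P(s_{t+1} | s_1, a_1, ..., s_t, a_t): once you know the current state s_t and the action a_t, the entire history of how you got there adds nothing. (Strictly speaking, go's ko rule makes the bare board position non-Markovian, which is why AlphaGo Zero's input state includes the last few board positions as well, restoring the Markov property.)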
For the record, as of 2017, AlphaGo can only play go on a standard 19 x 19 board: change the size of the board, and AlphaGo doesn't know how to play go anymore. That is the problem of all algorithms that don't learn the rules but simply try to mimic human behavior: a parrot can only repeat a few words but cannot have a conversation, for the simple reason that it has no clue what a conversation is.
AlphaGo Zero is obviously not a case of artificial general intelligence but of very narrow intelligence: just like AlphaGo, it can only do one thing. However, it is an interesting case because it shows that in some cases a machine can learn to perform a task not only better than humans but in ways that are different from the way humans think. Whether this is different from what a clock or a TV set does is open to debate. A clock keeps time better than any human can: is it a narrow artificial intelligence? A TV set does something that no human can do: it broadcasts images. In fact, any commercially available program performs a task better than most if not all humans: what is truly different between AlphaGo Zero playing weiqi better than humans and your tax-preparation software? The software engineer will reply: AlphaGo Zero learned by itself to perform its task instead of being programmed to perform that task. But that is playing with words: a software engineer architected AlphaGo Zero, i.e. wrote the software that did whatever it did to learn to play weiqi. A better answer is that AlphaGo Zero keeps improving itself at what it does; but that's what any reinforcement-learning system does, as well as other kinds of learning systems (e.g. evolutionary algorithms); and, honestly, there are millions of programs that improve themselves over time: any program that interacts with the environment can absorb information from the environment and use that information to improve its interaction with the environment (the recommendation algorithms of commercial websites like Amazon are a typical example). The "environment" in AlphaGo Zero is the rules of the game, which is actually a very simplified environment compared with the real world. The final answer is that AlphaGo Zero is an experiment in new ways to program machines to perform narrow tasks better than humans. For tax preparation, a traditional program is much preferred; for weiqi, AlphaGo Zero is preferred.
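The generic pattern, improvement through interaction, is trivial to illustrate. Here is a toy, hypothetical "recommender" (my own sketch, obviously not Amazon's actual algorithm):

```python
# A minimal sketch of the generic pattern: any program that interacts
# with an environment can use feedback to improve its behavior.
# This toy recommender is hypothetical, for illustration only.
import random
from collections import defaultdict

clicks = defaultdict(int)   # feedback absorbed from the environment
shows = defaultdict(int)

def recommend(items, explore=0.1):
    if random.random() < explore:    # occasionally try something new
        return random.choice(items)
    # otherwise exploit: pick the item with the best click rate so far
    return max(items, key=lambda i: clicks[i] / shows[i] if shows[i] else 0)

def feedback(item, clicked):
    shows[item] += 1
    if clicked:
        clicks[item] += 1   # the program just "improved itself"
```

Each call to feedback() makes the next call to recommend() slightly better informed: that is all "improving through interaction" means here.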
If AlphaGo took a normal I.Q. test, it would get a zero: it can only do one thing, and cannot even understand the questions. (Even the very lenient I.Q. test carried out in 2017 by the Chinese Academy of Sciences, which didn't require the machine to understand the questions, found that AlphaGo had an I.Q. of 48.)
No wonder that AlphaGo plays go better than any human being: it has nothing else to do. AlphaGo spends its entire life simply playing that game against itself. No human being is so stupid as to spend her or his life playing the same game against herself or himself and doing nothing else. We have better things to do. Even the most obsessed player is busy doing many things, from buying food to reading books like this one. The brain of that player is doing thousands of different things, many of them at the same time (like when you drive in heavy traffic listening to the radio and cursing at the driver who just cut in front of you).
AlphaGo can only play weiqi, and it keeps playing it all the time, day and night, against itself. That's how it gets better and better. There is no name for a person who can only do one thing, and keeps doing it better and better by doing it all the time, nonstop, 24 hours a day, because no human being has ever exhibited that kind of neurological disease. We would certainly commit such a person to a psychiatric hospital.
This limitation to narrow intelligence did not escape AlphaGo Zero's creators, who immediately set out to create a more general program. In less than two months (end of 2017) DeepMind readied a new program ("Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm", 2017) called AlphaZero that, using the same tactic as AlphaGo Zero, learned to play other games, including chess. AlphaZero learned chess in four hours, and DeepMind announced that it had crushed Stockfish. Stockfish is one of the most popular "chess engines", developed by Marco Costalba in Italy in 2008 as an evolution of Tord Romstad's Glaurung (Norway, 2004), maintained by an open-source community, and not using neural networks. The training of AlphaZero took nine hours on 5,000 TPUs (Google's specialized processors). AlphaZero was not the first program to beat Stockfish: Komodo, first developed in 2010 by Don Dailey and Larry Kaufman, had done it a few months earlier; and Houdini, developed by Robert Houdart in Belgium in 2010, won that year's TCEC (Top Chess Engine Competition), the world championship of chess engines, not Stockfish (which finished third, even behind Komodo). The difference is that Komodo and Houdini played fair and square, whereas AlphaZero played against a defanged version of Stockfish: the version of Stockfish used was not the most recent one; Stockfish was not allowed to access an opening book (it is optimized for that scenario); its hash-table memory was limited to 1 gigabyte (it requires a lot more when running on 64 cores); and the games were played at a fixed time of 1 minute per move (Stockfish is designed to optimize time management). Furthermore, AlphaZero ran on 4 TPUs whereas Stockfish ran on 64 CPU cores. Google's TPU of 2017 boasted a performance of 180 TFLOPS, dozens of times more than the performance of 64 cores. It was certainly an impressive demonstration of machine learning, but all chess positions with seven pieces or fewer had already been mathematically solved in 2012, using significantly less computing power (the Lomonosov Tablebases, the first complete seven-piece endgame database, distributed by Convekta). And, of course, AlphaZero did not learn solely from self-play: someone had to tell AlphaZero the rules of the game.
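A back-of-the-envelope comparison of the hardware gap (assuming, generously, something like 50 GFLOPS per CPU core for 2017 hardware, an assumption of mine, not a figure from the paper): 64 cores amount to roughly 3 TFLOPS, whereas 4 TPUs at 180 TFLOPS each amount to 720 TFLOPS, a gap of more than two orders of magnitude in raw compute.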
In December 2018 DeepMind finally published the paper on AlphaZero (credited to David Silver, Thomas Hubert and Julian Schrittwieser). However, as is often the case with AI, we are playing with words. The paper stated clearly: "We trained separate instances of AlphaZero for chess, shogi, and go"; i.e. each instance of AlphaZero learned to play either chess or shogi or go, and then, yes, each instance became the best at the one and only game it mastered. The paper, however, closed with the vision of "a general game-playing system that can learn to master any game", an ambiguous sentence that led readers (and the media) to believe that AlphaZero could learn all these games at the same time. We dumb humans can learn an unlimited number of games without having to forget the ones we already learned: one human brain learns an unlimited number of things. AlphaZero, instead, can learn to play chess only if it forgets how to play go. Sure, you can have three AlphaZeros running in parallel, one playing chess, one playing go and one playing shogi. Put them into the same machine and you can claim that the machine has learned to play all three games, and that it is a world champion in all three. But that's conceptually very different from saying that we have one artificial brain playing three different games: what we really have is three artificial brains, each capable of playing only one game. It's the difference between saying "my kitchen can cook, wash the dishes, and even expel the smell" instead of saying "my kitchen has an oven, a dishwasher and a fan". The kitchen as a whole is certainly better than me at each of those tasks, but its working is a far cry from what the human brain does.
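To see what "separate instances" means in practice, here is a sketch (reusing the dual_head_network illustration above; the board sizes and input-plane counts are the ones reported in the paper):

```python
# Three independently trained sets of weights, not one network that
# knows three games (reusing the dual_head_network sketch above).
chess_net = dual_head_network(board=8,  planes=119)   # chess instance
shogi_net = dual_head_network(board=9,  planes=362)   # shogi instance
go_net    = dual_head_network(board=19, planes=17)    # go instance
# The chess weights cannot even be loaded into the go network: the
# tensors have different shapes. Nothing is shared or transferred.
# (A faithful version would also need game-specific policy heads.)
```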
The media were shocked to read that AlphaZero took only 4 hours to learn what humans had learned in 100 years, but the difference between 4 hours and 40 hours or 4,000 hours or 4 million hours is simply the speed of the processor. The question is whether it did or did not learn all that knowledge by itself. Whether it did it in one second or in one year is relevant for winning the game, but doesn't make a difference in the achievement. To wit, whether Einstein came up with Relativity in one day or one year makes no difference to the achievement. And there are plenty of machines that can do things a lot faster than me: I cannot run as fast as my car. It is nothing new that a machine can be faster than a human, and faster than previous generations of that machine.
It is important to remember that these are games. There is a finite board, there are only so many things that can happen, and a move has a limited number of effects. The real world is different from a board game. The board of the real world is infinitely bigger than a weiqi or chess board. The number of things that can happen to you in the real world is virtually infinite. And the effects of your actions can be virtually infinite too. A friend who studied law always reminds me of the story of the man who throws a cigarette butt on the floor. What if the wind blows it onto dry vegetation and the vegetation catches fire? And what if that fire causes a car to hit a post? And what if a wheel of that car rolls down the hill and hits and kills a child? And what if the mother of that child runs across the street without paying attention to the coming truck and gets killed too, and in the process the truck swerves violently and crashes into a house, killing six people? And so on and on. In the real world AlphaZero would have to learn an infinite number of rules, and learning them by "reinforcement" may just be impossible: reinforcement learning works when there is an immediate reward or punishment. Without any (real) intelligence, AlphaZero would have to try an infinite number of actions in order to find out how to do laundry: ring the bell, turn on the TV, read page 145 of this book, jump from the roof, and so on. Its algorithm would eventually find a reward for one specific sequence of actions: put the clothes in the washing machine, close the lid, and press the start button. And even this action would have to be repeated successfully many times, with the proper sequence of pressing buttons, before AlphaZero accepted it as the correct operation to wash clothes. Even brute-force A.I. would require an immensely powerful computer for AlphaZero to carry out this simple operation that any human intelligence can perform in a few seconds.
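A toy simulation makes the sparse-reward problem concrete (the action list and the three-step "laundry" goal are made up for illustration):

```python
# A toy illustration of why sparse rewards cripple brute-force
# reinforcement learning: the agent only gets a signal when the
# entire correct sequence happens by chance.
import random

ACTIONS = ["ring_bell", "turn_on_tv", "read_page_145", "jump_off_roof",
           "load_clothes", "close_lid", "press_start"]
GOAL = ["load_clothes", "close_lid", "press_start"]

def episode(steps=3):
    seq = [random.choice(ACTIONS) for _ in range(steps)]
    return 1.0 if seq == GOAL else 0.0   # reward only for the exact sequence

# Chance of stumbling on the right 3-step sequence: (1/7)^3, about 0.3%.
# With longer sequences and more actions the odds vanish exponentially.
trials = 100000
wins = sum(episode() for _ in range(trials))
print(f"random success rate: {wins / trials:.4%}")
```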
Before you surrender to AlphaZero and its likely descendants, think how many games it took you to learn how to play a new game: three? seven? How many did it take you to become a good player? One hundred? Now remember that these DeepMind algorithms had to play the game millions of times to learn what you learned in a few attempts. And in the meantime you probably learned a lot of other things too.
To assess AlphaZero's achievement, we have to decide (yet again) what we are looking for: are we looking for a tool that does something better than humans do, or are we looking for human-like intelligence? Is AlphaZero better than humans at what it does? Sure, just like an airplane can fly and i cannot, and a car can move a lot faster than the fastest man, and a TV set can show the image of something happening far away. These are all examples of super-human feats achieved by machines that use non-human methods (non-human "intelligence"?). So is a hammer, by the way, and so is a screwdriver. These are tools that can do things that we cannot do. Every machine exists because it can do something better than most humans can. If it cannot, then it's a toy for children.
Is AlphaZero a case of (human) intelligence? No, of course not. We humans don't need to play millions of games to master a new game. We humans need to play a few times and we treasure advice from fellow humans. We humans are very different from AlphaZero. Can AlphaZero do something a lot better and faster than us? Sure, just like every computer program, just like every car, just like every machine in an assembly line.
Perhaps the most stunning fact about AlphaGo and its descendants is that they are based on old A.I. techniques. The ideas behind AlphaZero are so simple that, within one month, David Foster, co-founder of Applied Data Science, published instructions on "How to build your own AlphaZero A.I. using Python and Keras".
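Indeed, the skeleton that such tutorials walk through fits in a few lines. Here is a condensed outline (my own sketch: the two helper functions are placeholders, not working implementations, and it reuses the dual_head_network sketch above):

```python
# A condensed outline of the AlphaZero-style training loop.
def self_play_game(net, simulations=800):
    # A real version runs a Monte-Carlo tree search at every move,
    # guided by the network's policy and value outputs, and records
    # (state, search_probabilities, final_outcome) training examples.
    return []

def train_step(net, examples):
    # A real version fits the policy head to the MCTS visit counts
    # and the value head to the eventual game result.
    pass

net = dual_head_network()      # the single policy+value network
examples = []
for iteration in range(1000):  # alternate self-play and learning
    examples += self_play_game(net)
    train_step(net, examples)
```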
DeepMind's A.I. systems are certainly impressive research programs, but one has to put them in perspective. When you marvel at the power of AlphaGo or AlphaZero, ask yourself: "how many lives have DeepMind's A.I. systems saved so far?" Or, if you are the Wall Street kind of person: "what kind of economic revolution have DeepMind's A.I. systems caused so far?" To put things in perspective, exactly 100 years before DeepMind was founded, two German chemists, Fritz Haber and Carl Bosch, invented a process to produce the fertilizer ammonia: that humble invention caused a revolution in agriculture that fed billions of people and created a vast global economy.
As of 2018, AlphaGo has not been made available to the weiqi/go community. Ditto for AlphaGo Zero and AlphaZero.