(These are excerpts from my book "Intelligence is not Artificial")
Back Propagation - A brief History of Artificial Intelligence/ Part 2
Knowledge-based systems did not expand as expected: the human experts were not terribly excited at the idea of helping construct digital clones of themselves, and, in any case, the clones were not terribly reliable.
Expert systems also failed because of the World-wide Web: you don't need an expert system when thousands of human experts post the answer to all possible questions. All you need is a good search engine. That search engine plus those millions of items of information posted (free of charge) by thousands of people around the world do the job that the "expert system" was supposed to do. The expert system was a highly intellectual exercise in representing knowledge and in reasoning heuristically. The Web is a much bigger knowledge base than any expert-system designer ever dreamed of. The search engine has no pretense of sophisticated logic but, thanks to the speed of today's computers and networks, it "will" find the answer on the Web. Within the world of computer programs, the search engine is a brute that can do the job once reserved to
Note that the apparent "intelligence" of the Web (its ability to provide answers to all sorts of questions) arises from the "non-intelligent" contributions of thousands of people in a way very similar to how the intelligence of an ant colony emerges from the non-intelligent contributions of thousands of ants.
In retrospect a lot of sophisticated logic-based software had to do with slow and expensive machines. As machines get cheaper and faster and smaller, we don't need sophisticated logic anymore: we can just use fairly dumb techniques to achieve the same goals. As an analogy, imagine if cars, drivers and gasoline were very cheap and goods were provided for free by millions of people: it would be pointless to try and figure out the best way to deliver a good to a destination because one could simply ship many of those goods via many drivers with an excellent chance that at least one good would be delivered on time at the right address. The route planning and the skilled knowledgeable driver would become useless, which is precisely what has happened in many fields of expertise in the consumer society: when is the last time you used a cobbler or a watch repairman?
The motivation to come up with creative ideas for A.I. scientists was due to slow, big and expensive machines. Now that machines are fast, small and cheap the motivation to come up with creative ideas is much reduced. Now the real motivation for A.I. scientists is to have access to thousands of parallel processors and let them run for months. Creativity has shifted to coordinating those processors so that they will search through billions of items of information. The machine intelligence required in the world of cheap computers has become less of a logical intelligence and more of a “logistical” intelligence.
The 1980s also witnessed a progressive rehabilitation of neural networks, a process that turned exponential in the 2000s.
One important center of research was located in southern California. In 1976 the cognitive psychologists Don Norman and David Rumelhart (two members of the LNR research group) founded the Institute for Cognitive Science at UCSD and hired a fresh British graduate, Geoffrey Hinton, who also happened to be the great-great-grandson of the founder of binary logic, George Boole. Soon, UC San Diego became a hotbed of research in neural networks. In early 1982, inspired by Raj Reddy's Hearsay project at Carnegie Mellon University, Rumelhart, Hinton, James McClelland, Paul Smolensky, and the biologists David Zipser and Francis Crick (of DNA fame) formed the PDP (Parallel Distributed Processing) research group of psychologists and computer scientists. After six months Hinton, the original organizer, moved to Carnegie Mellon University (where he organized a summer workshop that introduced him to the ideas of Terrence Sejnowski on the Boltzmann machine) and McClelland to MIT. Soon the two were reunited at Carnegie Mellon, where a second PDP group was then spawned. The San Diego group would go on to include David Zipser's student Ronald Williams, Michael Jordan and Jeffrey Elman.
Neural networks were rescued in 1982 by the CalTech physicist John Hopfield, who described a new generation of neural networks ("Neural Networks and Physical Systems with Emergent Collective Computational Abilities", 1982). Hopfield designed a network in which all connections are symmetric, i.e. all neurons are both input and output neurons. It is a "recurrent" network because the effect of a neuron's computation ends up flowing back to that neuron. Until then the most popular architecture had been the "feedforward" kind: the output of a layer of neurons does not affect the same layer but only the layers that are downstream. Feedback networks, instead, can have all sorts of upstream repercussions. Recurrent networks can be very difficult to analyze, but Hopfield's networks have symmetric connections and the neurons are binary neurons,
and in that case the network dynamics can be described with what physicists call an "energy function": each state of the network has an energy, and training the network is equivalent to lowering the energy. The network has learned something when it reaches an energy minimum.
The memory of something is an energy minimum of this neural net.
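The idea of a memory as an energy minimum can be illustrated with a minimal sketch. It is not Hopfield's original formulation: it assumes neurons that take the values +1 and -1, zero thresholds, a simple Hebbian rule for storing patterns, and two made-up 8-bit patterns. The names train_hebbian, energy and recall are hypothetical, chosen only for this illustration.

```python
import numpy as np

def train_hebbian(patterns):
    """Build a symmetric weight matrix from +1/-1 patterns (Hebbian rule)."""
    patterns = np.asarray(patterns, dtype=float)
    n = patterns.shape[1]
    weights = patterns.T @ patterns / n
    np.fill_diagonal(weights, 0.0)  # no neuron is connected to itself
    return weights

def energy(weights, state):
    """Energy of a state: E = -1/2 * s^T W s (thresholds assumed to be zero)."""
    return -0.5 * state @ weights @ state

def recall(weights, state, sweeps=5):
    """Update one neuron at a time; each takes the sign of its total input."""
    state = np.array(state, dtype=float)
    energies = [energy(weights, state)]
    for _ in range(sweeps):
        for i in range(len(state)):
            state[i] = 1.0 if weights[i] @ state >= 0 else -1.0
            energies.append(energy(weights, state))
    return state, energies

# Two illustrative patterns to memorize (assumed data, not from any paper).
stored = np.array([[1, -1, 1, -1, 1, -1, 1, -1],
                   [1, 1, 1, 1, -1, -1, -1, -1]])
weights = train_hebbian(stored)

noisy = stored[0].copy()
noisy[0] = -noisy[0]  # corrupt one bit of the first pattern

recalled, energies = recall(weights, noisy)
print(recalled)                   # the state the network settles into
print(energies[0], energies[-1])  # energy before and after settling
```

Because the weights are symmetric and the neurons are updated one at a time, the energy never goes up from one update to the next, so the corrupted pattern slides down into the nearest minimum, which in this small example is the stored pattern.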
These neural networks were immune to Minsky's critique. Hopfield's key intuition was to note the similarity with statistical mechanics. Statistical mechanics translates the laws of thermodynamics into statistical properties of large sets of particles. The fundamental tool of statistical mechanics (and soon of this new generation of neural networks) is the Boltzmann distribution (actually discovered by Josiah Willard Gibbs in 1901), a method to calculate the probability that a physical system is in a specified state.
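As a rough illustration of what the Boltzmann distribution computes, the sketch below assigns to each state a probability proportional to exp(-E/T), where E is the energy of the state and T is the temperature. The function name boltzmann_probabilities and the three energy values are assumptions made for this example, and the physical constant k is folded into the temperature for simplicity.

```python
import math

def boltzmann_probabilities(energies, temperature=1.0):
    """Return p_i = exp(-E_i / T) / Z for every state energy E_i."""
    lowest = min(energies)  # shifting all energies leaves the probabilities unchanged
    factors = [math.exp(-(e - lowest) / temperature) for e in energies]
    partition = sum(factors)  # the normalizing constant Z
    return [f / partition for f in factors]

# Three hypothetical states with energies 0, 1 and 2 (arbitrary units).
probs = boltzmann_probabilities([0.0, 1.0, 2.0], temperature=1.0)
print(probs)
```

Low-energy states come out as the most probable ones, and raising the temperature flattens the distribution so that all states become more nearly equally likely.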
Dana Ballard at the University of Rochester predated deep belief networks and stacked autoencoders by 20 years when he used unsupervised learning to build representations layer by layer ("Modular Learning in Neural Networks", 1987).
Also in 1982, Teuvo Kohonen popularized the "self-organising map" (SOM), soon to become the most popular algorithm for unsupervised learning ("Self-organized Formation of Topologically Correct Feature Maps," 1982), borrowing the architecture used by Christoph von der Malsburg in Germany to simulate the visual cortex ("Self-organization of Orientation Sensitive Cells in the Striate Cortex", 1973).
Meanwhile, in 1974 Paul Werbos' dissertation at Harvard had worked out a more efficient way to train a neural network: the "backpropagation" algorithm.
His discovery languished for several years because his background wasn't quite orthodox: his thesis advisor was the social scientist and cybernetic pioneer Karl Deutsch, and his algorithm of backpropagation was meant as a mathematical expression of the concept of "cathexis" that Sigmund Freud had introduced in his book "The Project for a Scientific Psychology" (1895).