Perceptrons (book)
ISBN: 0 262 13043 2
Perceptrons: An Introduction to Computational Geometry is a book written by Marvin Minsky and Seymour Papert, first published in 1969.
The main subject of the book is the perceptron.
This book is the center of a long-standing controversy in the study of artificial intelligence. It is claimed that pessimistic predictions made by the authors were responsible for a change in the direction of research in AI, concentrating efforts on so-called "symbolic" systems, a line of research that petered out and contributed to the so-called AI winter of the 1980s, when AI's promise was not realized.[4]
The crux of Perceptrons is a number of mathematical proofs about what perceptrons can and cannot compute.
Publication history
When Papert arrived at MIT in 1963, Minsky and Papert decided to write a theoretical account of the limitations of perceptrons. It took until 1969 for them to finish solving the mathematical problems that unexpectedly turned up as they wrote. The first edition was printed in 1969. Handwritten alterations were made by the authors for the second printing in 1972. The handwritten notes include some references to the review for the first edition.[7][8][9]
An "expanded edition" was published in 1988, which adds a prologue and an epilogue to discuss the revival of neural networks in the 1980s, but no new scientific results.[10] In 2017, the expanded edition was reprinted, with a foreword by Léon Bottou that discusses the book from the perspective of someone working in deep learning.
Background
The perceptron is a neural net introduced by Frank Rosenblatt in 1958.
During this period, neural net research was a major approach to the brain-machine issue that had been taken by a significant number of individuals.[12] Reports by the New York Times and statements by Rosenblatt claimed that neural nets would soon be able to see images, beat humans at chess, and reproduce.[3] At the same time, new approaches including symbolic AI emerged.[13] Different groups found themselves competing for funding and people, and their demand for computing power far outpaced available supply.[14]
Contents
Perceptrons: An Introduction to Computational Geometry is a book of thirteen chapters grouped into three sections. Chapters 1–10 present the authors' perceptron theory through proofs, Chapter 11 involves learning, Chapter 12 treats linear separation problems, and Chapter 13 discusses some of the authors' thoughts on simple and multilayer perceptrons and pattern recognition.[15][16]
Definition of perceptron
Minsky and Papert took as their subject the abstract versions of a class of learning devices which they called perceptrons, "in recognition of the pioneer work of Frank Rosenblatt".[16] These perceptrons were modified forms of the perceptrons introduced by Rosenblatt in 1958. They consisted of a retina, a single layer of input functions and a single output.[15][12]
Besides this, the authors restricted the "order", or maximum number of incoming connections, of their perceptrons. Sociologist Mikel Olazaran explains that Minsky and Papert "maintained that the interest of neural computing came from the fact that it was a parallel combination of local information", which, in order to be effective, had to be a simple computation. To the authors, this implied that "each association unit could receive connections only from a small part of the input area".[12] Minsky and Papert called this concept "conjunctive localness".[16]
Parity and connectedness
Two main examples analyzed by the authors were parity and connectedness. Parity involves determining whether the number of activated inputs in the input retina is odd or even, and connectedness refers to the figure-ground problem. Minsky and Papert proved that the single-layer perceptron could not compute parity under the condition of conjunctive localness (Theorem 3.1.1), and showed that the order required for a perceptron to compute connectivity grew with the input size (Theorem 5.5).[17][16]
The XOR affair
Some critics of the book [
There are many mistakes in this story.
In the 1960s, a special case of the perceptron network was studied as "linear threshold logic", for applications in digital logic circuits.[22] The classical theory is summarized in [23], according to Donald Knuth.[24] In this special case, perceptron learning was called "Single-Threshold-Element Synthesis by Iteration", and constructing a perceptron network was "Network Synthesis".[25] Other names included linearly separable logic, linear-input logic, threshold logic, majority logic, and voting logic. Hardware for realizing linear threshold logic included magnetic core, resistor-transistor, parametron, resistor-tunnel diode, and multiple coil relay.[26] There were also theoretical studies on the upper and lower bounds on the minimum number of perceptron units necessary to realize any Boolean function.[27][28]
What the book does prove is that in three-layered feed-forward perceptrons (with a so-called "hidden" or "intermediary" layer), it is not possible to compute some predicates unless at least one of the neurons in the first layer of neurons (the "intermediary" layer) is connected with a non-null weight to each and every input (Theorem 3.1.1, reproduced below). This was contrary to a hope held by some researchers [citation needed] in relying mostly on networks with a few layers of "local" neurons, each one connected only to a small number of inputs. A feed-forward machine with "local" neurons is much easier to build and use than a larger, fully connected neural network, so researchers at the time concentrated on these instead of on more complicated models[citation needed].
Some other critics, notably Jordan Pollack, note that what was a small proof concerning a global issue (parity) not being detectable by local detectors was interpreted by the community as a rather successful attempt to bury the whole idea.[29]
Critique of perceptrons and their extensions
In the prologue and the epilogue, added to the 1988 edition, the authors react to the 1980s revival of neural networks, by discussing multilayer neural nets and Gamba perceptrons.[30][31][32][33] By "Gamba perceptrons", they meant two-layered perceptron machines where the first layer is also made of perceptron units ("Gamba-masks"). In contrast, most of the book discusses two-layered perceptrons where the first layer is made of boolean units. They conjecture that Gamba machines would require "an enormous number" of Gamba-masks and that multilayer neural nets are a "sterile" extension. Additionally, they note that many of the "impossible" problems for perceptrons had already been solved using other methods.[16]
The Gamba perceptron machine was similar to the perceptron machine of Rosenblatt. Its inputs were images. Each image is passed in parallel through randomly generated binary masks. Behind each mask is a photoreceiver that fires if the input, after masking, is bright enough. The second layer is made of standard perceptron units.
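The two-stage structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`make_gamba_layer`, `gamba_features`, `gamba_output`), the mask density, and the brightness threshold are hypothetical choices, not details taken from the book or from Gamba's machine.

```python
import random

def make_gamba_layer(num_masks, num_pixels, density=0.5, seed=0):
    """Generate random binary masks (density and seed are assumed values)."""
    rng = random.Random(seed)
    return [[1 if rng.random() < density else 0 for _ in range(num_pixels)]
            for _ in range(num_masks)]

def gamba_features(image, masks, brightness_threshold):
    """First layer: a photoreceiver fires if the masked image is bright enough."""
    return [1 if sum(m * p for m, p in zip(mask, image)) >= brightness_threshold else 0
            for mask in masks]

def gamba_output(image, masks, brightness_threshold, weights, bias):
    """Second layer: a standard linear threshold (perceptron) unit."""
    features = gamba_features(image, masks, brightness_threshold)
    total = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if total > 0 else 0

# Usage: a 6-pixel image passed through 4 random masks.
masks = make_gamba_layer(num_masks=4, num_pixels=6)
image = [1, 0, 1, 1, 0, 1]
decision = gamba_output(image, masks, brightness_threshold=2,
                        weights=[1, 1, 1, 1], bias=-1.5)
```

In this sketch the first layer is fixed once generated; only the second-layer `weights` and `bias` would be adjusted by a learning rule.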
They claimed that perceptron research waned in the 1970s not because of their book, but because of inherent problems: no perceptron learning machines could perform credit assignment any better than Rosenblatt's perceptron learning rule, and perceptrons cannot represent the knowledge required for solving certain problems.[29]
In the final chapter, they claimed that for the 1980s neural networks, "little of significance [has] changed since 1969". They predicted that any single, homogeneous machine must fail to scale up. Neural networks trained by
Mathematical content
Preliminary definitions
Let $R$ be a finite set. A predicate on $R$ is a boolean function that takes in a subset of $R$ and outputs either $0$ or $1$. In particular, a perceptron unit is a predicate.
A predicate $\psi$ has support $S \subseteq R$ iff for any $X \subseteq R$, we have $\psi(X) = \psi(X \cap S)$. In words, if we know how $\psi$ works on subsets of $S$, then we know how it works on subsets of all of $R$.
A predicate can have many different supports. The support size of a predicate $\psi$ is the minimal number of elements necessary in its support. For example, the constant-0 and constant-1 functions are both supported on the empty set, thus they both have support size 0.
A perceptron (the kind studied by Minsky and Papert) over $R$ is a function of the form
$$\theta\left(\sum_{i=1}^{n} a_i \psi_i\right),$$
where the $\psi_i$ are predicates on $R$, the $a_i$ are real numbers, and $\theta(t) = 1$ if $t > 0$ and $\theta(t) = 0$ otherwise.
If $\Phi$ is a set of predicates, then $L(\Phi)$ is the set of all perceptrons using just predicates in $\Phi$.
The order of a perceptron $\theta\left(\sum_{i} a_i \psi_i\right)$ is the maximal support size of its component predicates $\psi_i$.
The order of a boolean function on $R$ is the minimal order possible for a perceptron implementing the boolean function.
A boolean function is conjunctively local iff its order does not increase to infinity as $|R|$ increases to infinity.
The mask of $A \subseteq R$ is the predicate $1_A$ defined by
$$1_A(X) = \begin{cases} 1 & \text{if } A \subseteq X, \\ 0 & \text{otherwise.} \end{cases}$$
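The definitions in this section can be made concrete with a small brute-force sketch, which represents predicates as Python functions on subsets (frozensets) of a finite set. The helper names (`subsets`, `mask`, `is_support`, `support_size`, `perceptron`, `order_of_perceptron`) are hypothetical, and the exhaustive search is only practical for very small sets.

```python
from itertools import combinations

def subsets(elements):
    """Yield every subset of `elements` as a frozenset, smallest first."""
    items = sorted(elements)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def mask(A):
    """Mask of A: the predicate that is 1 exactly when A is contained in X."""
    A = frozenset(A)
    return lambda X: 1 if A <= frozenset(X) else 0

def is_support(psi, S, R):
    """Check that psi(X) == psi(X & S) for every subset X of R."""
    S = frozenset(S)
    return all(psi(X) == psi(X & S) for X in subsets(R))

def support_size(psi, R):
    """Smallest size of a support of psi, found by exhaustive search."""
    for S in subsets(R):  # subsets arrive in order of increasing size
        if is_support(psi, S, R):
            return len(S)

def perceptron(weights, predicates):
    """Threshold of a weighted sum of predicates: 1 if the sum is > 0, else 0."""
    return lambda X: 1 if sum(a * p(X) for a, p in zip(weights, predicates)) > 0 else 0

def order_of_perceptron(predicates, R):
    """Maximal support size among the component predicates."""
    return max(support_size(p, R) for p in predicates)

# Usage: logical OR of "1 in X" and "2 in X" as an order-1 perceptron.
R = {1, 2, 3}
parts = [mask({1}), mask({2})]
either = perceptron([1, 1], parts)
```

Here `order_of_perceptron(parts, R)` is 1, whereas the parity predicate `lambda X: len(X) % 2` has support size 3 on this three-element set.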
Main theorems
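Theorem 1.5.1, Positive Normal Form: If a perceptron is of order $k$, then it is of order $k$ using only masks.
Let the perceptron be $\theta\left(\sum_i a_i \psi_i\right)$, where each $\psi_i$ has support size at most $k$. We convert it into a linear sum of masks, each of size at most $k$.
Let $\psi_i$ be supported on a set $A_i$. Write it in disjunctive normal form, with one clause for each subset of $A_i$ on which $\psi_i$ returns $1$; in each clause, write one positive literal for each element of $A_i$ in the subset and one negative literal for each element of $A_i$ not in it.
For example, suppose $\psi_i$ is supported on $\{1, 2, 3\}$ and is $1$ on all odd-sized subsets. Writing $x_j$ for the statement $j \in X$, we can write it as
$$(x_1 \wedge \neg x_2 \wedge \neg x_3) \vee (\neg x_1 \wedge x_2 \wedge \neg x_3) \vee (\neg x_1 \wedge \neg x_2 \wedge x_3) \vee (x_1 \wedge x_2 \wedge x_3).$$
Now, convert this formula to a Boolean algebra formula (replace $\neg x$ by $1 - x$, $\wedge$ by multiplication and, since the clauses are mutually exclusive, $\vee$ by addition), then expand, yielding a linear sum of masks. For example, the above formula is converted to
$$x_1(1-x_2)(1-x_3) + (1-x_1)x_2(1-x_3) + (1-x_1)(1-x_2)x_3 + x_1 x_2 x_3 = x_1 + x_2 + x_3 - 2(x_1 x_2 + x_1 x_3 + x_2 x_3) + 4 x_1 x_2 x_3,$$
where each monomial is a mask, for instance $x_1 x_2 = 1_{\{1,2\}}(X)$.
Repeating this for each predicate used in the perceptron and summing the results, we obtain an equivalent perceptron using just masks.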
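The mask coefficients produced by this conversion can also be computed mechanically. The sketch below does not follow the disjunctive-normal-form route literally. It swaps in Möbius inversion over subsets, which should yield the same coefficients, since a function on subsets has a unique expansion as a sum of masks. The names `mask_coefficients` and `evaluate` are hypothetical.

```python
from itertools import combinations

def subsets(elements):
    """Yield every subset of `elements` as a frozenset, smallest first."""
    items = sorted(elements)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def mask_coefficients(psi, S):
    """Return {A: c_A} such that psi(X) is the sum of c_A over all A contained in X.

    Uses Moebius inversion: c_A = sum over B subset of A of (-1)**(|A|-|B|) * psi(B).
    Assumes S is a support of psi.
    """
    coeffs = {}
    for A in subsets(S):
        c = sum((-1) ** (len(A) - len(B)) * psi(B) for B in subsets(A))
        if c != 0:
            coeffs[A] = c
    return coeffs

def evaluate(coeffs, X):
    """Evaluate the linear sum of masks on the subset X."""
    X = frozenset(X)
    return sum(c for A, c in coeffs.items() if A <= X)

# Usage: parity (1 on odd-sized subsets) on a three-element set.
parity = lambda X: len(X) % 2
coeffs = mask_coefficients(parity, {1, 2, 3})
```

For parity on a three-element set, this gives coefficient 1 on each single-element mask, -2 on each two-element mask, and 4 on the three-element mask.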
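Let $S_R$ be the permutation group on the elements of $R$, and let $G$ be a subgroup of $S_R$.
We say that a predicate $\psi$ is $G$-invariant iff $\psi \circ g = \psi$ for any $g \in G$. That is, for any $X \subseteq R$, we have $\psi(g(X)) = \psi(X)$.
For example, the parity function is $S_R$-invariant, since any permutation of the set preserves the size, and thus parity, of any of its subsets.
Theorem 2.3, group invariance theorem: If $\Phi$ is closed under action by $G$, and $\psi \in L(\Phi)$ is $G$-invariant, there exists a perceptron
$$\psi = \theta\left(\sum_{\phi \in \Phi} a_\phi \phi\right)$$
such that $a_\phi = a_{\phi'}$ whenever $\phi = \phi' \circ g$ for some $g \in G$.
The proof idea is to take the average over all elements of $G$.
Enumerate the predicates in $\Phi$ as $\phi_1, \phi_2, \dots$, and write $g(i)$ for the index of the predicate such that $\phi_i \circ g^{-1} = \phi_{g(i)}$, for any $g \in G$. That is, we have defined a group action on the set $\{1, 2, \dots\}$.
Since $\psi \in L(\Phi)$, there exist real numbers $a_1, a_2, \dots$ such that
$$\psi(X) = \theta\left(\sum_i a_i \phi_i(X)\right).$$
Define $\psi' = \theta\left(\sum_i \left(\sum_{g \in G} a_{g(i)}\right) \phi_i\right)$. We claim this is the desired perceptron: its coefficients are constant on each orbit of $G$, so it remains to show that $\psi' = \psi$.
By definition of $G$-invariance, if $\psi(X) = 1$, then $\psi(g^{-1}(X)) = 1$ for all $g \in G$. That is,
$$\sum_i a_i \phi_i(g^{-1}(X)) = \sum_i a_i \phi_{g(i)}(X) > 0 \quad \text{for all } g \in G.$$
Summing over $g \in G$ and reindexing gives $\sum_i \left(\sum_{g \in G} a_{g(i)}\right) \phi_i(X) > 0$, so $\psi'(X) = 1$.
Similarly for the case where $\psi(X) = 0$.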
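The averaging step of this proof can be tried on a small case. The sketch below assumes the group is the full permutation group of a three-element set and the predicates are masks, stored as a dictionary from subsets to coefficients. The names `symmetrize` and `run` are hypothetical. Each coefficient is replaced by the sum of the coefficients of all its images under the group, which changes no output when the perceptron is invariant.

```python
from itertools import combinations, permutations

def subsets(elements):
    """Yield every subset of `elements` as a frozenset, smallest first."""
    items = sorted(elements)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def symmetrize(weights, R):
    """Sum each mask's coefficient over all permutations of R.

    `weights` maps a mask (frozenset A) to its coefficient. The key set is
    assumed to be closed under permutations of R.
    """
    items = sorted(R)
    group = [dict(zip(items, image)) for image in permutations(items)]
    return {A: sum(weights.get(frozenset(g[x] for x in A), 0) for g in group)
            for A in weights}

def run(weights, X):
    """Perceptron output: 1 if the sum of coefficients of masks inside X is > 0."""
    X = frozenset(X)
    return 1 if sum(a for A, a in weights.items() if A <= X) > 0 else 0

# Usage: "X has at least two elements", written with deliberately uneven weights.
R = {0, 1, 2}
uneven = {frozenset({0, 1}): 1, frozenset({0, 2}): 2, frozenset({1, 2}): 3}
even = symmetrize(uneven, R)
```

After symmetrization all three two-element masks carry the same coefficient, and the perceptron computes the same predicate as before.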
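Theorem 3.1.1: The parity function has order $|R|$.
Let $\psi_{\mathrm{par}}$ be the parity function, with $\psi_{\mathrm{par}}(X) = 1$ iff $|X|$ is odd, and let $\Phi$ be the set of all masks of size $\le k$. Clearly both $\psi_{\mathrm{par}}$ and $\Phi$ are invariant under all permutations.
Suppose $\psi_{\mathrm{par}}$ has order $k < |R|$; then by the positive normal form theorem, $\psi_{\mathrm{par}} \in L(\Phi)$.
By the group invariance theorem, there exists a perceptron
$$\psi_{\mathrm{par}} = \theta\left(\sum_{\phi \in \Phi} a_\phi \phi\right)$$
such that $a_\phi$ depends only on the size of the mask $\phi$. That is, there exist real numbers $a_0, a_1, \dots, a_k$ such that $a_\phi = a_i$ whenever $\phi$ is a mask of size $i$.
Now we can explicitly calculate the perceptron on any subset $X \subseteq R$.
Since $X$ contains $\binom{|X|}{i}$ subsets of size $i$, we plug in the perceptron's formula and calculate:
$$\psi_{\mathrm{par}}(X) = \theta\left(\sum_{i=0}^{k} a_i \binom{|X|}{i}\right).$$
Now, define the polynomial function
$$f(t) = \sum_{i=0}^{k} a_i \binom{t}{i}, \qquad \binom{t}{i} = \frac{t(t-1)\cdots(t-i+1)}{i!},$$
which has degree at most $k$. Since $\psi_{\mathrm{par}}(X) = \theta(f(|X|))$, we have $f(n) > 0$ for odd $n$ and $f(n) \le 0$ for even $n$, for $n = 0, 1, \dots, |R|$. Lowering $a_0$ by a sufficiently small amount keeps the first condition and makes the second strict, so $f$ changes sign between any two consecutive integers in this range.
Thus, the degree-$k$ polynomial $f$ has at least $|R|$ different roots, one in each interval $(n, n+1)$ for $n = 0, \dots, |R|-1$, contradicting $k < |R|$.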
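The argument can be checked numerically on small cases. For a permutation-symmetric perceptron built from masks, the output depends only on $|X|$ through $f(t) = \sum_i a_i \binom{t}{i}$, where $a_i$ is the shared coefficient of the masks of size $i$. The sketch below, with hypothetical helper names, confirms that coefficients of full order do implement parity, and that an exhaustive search over small integer coefficients of lower order finds none. The search is a sanity check over a bounded grid, not a proof.

```python
from itertools import product
from math import comb

def f(coeffs, t):
    """f(t) = sum over i of a_i * C(t, i), for an integer t >= 0."""
    return sum(a * comb(t, i) for i, a in enumerate(coeffs))

def implements_parity(coeffs, n):
    """True iff [f(t) > 0] equals (t is odd) for every size t in 0..n."""
    return all((f(coeffs, t) > 0) == (t % 2 == 1) for t in range(n + 1))

def full_order_coeffs(n):
    """a_0 = 0 and a_i = (-2)**(i-1): these give f(t) = (1 - (-1)**t) / 2."""
    return [0] + [(-2) ** (i - 1) for i in range(1, n + 1)]

def search_low_order(n, k, bound=3):
    """Try every integer coefficient vector a_0..a_k with entries in [-bound, bound]."""
    return [c for c in product(range(-bound, bound + 1), repeat=k + 1)
            if implements_parity(c, n)]

# Usage: order 3 suffices on a 3-element set, while the order-2 search comes up empty.
works = implements_parity(full_order_coeffs(3), 3)
found = search_low_order(n=3, k=2)
```

The closed form in `full_order_coeffs` follows from the binomial identity $\sum_i (-2)^i \binom{t}{i} = (-1)^t$.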
Theorem 5.9: The only topologically invariant predicates of finite order are functions of the Euler number $E(X)$.
That is, if $\psi$ is a boolean function that depends only on topology and can be implemented by a perceptron of order $k$, where $k$ is fixed and does not grow as $R$ grows into a larger and larger rectangle, then $\psi$ is of the form $f \circ E$, for some function $f$.
Proof: omitted.
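In the book, Theorem 5.9 concerns the Euler number of a figure (number of components minus number of holes). One reason this quantity is accessible to low-order perceptrons is that it can be written as a sum of purely local counts. The sketch below is a minimal illustration under an assumed convention (pixels on a square grid, 4-connected figures); the book's own definition may differ in detail, and the function name `euler_number` is hypothetical. It counts lit pixels, minus adjacent lit pairs, plus fully lit 2×2 blocks, which is a linear sum of masks of size 1, 2 and 4.

```python
def euler_number(pixels):
    """Euler number of a set of (row, col) pixels, assuming 4-connectivity.

    Computed as V - E + F, where V counts lit pixels, E counts horizontally
    or vertically adjacent lit pairs, and F counts fully lit 2x2 blocks.
    """
    P = set(pixels)
    vertices = len(P)
    edges = sum(((r, c + 1) in P) + ((r + 1, c) in P) for r, c in P)
    faces = sum((r, c + 1) in P and (r + 1, c) in P and (r + 1, c + 1) in P
                for r, c in P)
    return vertices - edges + faces

# Usage: a 3x3 ring (one component, one hole) and a solid 2x2 block.
ring = {(r, c) for r in range(3) for c in range(3)} - {(1, 1)}
block = {(0, 0), (0, 1), (1, 0), (1, 1)}
```

With these inputs, `euler_number(ring)` is 0 and `euler_number(block)` is 1.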
Section 5.5, due to David A. Huffman — Let be the rectangle of shape , then as , the connectedness function on has order growing at least as fast as .
Proof sketch: The parity function is reduced to the connectedness function using circuit gadgets, in a style similar to the proof that Sokoban is NP-hard.[34]
Reception and legacy
Perceptrons received a number of positive reviews in the years after publication. In 1969, Stanford professor Michael A. Arbib stated, "[t]his book has been widely hailed as an exciting new chapter in the theory of pattern recognition."[35] Earlier that year, CMU professor Allen Newell composed a review of the book for Science, opening the piece by declaring "[t]his is a great book."[36]
On the other hand, H.D. Block expressed concern at the authors' narrow definition of perceptrons. He argued that they "study a severely limited class of machines from a viewpoint quite alien to Rosenblatt's", and thus the title of the book was "seriously misleading".[15] Contemporary neural net researchers shared some of these objections: Bernard Widrow complained that the authors had defined perceptrons too narrowly, but also said that Minsky and Papert's proofs were "pretty much irrelevant", coming a full decade after Rosenblatt's perceptron.[17]
Perceptrons is often thought to have caused a decline in neural net research in the 1970s and early 1980s.[3][37] During this period, neural net researchers continued smaller projects outside the mainstream, while symbolic AI research saw explosive growth.[38][3]
With the revival of connectionism in the late 80s,
Analysis of the controversy
It is most instructive to learn what Minsky and Papert themselves said in the 1970s about the broader implications of their book. On his website, Harvey Cohen,[39] a researcher at the MIT AI Labs 1974+,[40] quotes Minsky and Papert in the 1971 Report of Project MAC, directed at funding agencies, on "Gamba networks":[30] "Virtually nothing is known about the computational capabilities of this latter kind of machine. We believe that it can do little more than can a low order perceptron." On the preceding page, Minsky and Papert make clear that "Gamba networks" are networks with hidden layers.
Minsky has compared the book to the fictional book
How Perceptrons was explored first by one group of scientists to drive research in AI in one direction, and then later by a new group in another direction, has been the subject of a sociological study of scientific development.[3]
Notes
- ^ Rosenblatt, Frank (January 1957). The Perceptron: A Perceiving and Recognizing Automaton (Project PARA) (PDF) (Report). Cornell Aeronautical Laboratory, Inc. Report No. 85–460–1. Retrieved 29 December 2019. Memorialized at Joe Pater, Brain Wars: How does the mind work? And why is that so important?, UmassAmherst.
- ^ Crevier 1993
- ^ a b c d e f g Olazaran 1996.
- ISBN 978-0-374-25783-5.
- ^ Minsky-Papert 1972:74 shows the figures in black and white. The cover of the 1972 paperback edition has them printed purple on a red background, and this makes the connectivity even more difficult to discern without the use of a finger or other means to follow the patterns mechanically. This problem is discussed in detail on pp.136ff and indeed involves tracing the boundary.
- JSTOR 1420478.
- .
- ISSN 0036-8075.
- ISSN 0002-9904.
- ^ Grossberg, Stephen. "The expanded edition of Perceptrons (MIT Press, Cambridge, Mass, 1988, 292 pp, $12.50) by Marvin L. Minsky and Seymour A. Papert comes at." AI Magazine 10.2 (1989).
- S2CID 12781225.
- ^ a b c d e Olazaran 1996, p. 618
- ISBN 978-0-262-08153-5.
- arXiv:1803.08971v1 [cs.AI].
- ^ .
- ^ a b c d e Minsky, Marvin; Papert, Seymour (1988). Perceptrons: An Introduction to Computational Geometry. MIT Press.
- ^ a b Olazaran 1996, p. 630
- ^ Theorem 1 in Rosenblatt, F. (1961) Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms, Spartan. Washington DC.
- ISSN 1522-9602.
- S2CID 51640615.
- ^ Cf. Minsky-Papert (1972:232): "... a universal computer could be built entirely out of linear threshold modules. This does not in any sense reduce the theory of computation and programming to the theory of perceptrons."
- ^ Hu, Sze-Tsen. Threshold logic. Vol. 32. Univ of California Press, 1965.
- ISBN 978-0-471-62530-8.
- ISBN 978-0-201-03804-0.
- ^ Dertouzos, Michael L. "Threshold logic: a synthesis approach." (1965).
- ISSN 0367-7508.
- ^ See references within Cover, Thomas M. "Capacity problems for linear machines." Pattern recognition (1968): 283-289.
- S2CID 264603251.
- ^ .
- ^ a b From the name of the Italian neural network researcher Augusto Gamba (1923–1996), designer of the PAPA perceptron. PAPA is acronym for "Programmatore e Analizzatore Probabilistico Automatico" ("Automatic Probabilistic Programmer and Analyzer").
- ISSN 1827-6121.
- ISSN 1827-6121.
- ISSN 1827-6121.
- ISSN 0925-7721.
- .
- JSTOR 1727364.
- arXiv:1803.01164v1 [cs.CV].
1969: Minsky & Papert show the limitations of perceptron's, killing research in neural networks for a decade
- S2CID 170812977.
- ^ "The Perceptron Controversy".
- ^ "Author of MIT AI Memo 338" (PDF).
- ^ "History: The Past". Ucs.louisiana.edu. Retrieved 2013-07-10.
References
- ISBN 1-56881-205-1, pp. 104−107
- ISBN 0-465-02997-3, pp. 102−105
- ISBN 0-13-790395-2, p. 22
- Marvin Minsky and Seymour Papert, 1972 (2nd edition with corrections, first edition 1969) Perceptrons: An Introduction to Computational Geometry, The MIT Press, Cambridge MA, ISBN 0-262-63022-2.
- Olazaran, Mikel (1996). "A Sociological Study of the Official History of the Perceptrons Controversy". Social Studies of Science. 26 (3): 611–659. S2CID 16786738.
- Olazaran, Mikel (1993-01-01), Yovits, Marshall C. (ed.), A Sociological History of the Neural Network Controversy, Advances in Computers, vol. 37, Elsevier, pp. 335–425, ISBN 9780120121373, retrieved 2023-10-31