AlexNet

Source: Wikipedia, the free encyclopedia.
Comparison of the LeNet and AlexNet convolution, pooling, and dense layers
(The AlexNet input size should be 227×227×3, rather than 224×224×3, so that the arithmetic comes out right. The original paper gave different numbers, but Andrej Karpathy, the former head of computer vision at Tesla, said it should be 227×227×3 (he noted that Alex did not explain why he used 224×224×3). The first convolution, 11×11 with stride 4, then produces an output of 55×55×96 (rather than 54×54×96), calculated as [(input width 227 − kernel width 11) / stride 4] + 1 = [(227 − 11) / 4] + 1 = 55. Since the output height equals the output width, the spatial size is 55×55.)
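As a quick check of that output-size arithmetic, the following Python snippet (the helper name conv_output_size is purely illustrative) reproduces the numbers:

    # Spatial output size of a convolution with no padding:
    # floor((input - kernel) / stride) + 1
    def conv_output_size(input_size: int, kernel_size: int, stride: int) -> int:
        return (input_size - kernel_size) // stride + 1

    # First AlexNet convolution: 227x227 input, 11x11 kernel, stride 4 -> 55
    print(conv_output_size(227, 11, 4))  # 55
    # With the originally reported 224x224 input the result is not an integer,
    # which is why the 227x227x3 correction is needed.
    print((224 - 11) / 4 + 1)  # 54.25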

AlexNet is the name of a convolutional neural network (CNN) architecture, designed by Alex Krizhevsky in collaboration with Ilya Sutskever and Geoffrey Hinton, who was Krizhevsky's Ph.D. advisor at the University of Toronto.[1][2]

AlexNet competed in the ImageNet Large Scale Visual Recognition Challenge on September 30, 2012.[3] The network achieved a top-5 error of 15.3%, more than 10.8 percentage points lower than that of the runner-up. The original paper's primary result was that the depth of the model was essential for its high performance, which was computationally expensive, but made feasible due to the utilization of graphics processing units (GPUs) during training.[2]

Historic context

AlexNet was not the first fast GPU implementation of a CNN to win an image recognition contest. A CNN on GPU by K. Chellapilla et al. (2006) was 4 times faster than an equivalent implementation on CPU.[4]

A CNN on GPU by Cireșan et al. at IDSIA (2011) was already 60 times faster[5] and outperformed predecessors in August 2011.[6] Between May 15, 2011, and September 10, 2012, their CNN won no fewer than four image competitions.[7][8] They also significantly improved on the best performance in the literature for multiple image databases.[9]

According to the AlexNet paper,[2] Cireșan's earlier network is "somewhat similar"; both were originally written with CUDA to run with GPU support. In fact, both are variants of the CNN designs introduced by Yann LeCun et al. (1989),[10][11] who applied the backpropagation algorithm to a variant of Kunihiko Fukushima's original CNN architecture, the "neocognitron".[12][13] The architecture was later modified by J. Weng's method called max-pooling.[14][8]

In 2015, AlexNet was outperformed by a Microsoft Research Asia project with over 100 layers, which won the ImageNet 2015 contest.[15]

Network design

AlexNet contains eight layers: the first five are convolutional layers, some of them followed by max-pooling layers, and the last three are fully connected layers. The network, except the last layer, is split into two copies, each run on one GPU.[2] The entire structure can be written as:

(CNN → RN → MP)² → (CNN³ → MP) → (FC → DO)² → Linear

where

  • CNN = convolutional layer (with ReLU activation)
  • RN = local response normalization
  • MP = maxpooling
  • FC = fully connected layer (with ReLU activation)
  • Linear = fully connected layer (without activation)
  • DO = dropout
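A minimal single-GPU sketch of this layer sequence, assuming PyTorch (the original implementation used custom CUDA code split across two GPUs), with layer sizes following the 227×227×3 input convention discussed above:

    import torch
    import torch.nn as nn

    class AlexNetSketch(nn.Module):
        """Illustrative single-GPU reconstruction of the AlexNet layer sequence."""
        def __init__(self, num_classes: int = 1000):
            super().__init__()
            self.features = nn.Sequential(
                # (CNN -> RN -> MP) x 2
                nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(inplace=True),
                nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
                nn.MaxPool2d(kernel_size=3, stride=2),
                nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(inplace=True),
                nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
                nn.MaxPool2d(kernel_size=3, stride=2),
                # CNN^3 -> MP
                nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=3, stride=2),
            )
            self.classifier = nn.Sequential(
                # (FC -> DO) x 2 -> Linear
                nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
                nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
                nn.Linear(4096, num_classes),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            x = self.features(x)        # 3x227x227 -> 256x6x6
            x = torch.flatten(x, 1)     # -> 9216
            return self.classifier(x)   # -> num_classes logits

    # 227x227x3 input -> 55x55x96 after the first convolution, as computed above.
    print(AlexNetSketch()(torch.zeros(1, 3, 227, 227)).shape)  # torch.Size([1, 1000])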

It used the non-saturating ReLU activation function, which showed improved training performance over tanh and sigmoid.[2]
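A brief sketch of what "non-saturating" means in practice, again assuming PyTorch: for a large positive input, the gradient of ReLU stays at 1, while the gradients of tanh and sigmoid are nearly zero, which slows gradient-based training.

    import torch

    x = torch.tensor([5.0], requires_grad=True)
    for name, f in [("relu", torch.relu), ("tanh", torch.tanh), ("sigmoid", torch.sigmoid)]:
        x.grad = None          # reset the gradient between activations
        f(x).backward()        # d f(x) / d x evaluated at x = 5
        print(name, x.grad.item())
    # relu    1.0      (non-saturating: gradient is 1 for any positive input)
    # tanh    ~0.0002  (saturating: gradient vanishes for large |x|)
    # sigmoid ~0.0066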

Influence

AlexNet is considered one of the most influential papers published in computer vision, having spurred many subsequent papers employing CNNs and GPUs to accelerate deep learning.[16] As of early 2023, the AlexNet paper has been cited over 120,000 times according to Google Scholar.[17]

References

  1. ^ Gershgorn, Dave (26 July 2017). "The data that transformed AI research—and possibly the world". Quartz.
  2. ^ Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey E. (2017). "ImageNet classification with deep convolutional neural networks". Communications of the ACM. 60 (6): 84–90. doi:10.1145/3065386. S2CID 195908774.
  3. ^ "ImageNet Large Scale Visual Recognition Competition 2012 (ILSVRC2012)". image-net.org.
  4. ^ Kumar Chellapilla; Sidd Puri; Patrice Simard (2006). "High Performance Convolutional Neural Networks for Document Processing". In Lorette, Guy (ed.). Tenth International Workshop on Frontiers in Handwriting Recognition. Suvisoft.
  5. ^ Cireșan, Dan; Ueli Meier; Jonathan Masci; Luca M. Gambardella; Jürgen Schmidhuber (2011). "Flexible, High Performance Convolutional Neural Networks for Image Classification" (PDF). Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Volume Two. 2: 1237–1242. Retrieved 17 November 2013.
  6. ^ "IJCNN 2011 Competition result table". OFFICIAL IJCNN2011 COMPETITION. 2010. Retrieved 2019-01-14.
  7. ^ Schmidhuber, Jürgen (17 March 2017). "History of computer vision contests won by deep CNNs on GPU". Retrieved 14 January 2019.
  8. ^ S2CID 2309950.
  9. .
  10. .
  11. . Retrieved October 7, 2016.
  12. .
  13. . Retrieved 16 November 2013.
  14. ^ Weng, J; Ahuja, N; Huang, TS (1993). "Learning recognition and segmentation of 3-D objects from 2-D images". Proc. 4th International Conf. Computer Vision: 121–128.
  15. ^ He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2016). "Deep Residual Learning for Image Recognition". 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR): 770–778. arXiv:1512.03385. doi:10.1109/CVPR.2016.90. S2CID 206594692.
  16. ^ Deshpande, Adit. "The 9 Deep Learning Papers You Need To Know About (Understanding CNNs Part 3)". adeshpande3.github.io. Retrieved 2018-12-04.
  17. ^ AlexNet paper on Google Scholar