Pattern recognition

Pattern recognition is the task of assigning a class to an observation based on patterns extracted from data. It has applications in statistical data analysis, signal processing, image analysis, information retrieval, bioinformatics, data compression, computer graphics and machine learning. Pattern recognition has its origins in statistics and engineering; some modern approaches use machine learning methods, owing to the increased availability of big data and a new abundance of processing power.

Pattern recognition systems are commonly trained from labeled "training" data. When no labeled data are available, other algorithms can be used to discover previously unknown patterns. KDD (knowledge discovery in databases) and data mining have a larger focus on unsupervised methods and a stronger connection to business use. Pattern recognition focuses more on the signal and also takes acquisition and signal processing into consideration. It originated in engineering, and the term is popular in the context of computer vision: a leading computer vision conference is named Conference on Computer Vision and Pattern Recognition.

In machine learning, pattern recognition is the assignment of a label to a given input value. In statistics, discriminant analysis was introduced for this same purpose in 1936. An example of pattern recognition is classification, which attempts to assign each input value to one of a given set of classes (for example, determine whether a given email is "spam"). Pattern recognition is a more general problem that encompasses other types of output as well. Other examples are regression, which assigns a real-valued output to each input; sequence labeling, which assigns a class to each member of a sequence of values (for example, part-of-speech tagging, which assigns a part of speech to each word in an input sentence); and parsing, which assigns a parse tree to an input sentence, describing the syntactic structure of the sentence.[3]

Pattern recognition algorithms generally aim to provide a reasonable answer for all possible inputs and to perform "most likely" matching of the inputs, taking into account their statistical variation. This is opposed to pattern matching algorithms, which look for exact matches in the input with pre-existing patterns. A common example of a pattern-matching algorithm is regular expression matching, which looks for patterns of a given sort in textual data and is included in the search capabilities of many text editors and word processors.
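
To make the contrast concrete, here is a minimal Python sketch of the pattern-matching style (the ID format and test strings are invented for illustration): the regular expression either matches exactly or fails, with no notion of "most likely" matching.

```python
import re

# Pattern matching: an exact, rule-based search with no notion of
# statistical variation -- the pattern either matches or it does not.
pattern = re.compile(r"\b[A-Z]{3}-\d{4}\b")  # hypothetical ticket-ID format

for text in ["Ref ABC-1234 approved", "Ref ABC 1234 approved"]:
    m = pattern.search(text)
    print(text, "->", m.group(0) if m else "no match")
# The second string fails despite being "almost" the pattern; a pattern
# recognition system would instead score how likely a match is.
```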

Overview

A modern definition of pattern recognition is:

The field of pattern recognition is concerned with the automatic discovery of regularities in data through the use of computer algorithms and with the use of these regularities to take actions such as classifying the data into different categories.[4]

Pattern recognition is generally categorized according to the type of learning procedure used to generate the output value. Supervised learning assumes that a set of training data (the training set) has been provided, consisting of a set of instances that have been properly labeled by hand with the correct output. A learning procedure then generates a model that attempts both to perform as well as possible on the training data and to generalize as well as possible to new data. Unsupervised learning, on the other hand, assumes training data that has not been hand-labeled, and attempts to find inherent patterns in the data that can then be used to determine the correct output value for new data instances. A combination of the two is semi-supervised learning, which uses a combination of labeled and unlabeled data (typically a small set of labeled data combined with a large amount of unlabeled data). In cases of unsupervised learning, there may be no training data at all.

Sometimes different terms are used to describe the corresponding supervised and unsupervised learning procedures for the same type of output. The unsupervised equivalent of classification is normally known as clustering, based on the common perception of the task as involving no training data to speak of, and of grouping the input data into clusters based on some inherent similarity measure (e.g., the distance between instances, considered as vectors in a multi-dimensional vector space), rather than assigning each input instance into one of a set of pre-defined classes. In some fields, the terminology is different: in community ecology, the term classification is used to refer to what is commonly known as "clustering".

The piece of input data for which an output value is generated is formally termed an instance. The instance is formally described by a vector of features, which together constitute a description of all known characteristics of the instance. These feature vectors can be seen as defining points in an appropriate multidimensional space, and methods for manipulating vectors in vector spaces can be correspondingly applied to them, such as computing the dot product or the angle between two vectors. Features typically are either categorical (also known as nominal, i.e., consisting of one of a set of unordered items, such as a gender of "male" or "female", or a blood type of "A", "B", "AB" or "O"), ordinal (consisting of one of a set of ordered items, e.g., "large", "medium" or "small"), integer-valued (e.g., a count of the number of occurrences of a particular word in an email) or real-valued
(e.g., a measurement of blood pressure). Often, categorical and ordinal data are grouped together, and this is also the case for integer-valued and real-valued data. Many algorithms work only in terms of categorical data and require that real-valued or integer-valued data be discretized into groups (e.g., less than 5, between 5 and 10, or greater than 10).
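
A minimal Python sketch of such discretization, reusing the thresholds from the example above (the measurement values are invented):

```python
# Discretize real-valued features into categorical groups using the
# thresholds from the example above (<5, 5-10, >10).
def discretize(value):
    if value < 5:
        return "less than 5"
    elif value <= 10:
        return "between 5 and 10"
    else:
        return "greater than 10"

measurements = [2.7, 5.0, 9.9, 14.2]  # invented sample values
print([discretize(v) for v in measurements])
# ['less than 5', 'between 5 and 10', 'between 5 and 10', 'greater than 10']
```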

Probabilistic classifiers

Many common pattern recognition algorithms are probabilistic in nature, in that they use statistical inference to find the best label for a given instance. Unlike other algorithms, which simply output a "best" label, probabilistic algorithms often also output a probability of the instance being described by the given label. In addition, many probabilistic algorithms output a list of the N-best labels with associated probabilities, for some value of N, instead of simply a single best label. When the number of possible labels is fairly small (e.g., in the case of classification), N may be set so that the probability of all possible labels is output. Probabilistic algorithms have many advantages over non-probabilistic algorithms:

  • They output a confidence value associated with their choice that is mathematically grounded in probability theory, whereas non-probabilistic confidence values can in general not be given any specific meaning.
  • Correspondingly, they can abstain when the confidence of choosing any particular output is too low.
  • Because of the probabilities output, probabilistic algorithms can be more effectively incorporated into larger machine-learning tasks, in a way that partially or completely avoids the problem of error propagation.
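
As a minimal sketch of these outputs, assuming scikit-learn is available (the toy data are invented), a probabilistic classifier can report a probability for every possible label rather than only a single best guess:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB  # a common probabilistic classifier

# Invented toy data: two real-valued features, two classes.
X = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.0], [4.1, 3.7]])
y = np.array(["a", "a", "b", "b"])

clf = GaussianNB().fit(X, y)

x_new = np.array([[2.5, 3.0]])
print(clf.predict(x_new))        # single best label
print(clf.predict_proba(x_new))  # probability for every possible label,
                                 # i.e. the N-best list with N = all labels
```

A downstream system could, for example, abstain whenever the largest of these probabilities falls below a chosen threshold.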

Number of important feature variables

Feature selection algorithms attempt to directly prune out redundant or irrelevant features. A general introduction to feature selection, which summarizes approaches and challenges, has been given.[6] Because of its non-monotonous character, feature selection is an optimization problem where, given a total of $n$ features, the powerset consisting of all $2^n - 1$ non-empty feature subsets needs to be explored. The Branch-and-Bound algorithm[7] does reduce this complexity but is intractable for medium to large values of the number of available features $n$.
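
The exponential blow-up is visible in a brute-force Python sketch (the scoring function below is a placeholder, not a real selection criterion):

```python
from itertools import combinations

def score(subset):
    # Placeholder objective; a real criterion would evaluate, e.g., the
    # cross-validated accuracy of a classifier restricted to `subset`.
    return len(subset)

features = ["f1", "f2", "f3", "f4"]

# Exhaustive search over the powerset: 2**n - 1 non-empty subsets.
candidates = (c for r in range(1, len(features) + 1)
                for c in combinations(features, r))
best = max(candidates, key=score)
print(best)  # with n features this loop evaluates 2**n - 1 subsets
```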

Techniques to transform the raw feature vectors (feature extraction) are sometimes used prior to application of the pattern-matching algorithm. Feature extraction algorithms attempt to reduce a large-dimensionality feature vector into a smaller-dimensionality vector that is easier to work with and encodes less redundancy, using mathematical techniques such as principal components analysis (PCA). The distinction between feature selection and feature extraction is that the resulting features after feature extraction has taken place are of a different sort than the original features and may not easily be interpretable, while the features left after feature selection are simply a subset of the original features.
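
A minimal PCA sketch using NumPy (the data are randomly generated for illustration); note that the extracted components are linear combinations of all original features, which is why they may be hard to interpret:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))  # invented data: 100 instances, 5 features

# PCA via singular value decomposition of the centered data matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2                       # keep the 2 directions of largest variance
X_reduced = Xc @ Vt[:k].T   # new features: linear mixes of all 5 originals
print(X_reduced.shape)      # (100, 2)
```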

Problem statement

The problem of pattern recognition can be stated as follows: Given an unknown function $g:\mathcal{X}\rightarrow\mathcal{Y}$ (the ground truth) that maps input instances $\boldsymbol{x} \in \mathcal{X}$ to output labels $y \in \mathcal{Y}$, along with training data $\mathbf{D} = \{(\boldsymbol{x}_1,y_1),\dots,(\boldsymbol{x}_n,y_n)\}$ assumed to represent accurate examples of the mapping, produce a function $h:\mathcal{X}\rightarrow\mathcal{Y}$ that approximates as closely as possible the correct mapping $g$. (For example, if the problem is filtering spam, then $\boldsymbol{x}_i$ is some representation of an email and $y$ is either "spam" or "non-spam".) In order for this to be a well-defined problem, "approximates as closely as possible" needs to be defined rigorously. In decision theory, this is defined by specifying a loss function or cost function that assigns a specific value to "loss" resulting from producing an incorrect label. The goal then is to minimize the expected loss, with the expectation taken over the probability distribution of $\mathcal{X}$. In practice, neither the distribution of $\mathcal{X}$ nor the ground truth function $g:\mathcal{X}\rightarrow\mathcal{Y}$ are known exactly, but can be computed only empirically by collecting a large number of samples of $\mathcal{X}$ and hand-labeling them using the correct value of $\mathcal{Y}$ (a time-consuming process, which is typically the limiting factor in the amount of data of this sort that can be collected). The particular loss function depends on the type of label being predicted. For example, in the case of classification, the simple zero-one loss function is often sufficient. This corresponds simply to assigning a loss of 1 to any incorrect labeling and implies that the optimal classifier minimizes the error rate
on independent test data (i.e. counting up the fraction of instances that the learned function labels wrongly, which is equivalent to maximizing the number of correctly classified instances). The goal of the learning procedure is then to minimize the error rate (maximize the correctness) on a "typical" test set.
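
Under zero-one loss, the empirical error rate is simply the fraction of mislabeled instances, as in this minimal sketch (the labels are invented):

```python
# Zero-one loss: 1 for each wrong label, 0 for each correct one.
y_true = ["spam", "spam", "ham", "ham", "spam"]   # invented test labels
y_pred = ["spam", "ham",  "ham", "ham", "spam"]   # invented predictions

errors = sum(t != p for t, p in zip(y_true, y_pred))
error_rate = errors / len(y_true)
print(error_rate)  # 0.2 -- minimizing this maximizes correct classifications
```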

For a probabilistic pattern recognizer, the problem is instead to estimate the probability of each possible output label given a particular input instance, i.e., to estimate a function of the form

$$p({\rm label}\mid\boldsymbol{x},\boldsymbol\theta) = f\left(\boldsymbol{x};\boldsymbol\theta\right)$$

where the feature vector input is $\boldsymbol{x}$, and the function $f$ is typically parameterized by some parameters $\boldsymbol\theta$.[8] In a discriminative approach to the problem, $f$ is estimated directly. In a generative approach, however, the inverse probability $p(\boldsymbol{x}\mid{\rm label})$ is instead estimated and combined with the prior probability $p({\rm label}\mid\boldsymbol\theta)$ using Bayes' rule, as follows:

$$p({\rm label}\mid\boldsymbol{x},\boldsymbol\theta) = \frac{p(\boldsymbol{x}\mid{\rm label},\boldsymbol\theta)\, p({\rm label}\mid\boldsymbol\theta)}{\sum_{L \in \text{all labels}} p(\boldsymbol{x}\mid L)\, p(L\mid\boldsymbol\theta)}.$$

When the labels are continuously distributed (e.g., in regression analysis), the denominator involves integration rather than summation:

$$p({\rm label}\mid\boldsymbol{x},\boldsymbol\theta) = \frac{p(\boldsymbol{x}\mid{\rm label},\boldsymbol\theta)\, p({\rm label}\mid\boldsymbol\theta)}{\int_{L} p(\boldsymbol{x}\mid L)\, p(L\mid\boldsymbol\theta)\, \operatorname{d}L}.$$
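
For a small discrete label set, the generative computation above is only a few lines; a minimal Python sketch with invented likelihoods and priors:

```python
# Generative classification via Bayes' rule for discrete labels.
# Invented class-conditional likelihoods p(x | label) and priors p(label).
likelihood = {"spam": 0.30, "ham": 0.05}  # p(x | label) for one observed x
prior      = {"spam": 0.40, "ham": 0.60}

evidence = sum(likelihood[L] * prior[L] for L in prior)  # the denominator
posterior = {L: likelihood[L] * prior[L] / evidence for L in prior}
print(posterior)  # {'spam': 0.8, 'ham': 0.2} -- p(label | x)
```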

The value of $\boldsymbol\theta$ is typically learned using maximum a posteriori (MAP) estimation. This finds the best value that simultaneously meets two conflicting objectives: to perform as well as possible on the training data (smallest error rate) and to find the simplest possible model. Essentially, this combines maximum likelihood estimation with a regularization procedure that favors simpler models over more complex models. In a Bayesian context, the regularization procedure can be viewed as placing a prior probability $p(\boldsymbol\theta)$ on different values of $\boldsymbol\theta$. Mathematically:

$$\boldsymbol\theta^* = \arg\max_{\boldsymbol\theta} p(\boldsymbol\theta\mid\mathbf{D})$$

where $\boldsymbol\theta^*$ is the value used for $\boldsymbol\theta$ in the subsequent evaluation procedure, and $p(\boldsymbol\theta\mid\mathbf{D})$, the posterior probability of $\boldsymbol\theta$, is given by

$$p(\boldsymbol\theta\mid\mathbf{D}) = \left[\prod_{i=1}^n p(y_i\mid\boldsymbol{x}_i,\boldsymbol\theta)\right] p(\boldsymbol\theta).$$

In the Bayesian approach to this problem, instead of choosing a single parameter vector $\boldsymbol\theta^*$, the probability of a given label for a new instance $\boldsymbol{x}$ is computed by integrating over all possible values of $\boldsymbol\theta$, weighted according to the posterior probability:

$$p({\rm label}\mid\boldsymbol{x}) = \int p({\rm label}\mid\boldsymbol{x},\boldsymbol\theta)\, p(\boldsymbol\theta\mid\mathbf{D})\, \operatorname{d}\boldsymbol\theta.$$
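
A minimal sketch of the contrast between MAP estimation and full Bayesian integration, using a conjugate Beta-Bernoulli model where the integral over $\boldsymbol\theta$ has a closed form (the prior and counts are invented):

```python
# Beta-Bernoulli model: theta = probability a coin lands heads.
# Prior Beta(a, b); data D = h heads, t tails (invented counts).
a, b = 2.0, 2.0
h, t = 7, 3

# MAP: a single best theta* = argmax p(theta | D), the posterior mode.
theta_map = (a + h - 1) / (a + b + h + t - 2)

# Full Bayesian prediction integrates over all theta, weighted by the
# posterior; for this conjugate model the integral is the posterior mean.
p_heads_bayes = (a + h) / (a + b + h + t)

print(theta_map, p_heads_bayes)  # 0.667 vs 0.643 -- the two answers differ
```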

Frequentist or Bayesian approach to pattern recognition

The first pattern classifier – the linear discriminant presented by Fisher – was developed in the frequentist tradition. The frequentist approach entails that the model parameters are considered unknown, but objective. The parameters are then computed (estimated) from the collected data. For the linear discriminant, these parameters are precisely the mean vectors and the covariance matrix. Also the probability of each class $p({\rm label}\mid\boldsymbol\theta)$ is estimated from the collected dataset. Note that the usage of 'Bayes rule' in a pattern classifier does not make the classification approach Bayesian.

Bayesian statistics has its origin in Greek philosophy, where a distinction was already made between 'a priori' and 'a posteriori' knowledge. Later Kant defined his distinction between what is known a priori – before observation – and the empirical knowledge gained from observations. In a Bayesian pattern classifier, the class probabilities $p({\rm label}\mid\boldsymbol\theta)$ can be chosen by the user, which are then a priori. Moreover, experience quantified as a priori parameter values can be weighted with empirical observations – using, e.g., the Beta- (conjugate prior) and Dirichlet-distributions. The Bayesian approach facilitates a seamless intermixing between expert knowledge in the form of subjective probabilities and objective observations.
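
A minimal sketch of that intermixing with a Beta (conjugate) prior in Python, where expert knowledge enters as pseudo-counts (all numbers are invented):

```python
# Expert knowledge as a conjugate Beta prior: pseudo-counts express how
# strongly the expert believes a class occurs, before seeing any data.
prior_a, prior_b = 8.0, 2.0   # expert: class occurs ~80% of the time
obs_pos, obs_neg = 3, 7       # empirical observations disagree

# Conjugate update: the posterior is again Beta, mixing both sources.
post_a, post_b = prior_a + obs_pos, prior_b + obs_neg
print(post_a / (post_a + post_b))  # 0.55: subjective prior tempered by data
```

The more pseudo-counts the prior carries, the more evidence is needed to move the posterior away from the expert's initial belief.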

Probabilistic pattern classifiers can be used according to a frequentist or a Bayesian approach.

Uses

[Figure: a face automatically detected by special software.]

Within medical science, pattern recognition is the basis for computer-aided diagnosis (CAD) systems. CAD describes a procedure that supports the doctor's interpretations and findings. Other typical applications of pattern recognition techniques are automatic speech recognition, speaker identification, classification of text into several categories (e.g., spam or non-spam email messages), the automatic recognition of handwriting on postal envelopes, automatic recognition of images of human faces, or handwriting image extraction from medical forms.[9][10] The last two examples form the subtopic image analysis of pattern recognition that deals with digital images as input to pattern recognition systems.[11][12]

Optical character recognition is an example of the application of a pattern classifier. The method of signing one's name was captured with stylus and overlay starting in 1990.[citation needed] The strokes, speed, relative min, relative max, acceleration and pressure are used to uniquely identify and confirm identity. Banks were first offered this technology, but were content to collect from the FDIC for any bank fraud and did not want to inconvenience customers.[citation needed]

Pattern recognition has many real-world applications in image processing. Some examples include:

  • identification and authentication: e.g., license plate recognition,[13] fingerprint analysis, face detection/verification,[14] and voice-based authentication;[15]
  • medical diagnosis: e.g., screening for cervical cancer (Papnet),[16] breast tumors or heart sounds;
  • defense: various navigation and guidance systems, target recognition systems, shape recognition technology, etc.;
  • mobility: advanced driver-assistance systems, autonomous vehicle technology, etc.[17][18][19][20][21]

In psychology, pattern recognition is used to make sense of and identify objects, and is closely related to perception. This explains how the sensory inputs humans receive are made meaningful. Pattern recognition can be thought of in two different ways. The first concerns template matching and the second concerns feature detection. A template is a pattern used to produce items of the same proportions. The template-matching hypothesis suggests that incoming stimuli are compared with templates in long-term memory; if there is a match, the stimulus is identified. Feature detection models, such as the Pandemonium system for classifying letters (Selfridge, 1959), suggest that stimuli are broken down into their component parts for identification. For example, a capital E is detected as three horizontal lines and one vertical line.[22]

Algorithms

Algorithms for pattern recognition depend on the type of label output, on whether learning is supervised or unsupervised, and on whether the algorithm is statistical or non-statistical in nature. Statistical algorithms can further be categorized as generative or discriminative.

Classification methods (methods predicting categorical labels)

Parametric:[23]

Nonparametric:[24]

Clustering methods (unsupervised methods for grouping instances into categories)

Ensemble learning algorithms (supervised meta-algorithms for combining multiple learning algorithms together)

  • Boosting (meta-algorithm)
  • Bootstrap aggregating ("bagging")
  • Ensemble averaging
  • Hierarchical mixture of experts

General methods for predicting arbitrarily-structured (sets of) labels

Multilinear subspace learning algorithms (predicting labels of multidimensional data using tensor representations)

Unsupervised:

Real-valued sequence labeling methods (predicting sequences of real-valued labels)

Regression methods (predicting real-valued labels)

Sequence labeling methods (predicting sequences of categorical labels)

References

  1. ISSN 0368-492X.
  2. "Sequence Labeling" (PDF). utah.edu. Archived (PDF) from the original on 2018-11-06. Retrieved 2018-11-06.
  3. OCLC 799802313.
  4. Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. Springer.
  5. S2CID 21050445.
  6. Isabelle Guyon, André Elisseeff (2003). "An Introduction to Variable and Feature Selection". The Journal of Machine Learning Research, Vol. 3, 1157–1182. Archived 2016-03-04 at the Wayback Machine.
  7. Iman Foroutan; Jack Sklansky (1987). "Feature Selection for Automatic Classification of Non-Gaussian Data". IEEE Transactions on Systems, Man, and Cybernetics. 17 (2): 187–198. S2CID 9871395.
  8. For linear discriminant analysis the parameter vector $\boldsymbol\theta$ consists of the two mean vectors $\boldsymbol\mu_1$ and $\boldsymbol\mu_2$ and the common covariance matrix $\boldsymbol\Sigma$.
  9. Archived from the original on 10 September 2020. Retrieved 26 October 2011.
  10. .
  11. ISBN 978-0-471-05669-0. Archived from the original on 2020-08-19. Retrieved 2019-11-26.
  12. , 2009.
  13. The Automatic Number Plate Recognition Tutorial. http://anpr-tutorial.com/ Archived 2006-08-20 at the Wayback Machine.
  14. Neural Networks for Face Recognition. Companion to Chapter 4 of the textbook Machine Learning. Archived 2016-03-04 at the Wayback Machine.
  15. Archived from the original on 2019-09-03. Retrieved 2019-08-27.
  16. PAPNET For Cervical Screening. Archived 2012-07-08 at archive.today.
  17. "Development of an Autonomous Vehicle Control Strategy Using a Single Camera and Deep Neural Networks (2018-01-0035 Technical Paper) - SAE Mobilus". saemobilus.sae.org. Archived from the original on 2019-09-06. Retrieved 2019-09-06.
  18. S2CID 89616974.
  19. Pickering, Chris (2017-08-15). "How AI is paving the way for fully autonomous cars". The Engineer. Archived from the original on 2019-09-06. Retrieved 2019-09-06.
  20. Bibcode:2017arXiv170808559T.
  21. .
  22. "A-level Psychology Attention Revision - Pattern recognition | S-cool, the revision website". S-cool.co.uk. Archived from the original on 2013-06-22. Retrieved 2012-09-17.
  23. Assuming a known distributional shape of the feature distributions per class, such as the Gaussian shape.
  24. ^ No distributional assumption regarding shape of feature distributions per class.
