Artificial intelligence


Artificial intelligence (AI) is the field of research in computer science that focuses on the automation of intelligent behavior through techniques such as machine learning. It develops and studies methods and software which enable machines to perform tasks that are typically associated with human intelligence. Such machines may be called AIs.

High-profile applications of AI include advanced web search engines, recommendation systems, virtual assistants, autonomous vehicles, generative and creative tools (such as AI art), and superhuman play and analysis in strategy games (e.g., chess and Go).[1] However, many AI applications are not perceived as AI: "A lot of cutting edge AI has filtered into general applications, often without being called AI because once something becomes useful enough and common enough it's not labeled AI anymore."[2][3]

The growing use of artificial intelligence in the 21st century is influencing a societal and economic shift towards increased automation and data-driven decision-making, raising questions about the long-term effects, ethical implications, and risks of AI, prompting discussions about regulatory policies to ensure the safety and benefits of the technology.

The various sub-fields of AI research are centered around particular goals and the use of particular tools. The traditional goals of AI research include reasoning, knowledge representation, planning, learning, natural language processing, perception, and support for robotics.[a] General intelligence (the ability to complete any task performable by a human on an at least equal level) is among the field's long-term goals.[13]

To reach these goals, AI researchers have adapted and integrated a wide range of techniques, including search and mathematical optimization, formal logic, artificial neural networks, and methods based on statistics, operations research, and economics.[b] AI also draws upon psychology, linguistics, philosophy, neuroscience, and other fields.[14]

Goals

The general problem of simulating (or creating) intelligence has been broken into sub-problems. These consist of particular traits or capabilities that researchers expect an intelligent system to display. The traits described below have received the most attention and cover the scope of AI research.[a]

Reasoning and problem solving

Early researchers developed algorithms that imitated step-by-step reasoning that humans use when they solve puzzles or make logical deductions.[15] By the late 1980s and 1990s, methods were developed for dealing with uncertain or incomplete information, employing concepts from probability and economics.[16]

Many of these algorithms are insufficient for solving large reasoning problems because they experience a "combinatorial explosion": they become exponentially slower as the problems grow larger.[17] Even humans rarely use the step-by-step deduction that early AI research could model. They solve most of their problems using fast, intuitive judgments.[18] Accurate and efficient reasoning is an unsolved problem.
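As a rough illustration of this growth, the sketch below (with an arbitrary branching factor of ten possible moves per step) shows how the number of states a naive search would have to consider rises exponentially with the depth of the solution:

```python
# Illustration of combinatorial explosion: a search tree with branching
# factor b and depth d contains on the order of b**d leaf nodes.
branching_factor = 10              # e.g., ten possible moves at each step
for depth in range(1, 11):
    print(depth, branching_factor ** depth)
# Output grows from 10 at depth 1 to 10,000,000,000 at depth 10.
```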

Knowledge representation

An ontology represents knowledge as a set of concepts within a domain and the relationships between those concepts.

Knowledge representation and knowledge engineering[19] allow AI programs to answer questions intelligently and make deductions about real-world facts. Formal knowledge representations are used in content-based indexing and retrieval,[20] scene interpretation,[21] clinical decision support,[22] knowledge discovery (mining "interesting" and actionable inferences from large databases),[23] and other areas.[24]

A knowledge base is a body of knowledge represented in a form that can be used by a program. Knowledge bases need to represent things such as: objects, properties, categories and relations between objects; situations, events, states and time; causes and effects; knowledge about knowledge (what we know about what other people know); default reasoning (things that humans assume are true until they are told differently and will remain true even when other facts are changing);[30] and many other aspects and domains of knowledge.

Among the most difficult problems in knowledge representation are: the breadth of commonsense knowledge (the set of atomic facts that the average person knows is enormous);[31] and the sub-symbolic form of most commonsense knowledge (much of what people know is not represented as "facts" or "statements" that they could express verbally).[18] There is also the difficulty of knowledge acquisition, the problem of obtaining knowledge for AI applications.[c]

Planning and decision making

An "agent" is anything that perceives and takes actions in the world. A

expected utility": the utility of all possible outcomes of the action, weighted by the probability that the outcome will occur. It can then choose the action with the maximum expected utility.[36]

In classical planning, the agent knows exactly what the effect of any action will be.[37] In most real-world problems, however, the agent may not be certain about the situation they are in (it is "unknown" or "unobservable") and it may not know for certain what will happen after each possible action (it is not "deterministic"). It must choose an action by making a probabilistic guess and then reassess the situation to see if the action worked.[38]

In some problems, the agent's preferences may be uncertain, especially if there are other agents or humans involved. These can be learned (e.g., with inverse reinforcement learning), or the agent can seek information to improve its preferences. The space of possible future actions and situations is typically intractably large, so the agents must take actions and evaluate situations while being uncertain what the outcome will be.

A Markov decision process has a transition model that describes the probability that a particular action will change the state in a particular way, and a reward function that supplies the utility of each state and the cost of each action. A policy associates a decision with each possible state. The policy could be calculated (e.g., by iteration), be heuristic, or it can be learned.[41]
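A minimal value-iteration sketch for a toy Markov decision process; the two states, transition model, rewards and discount factor below are invented for illustration and are not taken from the cited sources:

```python
# Value iteration on a toy MDP: compute a value for each state, then read
# off a greedy policy from those values.
states = ["s0", "s1"]
actions = ["stay", "move"]
# transitions[(state, action)] = list of (probability, next_state, reward)
transitions = {
    ("s0", "stay"): [(1.0, "s0", 0.0)],
    ("s0", "move"): [(0.8, "s1", 1.0), (0.2, "s0", 0.0)],
    ("s1", "stay"): [(1.0, "s1", 2.0)],
    ("s1", "move"): [(1.0, "s0", 0.0)],
}
gamma = 0.9                       # discount factor
V = {s: 0.0 for s in states}

for _ in range(100):              # iterate until the values stop changing much
    V = {s: max(sum(p * (r + gamma * V[s2])
                    for p, s2, r in transitions[(s, a)])
                for a in actions)
         for s in states}

policy = {s: max(actions,
                 key=lambda a: sum(p * (r + gamma * V[s2])
                                   for p, s2, r in transitions[(s, a)]))
          for s in states}
print(V, policy)                  # both states prefer heading towards s1
```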

Game theory describes rational behavior of multiple interacting agents, and is used in AI programs that make decisions that involve other agents.[42]

Learning

Machine learning is the study of programs that can improve their performance on a given task automatically.[43] It has been a part of AI from the beginning.[e]

There are several kinds of machine learning. Unsupervised learning analyzes a stream of data and finds patterns and makes predictions without any other guidance.[46] Supervised learning requires a human to label the input data first, and comes in two main varieties: classification (where the program must learn to predict what category the input belongs in) and regression (where the program must deduce a numeric function based on numeric input).[47]

In reinforcement learning, the agent is rewarded for good responses and punished for bad ones, and learns to choose responses that are rewarded. Transfer learning is when the knowledge gained from one problem is applied to a new problem. Deep learning is a type of machine learning that runs inputs through biologically inspired artificial neural networks for all of these types of learning.[50]
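As an illustration of the unsupervised setting described above, a minimal k-means clustering sketch (the one-dimensional data and the choice of algorithm are illustrative only) finds structure in unlabeled data:

```python
# k-means clustering: find k groups in unlabeled 1-D data without any labels.
import random

random.seed(0)
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.9]    # two obvious clusters around 1 and 5
k = 2
centers = random.sample(data, k)

for _ in range(20):
    # assignment step: attach each point to its nearest center
    clusters = {i: [] for i in range(k)}
    for x in data:
        nearest = min(range(k), key=lambda i: abs(x - centers[i]))
        clusters[nearest].append(x)
    # update step: move each center to the mean of its cluster
    centers = [sum(c) / len(c) if c else centers[i]
               for i, c in clusters.items()]

print(sorted(centers))  # approximately [1.0, 5.03]
```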

Natural language processing

Early work, based on Noam Chomsky's generative grammar and semantic networks, had difficulty with word-sense disambiguation unless restricted to small domains called "micro-worlds". Margaret Masterman argued that meaning, not grammar, was the key to understanding languages, and that thesauri and not dictionaries should be the basis of computational language structure.

Modern deep learning techniques for NLP include word embeddings and transformers. By 2023, GPT language models could achieve human-level scores on the bar exam, SAT test, GRE test, and many other real-world applications.[59]
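A minimal sketch of how word embeddings are compared; the three-dimensional vectors below are made up, whereas real embeddings have hundreds of dimensions learned from text:

```python
# Cosine similarity between word-embedding vectors: semantically related
# words end up with vectors that point in similar directions.
import math

embeddings = {                     # toy, hand-made vectors
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.85, 0.75, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine(embeddings["king"], embeddings["queen"]))  # close to 1
print(cosine(embeddings["king"], embeddings["apple"]))  # noticeably smaller
```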

Perception

Machine perception is the ability to use input from sensors (such as cameras, microphones, wireless signals, active lidar, sonar, radar, and tactile sensors) to deduce aspects of the world. Computer vision is the ability to analyze visual input.[60]

The field includes speech recognition, image classification, facial recognition, object recognition,[63] and robotic perception.[64]

Social intelligence

Kismet, a robot head made in the 1990s that can recognize and simulate emotions.[65]

Affective computing is an interdisciplinary umbrella that comprises systems that recognize, interpret, process or simulate human feeling, emotion and mood.[66] For example, some virtual assistants are programmed to speak conversationally or even to banter humorously; this makes them appear more sensitive to the emotional dynamics of human interaction, or otherwise facilitates human–computer interaction.

However, this tends to give naïve users an unrealistic conception of the intelligence of existing computer agents.[67] Moderate successes related to affective computing include textual sentiment analysis and, more recently, multimodal sentiment analysis, wherein AI classifies the affects displayed by a videotaped subject.[68]

General intelligence

A machine with artificial general intelligence should be able to solve a wide variety of problems with breadth and versatility similar to human intelligence.[13]

Techniques

AI research uses a wide variety of techniques to accomplish the goals above.[b]

Search and optimization

AI can solve many problems by intelligently searching through many possible solutions.[69] There are two very different kinds of search used in AI: state space search and local search.

State space search

State space search searches through a tree of possible states to try to find a goal state. For example, planning algorithms search through trees of goals and subgoals, attempting to find a path to a target goal, a process called means-ends analysis.[71]

Heuristics" or "rules of thumb" can help to prioritize choices that are more likely to reach a goal.[73]

Adversarial search is used for game-playing programs, such as chess or Go. It searches through a tree of possible moves and counter-moves, looking for a winning position.[74]
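A minimal sketch of heuristic state space search in the style of A*; the graph, step costs and heuristic estimates below are invented for illustration:

```python
# A*-style best-first search: expand the state that looks most promising
# according to path cost so far plus a heuristic estimate of remaining cost.
import heapq

graph = {                      # toy state space: state -> [(neighbor, step_cost)]
    "start": [("a", 1), ("b", 4)],
    "a":     [("goal", 5)],
    "b":     [("goal", 1)],
    "goal":  [],
}
heuristic = {"start": 4, "a": 4, "b": 1, "goal": 0}   # made-up estimates

def a_star(start, goal):
    frontier = [(heuristic[start], 0, start, [start])]  # (f, g, state, path)
    visited = set()
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return path, g
        if state in visited:
            continue
        visited.add(state)
        for neighbor, cost in graph[state]:
            g2 = g + cost
            heapq.heappush(frontier,
                           (g2 + heuristic[neighbor], g2, neighbor, path + [neighbor]))
    return None, float("inf")

print(a_star("start", "goal"))  # (['start', 'b', 'goal'], 5)
```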

Local search

Illustration of gradient descent for 3 different starting points. Two parameters (represented by the plane coordinates) are adjusted in order to minimize the loss function (the height).

Local search uses mathematical optimization to find a solution to a problem. It begins with some form of guess and refines it incrementally.[75]

Gradient descent is a type of local search that optimizes a set of numerical parameters by incrementally adjusting them to minimize a loss function. Variants of gradient descent are commonly used to train neural networks.[76]
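A minimal sketch of gradient descent on a one-parameter loss function (the quadratic loss and learning rate are chosen only for illustration):

```python
# Gradient descent: repeatedly step the parameter against the gradient
# of the loss until the loss stops improving.
def loss(x):
    return (x - 3.0) ** 2          # minimum at x = 3

def gradient(x):
    return 2.0 * (x - 3.0)         # derivative of the loss

x = 0.0                            # initial guess
learning_rate = 0.1
for _ in range(100):
    x -= learning_rate * gradient(x)

print(x)  # converges to approximately 3.0
```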

Another type of local search is evolutionary computation, which aims to iteratively improve a set of candidate solutions by "mutating" and "recombining" them, selecting only the fittest to survive each generation.[77]

Distributed search processes can coordinate via swarm intelligence algorithms, such as particle swarm optimization (inspired by bird flocking) and ant colony optimization (inspired by ant trails).

Logic

Formal logic is used for reasoning and knowledge representation.

Formal logic comes in two main forms:
propositional logic (which operates on statements that are true or false and uses logical connectives such as "and", "or", "not" and "implies")[80]
and
predicate logic (which also operates on objects, predicates and relations and uses quantifiers such as "Every X is a Y" and "There are some Xs that are Ys").[81]

Logical inference (or deduction) is the process of proving a new statement (conclusion) from other statements that are already known to be true (the premises).[82]
A logical knowledge base also handles queries and assertions as a special case of inference.[83] An
inference rule describes what is a valid step in a proof. The most general inference rule is resolution.[84]
Inference can be reduced to performing a search to find a path that leads from premises to conclusions, where each step is the application of an inference rule. Inference performed this way is intractable except for short proofs in restricted domains. No efficient, powerful and general method has been discovered.

Fuzzy logic assigns a "degree of truth" between 0 and 1. It can therefore handle propositions that are vague and partially true.[86]
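A minimal sketch of fuzzy truth values using the common min/max interpretation of "and" and "or" (the membership degrees below are illustrative):

```python
# Fuzzy logic: propositions take a degree of truth between 0 and 1,
# and connectives combine degrees rather than Booleans.
warm = 0.7     # "the room is warm" is 70% true
bright = 0.4   # "the room is bright" is 40% true

fuzzy_and = min(warm, bright)        # 0.4
fuzzy_or = max(warm, bright)         # 0.7
fuzzy_not = 1.0 - warm               # 0.3

print(fuzzy_and, fuzzy_or, fuzzy_not)
```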

Non-monotonic logics are designed to handle default reasoning.[30]
Other specialized versions of logic have been developed to describe many complex domains (see knowledge representation above).

Probabilistic methods for uncertain reasoning

A simple Bayesian network, with the associated conditional probability tables

Many problems in AI (including in reasoning, planning, learning, perception, and robotics) require the agent to operate with incomplete or uncertain information. AI researchers have devised a number of tools to solve these problems using methods from probability theory and economics.[87]

Bayesian networks[88] are a very general tool that can be used for many problems, including reasoning (using the Bayesian inference algorithm), learning (using the expectation–maximization algorithm), planning (using decision networks)[93] and perception (using dynamic Bayesian networks).[94]

Probabilistic algorithms can also be used for filtering, prediction, smoothing and finding explanations for streams of data, helping perception systems to analyze processes that occur over time (e.g., hidden Markov models or Kalman filters).[94]
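A minimal sketch of one such filtering step in the style of a hidden Markov model; the weather states, transition model and observation model below are invented for illustration:

```python
# One step of Bayesian filtering: predict the hidden state with the
# transition model, then update with the likelihood of the new observation.
states = ["rain", "sun"]
transition = {"rain": {"rain": 0.7, "sun": 0.3},   # P(next | current)
              "sun":  {"rain": 0.2, "sun": 0.8}}
observation = {"rain": {"umbrella": 0.9, "no_umbrella": 0.1},  # P(obs | state)
               "sun":  {"umbrella": 0.2, "no_umbrella": 0.8}}

belief = {"rain": 0.5, "sun": 0.5}                 # initial uncertainty

def filter_step(belief, obs):
    # predict: push the belief through the transition model
    predicted = {s: sum(belief[p] * transition[p][s] for p in states) for s in states}
    # update: weight by how well each state explains the observation, then normalize
    unnormalized = {s: predicted[s] * observation[s][obs] for s in states}
    total = sum(unnormalized.values())
    return {s: unnormalized[s] / total for s in states}

belief = filter_step(belief, "umbrella")
print(belief)  # the probability of "rain" rises above 0.5
```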

Precise mathematical tools have been developed that analyze how an agent can make choices and plan, using decision theory, decision analysis, and information value theory. These tools include models such as Markov decision processes, dynamic decision networks, game theory and mechanism design.

Expectation-maximization clustering of Old Faithful
eruption data starts from a random guess but then successfully converges on an accurate clustering of the two physically distinct modes of eruption.

Classifiers and statistical learning methods

The simplest AI applications can be divided into two types: classifiers (e.g., "if shiny then diamond"), on one hand, and controllers (e.g., "if diamond then pick up"), on the other hand. Classifiers are functions that use pattern matching to determine the closest match. They can be fine-tuned based on chosen examples using supervised learning. Each pattern (also called an "observation") is labeled with a certain predefined class. All the observations combined with their class labels are known as a data set. When a new observation is received, that observation is classified based on previous experience.[47]

There are many kinds of classifiers in use. The decision tree is the simplest and most widely used symbolic machine learning algorithm. The k-nearest neighbor algorithm was the most widely used analogical AI until the mid-1990s, when kernel methods such as the support vector machine (SVM) displaced it.[101]
The naive Bayes classifier is reportedly the "most widely used learner"[102] at Google, due in part to its scalability.[103]
Neural networks are also used as classifiers.[104]
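A minimal k-nearest-neighbor sketch on made-up two-dimensional observations (the data, labels and choice of k = 3 are illustrative only):

```python
# k-nearest-neighbor classification: label a new observation with the
# majority label among the k closest labeled observations.
from collections import Counter
import math

data = [  # (features, label) pairs; entirely made up for illustration
    ((1.0, 1.1), "cat"), ((0.9, 1.0), "cat"), ((1.2, 0.8), "cat"),
    ((5.0, 5.2), "dog"), ((4.8, 5.1), "dog"), ((5.3, 4.9), "dog"),
]

def classify(x, k=3):
    nearest = sorted(data, key=lambda item: math.dist(x, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(classify((1.1, 0.9)))  # "cat"
print(classify((5.1, 5.0)))  # "dog"
```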

Artificial neural networks

A neural network is an interconnected group of nodes, akin to the vast network of neurons in the human brain.

An artificial neural network is based on a collection of nodes, also known as artificial neurons, which loosely model the neurons in a biological brain. It is trained to recognise patterns; once trained, it can recognise those patterns in fresh data. There is an input layer, at least one hidden layer of nodes, and an output layer. Each node applies a function; once the weighted sum of its inputs crosses a specified threshold, data is transmitted to the next layer. A network is typically called a deep neural network if it has at least two hidden layers.[104]

Learning algorithms for neural networks use local search to choose the weights that will get the right output for each input during training. The most common training technique is the backpropagation algorithm.[105] Neural networks learn to model complex relationships between inputs and outputs and find patterns in data. In theory, a neural network can learn any function.[106]
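A minimal sketch of backpropagation for a tiny network with one hidden layer; the toy input, target and layer sizes are illustrative only:

```python
# Training a tiny feedforward network (one hidden layer) by backpropagation
# on a squared-error loss, using plain gradient descent.
import numpy as np

rng = np.random.default_rng(0)
x = np.array([0.5, -1.0])            # one input example (2 features)
y = np.array([1.0])                  # its target output

W1 = rng.normal(size=(3, 2)); b1 = np.zeros(3)   # hidden layer: 3 units
W2 = rng.normal(size=(1, 3)); b2 = np.zeros(1)   # output layer: 1 unit
lr = 0.1

for _ in range(200):
    # forward pass
    h = np.tanh(W1 @ x + b1)                     # hidden activations
    out = W2 @ h + b2                            # network output
    # backward pass (chain rule)
    d_out = 2 * (out - y)                        # d(loss)/d(out)
    d_W2 = np.outer(d_out, h); d_b2 = d_out
    d_h = W2.T @ d_out * (1 - h ** 2)            # gradient through tanh
    d_W1 = np.outer(d_h, x); d_b1 = d_h
    # gradient-descent update
    W2 -= lr * d_W2; b2 -= lr * d_b2
    W1 -= lr * d_W1; b1 -= lr * d_b1

print(W2 @ np.tanh(W1 @ x + b1) + b2)  # approaches the target, 1.0
```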

In feedforward neural networks the signal passes in only one direction.[107] Recurrent neural networks feed the output signal back into the input, which allows short-term memories of previous input events; long short-term memory is the most successful network architecture for recurrent networks.[108] Perceptrons[109] use only a single layer of neurons; deep learning[110] uses multiple layers. Convolutional neural networks strengthen the connections between neurons that are "close" to each other; this is especially important in image processing, where a local set of neurons must identify an "edge" before the network can identify an object.[111]

Deep learning

Deep learning[110] uses several layers of neurons between the network's inputs and outputs. The multiple layers can progressively extract higher-level features from the raw input. For example, in image processing, lower layers may identify edges, while higher layers may identify the concepts relevant to a human such as digits or letters or faces.[112]

Deep learning has profoundly improved the performance of programs in many important subfields of artificial intelligence, including computer vision, speech recognition, natural language processing, image classification, and others. The reason that deep learning performs so well in so many applications is not known as of 2023.[114] The sudden success of deep learning in 2012–2015 did not occur because of some new discovery or theoretical breakthrough (deep neural networks and backpropagation had been described by many people, as far back as the 1950s)[i] but because of two factors: the incredible increase in computer power (including the hundred-fold increase in speed by switching to GPUs) and the availability of vast amounts of training data, especially the giant curated datasets used for benchmark testing, such as ImageNet.[j]

GPT

Generative pre-trained transformers (GPT) are large language models that are based on the semantic relationships between words in sentences (natural language processing). Text-based GPT models are pre-trained on a large corpus of text, which can be from the internet. The pre-training consists of predicting the next token (a token being usually a word, subword, or punctuation mark). Throughout this pre-training, GPT models accumulate knowledge about the world and can then generate human-like text by repeatedly predicting the next token. Typically, a subsequent training phase makes the model more truthful, useful and harmless, usually with a technique called reinforcement learning from human feedback (RLHF). Current GPT models are still prone to generating falsehoods called "hallucinations", although this can be reduced with RLHF and quality data. They are used in chatbots, which allow users to ask a question or request a task in simple text.[123][124]
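A minimal sketch of the repeated next-token prediction described above; `model` and `vocabulary` here are hypothetical placeholders for a trained language model and its token set, not any particular library's API:

```python
# Greedy autoregressive generation: repeatedly predict the next token and
# append it to the running context. `model` is assumed to return a mapping
# from each token in `vocabulary` to its probability of coming next.
def generate(model, vocabulary, prompt_tokens, max_new_tokens=20):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probabilities = model(tokens)            # P(next token | tokens so far)
        next_token = max(vocabulary, key=lambda t: probabilities[t])
        if next_token == "<end>":                # hypothetical end-of-text token
            break
        tokens.append(next_token)
    return tokens
```

In practice, chat models sample from the predicted distribution rather than always taking the single most probable token, which makes the generated text less repetitive.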

Current models and services include: Gemini (formerly Bard), ChatGPT, Grok, Claude, Copilot and LLaMA.[125] Multimodal GPT models can process different types of data (modalities) such as images, videos, sound and text.[126]

Specialized hardware and software

In the late 2010s, graphics processing units (GPUs), increasingly designed with AI-specific enhancements and used with specialized software such as TensorFlow, replaced previously used central processing units (CPUs) as the dominant means of training large-scale (commercial and academic) machine learning models.[127] Historically, specialized languages such as Lisp, Prolog, Python and others had been used.

Applications