Feature (machine learning)

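In machine learning, a feature is an individual measurable property or characteristic of the data being observed. The concept is related to that of an explanatory variable used in statistical techniques such as linear regression.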

Feature types

In feature engineering, two types of features are commonly used: numerical and categorical.

Numerical features are continuous values that can be measured on a scale. Examples of numerical features include age, height, weight, and income. Numerical features can be used in machine learning algorithms directly.[2]

Categorical features are discrete values that can be grouped into categories. Examples of categorical features include gender, color, and zip code. Categorical features typically need to be converted to numerical features before they can be used in machine learning algorithms. This can be done using a variety of techniques, such as one-hot encoding, label encoding, and ordinal encoding.
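The three encodings named above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names, the alphabetical category ordering, and the sample values are assumptions made for the example.

```python
def one_hot_encode(values):
    """Map each value to a binary indicator vector with one slot per category."""
    categories = sorted(set(values))  # assumption: alphabetical category order
    index = {c: i for i, c in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        encoded.append(row)
    return categories, encoded


def label_encode(values):
    """Map each category to an arbitrary integer code (no order implied)."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    return categories, [index[v] for v in values]


def ordinal_encode(values, order):
    """Map each category to its rank in a caller-supplied meaningful order."""
    rank = {c: i for i, c in enumerate(order)}
    return [rank[v] for v in values]


colors = ["red", "green", "blue", "green"]
categories, one_hot = one_hot_encode(colors)
_, labels = label_encode(colors)
sizes = ordinal_encode(["low", "high", "medium"], ["low", "medium", "high"])
```

One-hot encoding avoids implying an order between categories, at the cost of one extra dimension per category; ordinal encoding is appropriate only when the categories have a natural ranking.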

The type of feature that is used in feature engineering depends on the specific machine learning algorithm that is being used. Some machine learning algorithms, such as decision trees, can handle both numerical and categorical features. Other machine learning algorithms, such as linear regression, can only handle numerical features.

Classification

A set of numeric features can be conveniently described by a feature vector. One way to achieve binary classification is to use a linear predictor function (related to the perceptron) with a feature vector as input. The method consists of calculating the scalar product between the feature vector and a vector of weights, and assigning to the positive class those observations whose result exceeds a threshold.
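The thresholded scalar product can be sketched as follows. The weights, threshold, and sample vectors are made-up values for illustration; in practice the weights would be learned from data.

```python
def dot(u, v):
    """Scalar (dot) product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def classify(features, weights, threshold=0.0):
    """Return True when the linear score exceeds the threshold."""
    return dot(features, weights) > threshold


weights = [0.5, -1.0, 2.0]   # hypothetical, not learned
x_a = [1.0, 0.0, 1.0]        # score = 0.5 + 0.0 + 2.0 = 2.5
x_b = [0.0, 2.0, 0.5]        # score = 0.0 - 2.0 + 1.0 = -1.0

label_a = classify(x_a, weights)
label_b = classify(x_b, weights)
```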

Algorithms for classification from a feature vector include neural networks and statistical techniques such as Bayesian approaches.

Examples

In character recognition, features may include histograms counting the number of black pixels along horizontal and vertical directions, number of internal holes, stroke detection and many others.
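As a small sketch of the pixel-count histograms mentioned above, assume a glyph stored as a binary grid in which 1 marks a black pixel. The grid and the function name are illustrative assumptions.

```python
def projection_histograms(image):
    """Count black pixels (value 1) in every row and every column."""
    row_counts = [sum(row) for row in image]
    col_counts = [sum(col) for col in zip(*image)]
    return row_counts, col_counts


# A hypothetical 4x4 glyph, roughly shaped like the letter "A".
glyph = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
]

row_counts, col_counts = projection_histograms(glyph)
# Concatenating both histograms gives one fixed-length feature vector.
feature_vector = row_counts + col_counts
```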

In speech recognition, features for recognizing phonemes can include noise ratios, length of sounds, relative power, filter matches and many others.

In spam detection algorithms, features may include the presence or absence of certain email headers, the email structure, the language, the frequency of specific terms, and the grammatical correctness of the text.
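Two of these feature kinds, header presence and term frequency, can be sketched as below. The chosen header (`Reply-To`), the term list, and the sample message are hypothetical; a real spam filter would choose them from data.

```python
import re


def spam_features(headers, body, terms=("free", "winner")):
    """Build a small feature dictionary from an email's headers and body."""
    words = re.findall(r"[a-z']+", body.lower())
    total = len(words) or 1  # avoid division by zero on an empty body
    features = {"has_reply_to": 1 if "Reply-To" in headers else 0}
    for term in terms:
        # Relative frequency of the term among all words in the body.
        features["freq_" + term] = words.count(term) / total
    return features


headers = {"From": "sender@example.com"}
body = "Free prize! You are a winner, claim your free gift"
features = spam_features(headers, body)
```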

In computer vision, there are a large number of possible features, such as edges and objects.

Feature vectors

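In machine learning, a feature vector is an n-dimensional vector of numerical features that represent some object. Feature vectors are equivalent to the vectors of explanatory variables used in statistical procedures such as linear regression. Feature vectors are often combined with weights using a dot product in order to construct a linear predictor function that is used to determine a score for making a prediction.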

The vector space associated with these vectors is often called the feature space. In order to reduce the dimensionality of the feature space, a number of dimensionality reduction techniques can be employed.

Higher-level features can be obtained from already available features and added to the feature vector; for example, for the study of diseases the feature 'Age' is useful and is defined as Age = 'Year of death' minus 'Year of birth'. This process is referred to as feature construction.^[3]^[4] Feature construction is the application of a set of constructive operators to a set of existing features, resulting in the construction of new features. Examples of such constructive operators include the equality conditions {=, ≠}, the arithmetic operators {+, −, ×, /}, and the array operators {max(S), min(S), average(S)}, as well as more sophisticated operators such as count(S, C),^[5] which counts the number of features in the feature vector S satisfying some condition C, or distances to other recognition classes generalized by some accepting device.

Feature construction has long been considered a powerful tool for increasing both accuracy and understanding of structure, particularly in high-dimensional problems.^[6] Applications include studies of disease and emotion recognition from speech.^[7]
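A few of the constructive operators can be sketched in Python. The record field names, the sample values, and the condition passed to `count` are assumptions for the example.

```python
def construct_age(record):
    """Arithmetic operator: Age = 'Year of death' minus 'Year of birth'."""
    return record["year_of_death"] - record["year_of_birth"]


def count(S, C):
    """count(S, C): number of features in S that satisfy condition C."""
    return sum(1 for s in S if C(s))


record = {"year_of_birth": 1900, "year_of_death": 1975}  # hypothetical record
age = construct_age(record)

S = [0.2, 0.9, 0.7, 0.1]                      # existing feature vector
n_high = count(S, lambda value: value > 0.5)  # condition C: value above 0.5

# Array operators max(S), min(S), average(S), plus the count, appended to S.
extended = S + [max(S), min(S), sum(S) / len(S), n_high]
```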

Selection and extraction

The initial set of raw features can be redundant and large enough that estimation and optimization are made difficult or ineffective. Therefore, a preliminary step in many applications of machine learning consists of selecting a subset of features, or constructing a new and reduced set of features, to facilitate learning and to improve generalization and interpretability.^[8]
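One very simple way to reduce a redundant feature set is to drop features that never vary, since a constant column carries no information for learning. The sketch below uses a variance filter as an illustrative selection rule; it is an assumption for this example, not a method prescribed by the text, and the data are made up.

```python
from statistics import pvariance


def select_by_variance(rows, min_variance=0.0):
    """Keep only the feature columns whose variance exceeds min_variance."""
    columns = list(zip(*rows))
    keep = [j for j, col in enumerate(columns) if pvariance(col) > min_variance]
    reduced = [[row[j] for j in keep] for row in rows]
    return keep, reduced


# Three observations with three features; the middle feature is constant.
data = [
    [1.0, 5.0, 0.0],
    [2.0, 5.0, 1.0],
    [3.0, 5.0, 0.0],
]

kept_indices, reduced = select_by_variance(data)
```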

Extracting or selecting features is a combination of art and science; developing systems to do so is known as feature engineering, and it typically draws on the knowledge of a domain expert. Automating this process is feature learning, where a machine not only uses features for learning, but learns the features itself.

References

[1] ISBN 0-387-31073-8.

[2] Engel, A. (2022). "Categorical Variables for Machine Learning Algorithms". Towards Data Science.

[3] Liu, H.; Motoda, H. (1998). Feature Selection for Knowledge Discovery and Data Mining. Kluwer Academic Publishers, Norwell, MA, USA.

[4] Piramuthu, S.; Sikora, R. T. (2009). "Iterative feature construction for improving inductive learning algorithms". Expert Systems with Applications, Vol. 36, Iss. 2 (March 2009), pp. 3401-3406.

[5] Bloedorn, E.; Michalski, R. (1998). "Data-driven constructive induction: a methodology and its applications". IEEE Intelligent Systems, Special issue on Feature Transformation and Subset Selection, pp. 30-37, March/April 1998.

[6] Breiman, L.; Friedman, J.; Olshen, R.; Stone, C. (1984). Classification and Regression Trees. Wadsworth.

[7] Sidorova, J.; Badia, T. (2009). "Syntactic learning for ESEDA.1, tool for enhanced speech emotion detection and analysis". Internet Technology and Secured Transactions Conference 2009 (ICITST-2009), London, November 9–12. IEEE.

[8] ISBN 978-0-387-84884-6.
