Abess

abess (Adaptive Best Subset Selection, also ABESS) is a machine learning method for best subset selection. It aims to determine which features or variables are crucial for optimal model performance when provided with a dataset and a prediction task. abess was introduced by Zhu in 2020,[1] and it selects the appropriate model size adaptively, eliminating the need to choose regularization parameters.

abess is applicable in various statistical and machine learning tasks, including linear regression, the single-index model, and other common predictive models.[1][2] abess can also be applied in biostatistics.[3][4][5][6]

Basic Form

The basic form of abess[1] addresses the best subset selection problem in general linear regression. abess is a polynomial-time method, characterized by its computational efficiency and its ability to produce unbiased and consistent estimates.

In the context of linear regression, suppose we observe independent samples $(x_i, y_i)$, $i = 1, \ldots, n$, where $x_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}$; we define $X = (x_1, \ldots, x_n)^\top \in \mathbb{R}^{n \times p}$ and $y = (y_1, \ldots, y_n)^\top \in \mathbb{R}^n$. The following equation represents the general linear regression model:

$$y = X\beta + \varepsilon.$$

To obtain appropriate parameters $\beta \in \mathbb{R}^p$, one can consider the loss function for linear regression:

$$L(\beta) = \frac{1}{2n} \lVert y - X\beta \rVert_2^2.$$

In abess, the initial focus is on optimizing the loss function under an $\ell_0$ constraint. That is, we consider the following problem:

$$\min_{\beta \in \mathbb{R}^p} L(\beta) \quad \text{subject to} \quad \lVert \beta \rVert_0 \le s,$$

where $s$ represents the desired size of the support set, and $\lVert \beta \rVert_0$ is the $\ell_0$ norm of the vector $\beta$, i.e., its number of nonzero entries.
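As a small numeric illustration of this objective and constraint (the data and the candidate coefficient vector below are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, s = 50, 8, 3

# Toy data: only the first three coefficients are truly nonzero.
X = rng.standard_normal((n, p))
beta_true = np.array([2.0, -1.5, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ beta_true + 0.1 * rng.standard_normal(n)

def loss(beta):
    """L(beta) = ||y - X beta||_2^2 / (2n)."""
    r = y - X @ beta
    return r @ r / (2 * n)

beta_cand = np.array([1.9, -1.4, 0.9, 0.0, 0.0, 0.0, 0.0, 0.0])
print(loss(beta_cand))                   # small loss on the toy data
print(np.count_nonzero(beta_cand) <= s)  # the l0 constraint holds: True
```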

To address the optimization problem described above, abess iteratively exchanges an equal number of variables between the active set and the inactive set. In each iteration, the concept of sacrifice is introduced as follows:

  • For $j$ in the active set ($j \in \mathcal{A}$):

$$\xi_j = L\big(\hat{\beta}^{\mathcal{A} \setminus \{j\}}\big) - L\big(\hat{\beta}\big) = \frac{X_j^\top X_j}{2n} \hat{\beta}_j^2$$

  • For $j$ in the inactive set ($j \in \mathcal{I}$):

$$\zeta_j = L\big(\hat{\beta}\big) - L\big(\hat{\beta} + \hat{t}_j e_j\big) = \frac{X_j^\top X_j}{2n} \left( \frac{\hat{d}_j}{X_j^\top X_j / n} \right)^{2}$$

Here are the key elements in the above equations:

  • $\hat{\beta}$: This represents the estimate of $\beta$ obtained in the previous iteration.
  • $\hat{\mathcal{A}}$: It denotes the estimated active set from the previous iteration.
  • $\hat{\beta}^{\mathcal{A} \setminus \{j\}}$: This is a vector whose $j$-th element is set to 0, while the other elements are the same as those of $\hat{\beta}$.
  • $\hat{t}_j e_j$: Here, $e_j$ represents a vector where all elements are 0 except the $j$-th element, which is 1, and $\hat{t}_j = \arg\min_t L(\hat{\beta} + t e_j)$.
  • $\hat{d}_j$: This is calculated as $\hat{d}_j = X_j^\top (y - X\hat{\beta}) / n$.

The iterative process exchanges the variables with the smallest sacrifices in the active set for the variables with the largest sacrifices in the inactive set, keeping an exchange whenever it reduces the loss. This approach allows abess to efficiently search for the optimal feature subset.
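The following is a minimal sketch of a single such splicing step for linear regression, written to mirror the sacrifice definitions above (the function and variable names are illustrative, not the abess library's internals):

```python
import numpy as np

def ols_on_support(X, y, active):
    """Least-squares fit restricted to the active set; zeros elsewhere."""
    beta = np.zeros(X.shape[1])
    beta[active] = np.linalg.lstsq(X[:, active], y, rcond=None)[0]
    return beta

def loss(beta, X, y):
    """L(beta) = ||y - X beta||_2^2 / (2n)."""
    r = y - X @ beta
    return r @ r / (2 * len(y))

def splicing_step(X, y, active, k=1):
    """One splicing exchange; `active` is an integer index array of size s."""
    n, p = X.shape
    inactive = np.setdiff1d(np.arange(p), active)
    beta = ols_on_support(X, y, active)
    d = X.T @ (y - X @ beta) / n        # d_j = X_j^T (y - X beta) / n
    g = (X ** 2).sum(axis=0) / n        # X_j^T X_j / n, per column
    xi = g[active] * beta[active] ** 2 / 2       # backward sacrifices
    zeta = d[inactive] ** 2 / (2 * g[inactive])  # forward sacrifices
    # Swap the k least useful active variables for the k most promising inactive ones.
    leaving = active[np.argsort(xi)[:k]]
    entering = inactive[np.argsort(zeta)[-k:]]
    new_active = np.sort(np.concatenate([np.setdiff1d(active, leaving), entering]))
    new_beta = ols_on_support(X, y, new_active)
    # Keep the exchange only if it lowers the loss.
    if loss(new_beta, X, y) < loss(beta, X, y):
        return new_active, new_beta
    return active, beta
```

Iterating this step until the active set stops changing yields the estimate for a fixed size $s$; since the loss decreases at every accepted exchange, the iteration terminates.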

In abess, an appropriate maximum size $s_{\max}$ is selected, the above problem is optimized for each active set size $s \in \{1, \ldots, s_{\max}\}$, and an information criterion is used to adaptively choose the appropriate active set size and obtain its corresponding abess estimator.
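Given such a fixed-size solver, the adaptive size choice can be sketched as below. The criterion follows the special information criterion reported in the original paper,[1] $\mathrm{SIC}(\mathcal{A}) = n \log L(\hat{\beta}^{\mathcal{A}}) + |\mathcal{A}| \log p \, \log \log n$; the helper `fit_with_size` is a hypothetical callable, for example `splicing_step` above iterated to convergence:

```python
import numpy as np

def sic(loss_value, size, n, p):
    """Special information criterion: n*log(L) + |A|*log(p)*log(log(n))."""
    return n * np.log(loss_value) + size * np.log(p) * np.log(np.log(n))

def choose_size(X, y, s_max, fit_with_size):
    """Fit every support size in 1..s_max and keep the SIC-minimizing fit."""
    n, p = X.shape
    fits = [fit_with_size(s) for s in range(1, s_max + 1)]  # (active, beta) pairs
    # Reuses loss() from the splicing sketch above.
    scores = [sic(loss(beta, X, y), len(active), n, p) for active, beta in fits]
    return fits[int(np.argmin(scores))]
```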

Generalizations

The splicing algorithm in abess can be employed for subset selection in other models.

Distribution-Free Location-Scale Regression

In 2023, Siegfried extended abess to distribution-free location-scale regression.[7] Specifically, it considers minimizing a loss function $\ell(\vartheta, \beta, \gamma)$, where $\vartheta$ is a parameter vector, $\beta$ and $\gamma$ are the location and scale coefficient vectors, and $x$ is a data vector entering both parts of the model, subject to an $\ell_0$-type sparsity constraint on $\beta$ and $\gamma$.

This approach, demonstrated across various applications, enables parsimonious regression modeling for arbitrary outcomes while maintaining interpretability through the subset selection procedure.
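As a concrete (but assumed) instance of such a loss, the following sketch uses a Gaussian location-scale negative log-likelihood, with mean $x^\top \beta$ and log-scale $x^\top \gamma$; this only illustrates the kind of objective involved and is not the transformation-model loss used in Siegfried's paper:

```python
import numpy as np

def location_scale_nll(beta, gamma, X, y):
    """Gaussian negative log-likelihood with mean X@beta and log-sd X@gamma.

    Illustrative stand-in for the distribution-free loss; the sparsity
    constraints would bound np.count_nonzero(beta) and np.count_nonzero(gamma).
    """
    mu = X @ beta        # location part
    log_sd = X @ gamma   # scale part (log standard deviation)
    z = (y - mu) / np.exp(log_sd)
    return np.sum(log_sd + 0.5 * z ** 2)
```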

Group Selection

In 2023, Zhang applied the splicing algorithm to group selection,[8] optimizing the following model:

$$\min_{\beta \in \mathbb{R}^p} \frac{1}{2n} \lVert y - X\beta \rVert_2^2 \quad \text{subject to} \quad \sum_{j=1}^{J} I\big(\lVert \beta_{G_j} \rVert_2 \neq 0\big) \le T$$

Here are the symbols involved:

  • $J$: Total number of feature groups, representing the existence of non-overlapping feature groups in the dataset.
  • $G_j$: Index set for the $j$-th feature group, where $j$ ranges from 1 to $J$, representing the feature grouping structure in the data.
  • $T$: Model size, a positive integer determined from the data, limiting the number of selected feature groups.
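A small sketch of the group constraint above (the group boundaries and the coefficient vector are made up for the illustration):

```python
import numpy as np

def active_group_count(beta, groups):
    """Number of groups whose coefficient sub-vector is nonzero."""
    return sum(np.linalg.norm(beta[g]) != 0 for g in groups)

groups = [np.arange(0, 3), np.arange(3, 5), np.arange(5, 8)]  # J = 3 groups
beta = np.array([0.5, -0.2, 0.1, 0.0, 0.0, 1.3, 0.0, 0.4])
T = 2
print(active_group_count(beta, groups) <= T)  # True: groups 1 and 3 are selected
```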

Regression with Corrupted Data

Zhang applied the splicing algorithm to handle corrupted data.[9] Corrupted data refers to information that has been disrupted or contains errors during the data collection or recording process. This interference may include sensor inaccuracies, recording errors, communication issues, or other external disturbances, leading to inaccurate or distorted observations within the dataset.

Single Index Models

In 2023, Tang applied the splicing algorithm to optimal subset selection in the single-index model.[2]

The form of the single index model (SIM) is given by

$$y = f(x^\top \beta, \varepsilon),$$

where $\beta$ is the parameter vector, $\varepsilon$ is the error term, and $f$ is an unknown monotone link function.

The corresponding loss function is a rank-based least-squares criterion defined through the rank vector $R = (R_1, \ldots, R_n)^\top$, where $R_i$ is the rank of $y_i$ in $\{y_1, \ldots, y_n\}$.

The estimation problem addressed by this algorithm is the minimization of this rank-based loss subject to the cardinality constraint $\lVert \beta \rVert_0 \le s$.
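The rank vector itself is straightforward to compute; the snippet below shows it for a toy response (the exact rank-based loss from Tang's paper is not reproduced here):

```python
import numpy as np
from scipy.stats import rankdata

y = np.array([3.1, -0.4, 2.2, 5.0])
R = rankdata(y)   # R_i is the rank of y_i among y_1..y_n
print(R)          # [3. 1. 2. 4.]
```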

Geographically Weighted Regression Model

In 2023, Wu applied the splicing algorithm to geographically weighted regression (GWR).[10] GWR is a spatial analysis method, and Wu's research focuses on improving GWR performance in handling geographical data regression modeling. This is achieved through an $\ell_0$-norm adaptive variable selection method that performs model selection and coefficient optimization simultaneously, enhancing the accuracy of regression modeling for geographic spatial data.
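For context, GWR fits a separate weighted regression at each location, with weights that decay with distance; a minimal sketch of such spatial weights follows (the Gaussian kernel and the bandwidth value are illustrative choices, not taken from Wu's paper):

```python
import numpy as np

def gwr_weights(coords, site, bandwidth=1.0):
    """Gaussian kernel weights for a local regression centered at `site`."""
    d = np.linalg.norm(coords - site, axis=1)  # distances to each observation
    return np.exp(-0.5 * (d / bandwidth) ** 2)

coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
w = gwr_weights(coords, site=np.array([0.0, 0.0]))
print(w)  # nearby observations get weights near 1, distant ones near 0
```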

Distributed Systems

In 2023, Chen extended abess to distributed systems, proposing an efficient algorithm for abess in that setting.[11]

A distributed system is a computational model that distributes computing tasks across multiple independent nodes to achieve more efficient, reliable, and scalable data processing. In a distributed system, individual computing nodes can work simultaneously, collaboratively completing the overall tasks, thereby enhancing system performance and processing capabilities.

However, within distributed systems, there has been a lack of efficient algorithms for optimal subset selection. To address this gap, Chen introduced a communication-efficient approach for optimal subset selection in distributed systems.

Software Package

The abess library[12] (version 0.4.5) is an R and Python package built on C++ implementations of the algorithms. It is open-source on GitHub. The library can be used for optimal subset selection in linear regression, (multi-)classification, and censored-response models. The abess package also allows variables to be selected in a grouped format. Information and tutorials are available on the abess homepage.[13]
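A minimal usage sketch of the Python package, following the quick-start pattern in the abess documentation (the class and parameter names below are taken from the documentation for version 0.4.x; details may differ across versions):

```python
from abess.linear import LinearRegression
from sklearn.datasets import make_regression

# Simulated data in which only 5 of 20 features carry signal.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       random_state=0)

# The support size is chosen adaptively over the given candidate range.
model = LinearRegression(support_size=range(0, 10))
model.fit(X, y)
print(model.coef_)  # most coefficients should be exactly zero
```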

Application

abess can be applied in biostatistics, such as feature selection in studies of antibiotic resistance in Mycobacterium tuberculosis,[4] exploring prognostic factors in neck pain,[5] and developing prediction models for severe pain in patients after percutaneous nephrolithotomy.[6] abess can also be applied to gene selection.[14] In the field of data-driven partial differential equation (PDE) discovery, Thanasutives[15] applied abess to automatically identify parsimonious governing PDEs.

References

  1. ^ Zhu, Junxian; Wen, Canhong; Zhu, Jin; Zhang, Heping; Wang, Xueqin (2020). "A polynomial algorithm for best-subset selection problem". Proceedings of the National Academy of Sciences. 117 (52): 33117–33123. doi:10.1073/pnas.2014241117.
  2. ^ .
  3. ^ doi:10.1002/imt2.126.
  4. ^ a b Reshetnikov, KO; Bykova, DI; Kuleshov, KV; Chukreev, K; Guguchkin, EP; Akimkin, VG; Neverov, AD; Fedonin, GG (2022). "Feature selection and aggregation for antibiotic resistance GWAS in Mycobacterium tuberculosis: a comparative study". bioRxiv. Cold Spring Harbor Laboratory.
  5. ^ PMID 37834877.
  6. ^ doi:10.21203/rs.3.rs-2388045/v1.
  7. ^ Siegfried, Sandra; Kook, Lucas; Hothorn, Torsten (2023). "Distribution-Free Location-Scale Regression". The American Statistician.
  8. ^ .
  9. ^ .
  10. ^ .
  11. ^ .
  12. ^ Zhu, Jin; Wang, Xueqin; Hu, Liyuan; Huang, Junhao; Jiang, Kangkang; Zhang, Yanhang; Lin, Shiyun; Zhu, Junxian (2022). "abess: a fast best-subset selection library in Python and R" (PDF). The Journal of Machine Learning Research. 23 (1): 9206–9212.
  13. ^ "ABESS 0.4.5 documentation".
  14. ^ PMID 35049823.
  15. ^ .