Selection bias

Selection bias is the distortion of a statistical analysis resulting from the method of collecting samples. If the selection bias is not taken into account, then some conclusions of the study may be false.

Types of bias

Sampling bias

Sampling bias results in a biased sample, that is, a statistical sample of a population (or non-human factors) in which not all participants are equally balanced or objectively represented.^[3] Sampling bias is mostly classified as a subtype of selection bias,^[4] sometimes specifically termed sample selection bias,^[5]^[6]^[7] but some classify it as a separate type of bias.^[8]

A distinction of sampling bias (albeit not a universally accepted one) is that it undermines the external validity of a test (the ability of its results to be generalized to the rest of the population), while selection bias mainly addresses internal validity for differences or similarities found in the sample at hand. In this sense, errors occurring in the process of gathering the sample or cohort cause sampling bias, while errors in any process thereafter cause selection bias.

Examples of sampling bias include self-selection, pre-screening of trial participants, discounting trial subjects/tests that did not run to completion, migration bias (excluding subjects who have recently moved into or out of the study area), length-time bias, where slowly developing disease with better prognosis is detected, and lead time bias, where disease is diagnosed earlier in participants than in comparison populations, although the average course of disease is the same.
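Self-selection, the first example above, can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not a model of any real study: the score distribution and the `respond_probability` rule are hypothetical and chosen only so that members with higher scores are more likely to take part.

```python
import random
import statistics


def respond_probability(score):
    """Hypothetical response rule: higher scores respond more often."""
    return min(1.0, max(0.0, (score - 20.0) / 60.0))


def simulate_self_selection(n=100_000, seed=0):
    rng = random.Random(seed)
    # Whole population: scores centred on 50 with standard deviation 10.
    population = [rng.gauss(50.0, 10.0) for _ in range(n)]
    # Self-selected sample: each member decides whether to take part.
    volunteers = [x for x in population if rng.random() < respond_probability(x)]
    # Simple random sample of the same size, for comparison.
    random_sample = rng.sample(population, len(volunteers))
    return (
        statistics.fmean(population),
        statistics.fmean(volunteers),
        statistics.fmean(random_sample),
    )


pop_mean, volunteer_mean, random_mean = simulate_self_selection()
print(f"population {pop_mean:.2f}, self-selected {volunteer_mean:.2f}, "
      f"random {random_mean:.2f}")
```

Under these assumptions the self-selected mean sits several points above the population mean, while the random sample of equal size stays close to it.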

Time interval

Early termination of a trial at a time when its results support the desired conclusion.

A trial may be terminated early at an extreme value (often for ethical reasons), but the extreme value is likely to be reached by the variable with the largest variance, even if all variables have a similar mean.
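The claim that the extreme value tends to be reached by the variable with the largest variance can be checked with a rough simulation. The sketch below assumes three hypothetical outcome series that accumulate zero-mean Gaussian increments with different standard deviations, and a stopping rule that ends the trial as soon as any running total reaches a fixed threshold. All of these choices are illustrative.

```python
import random


def first_to_cross(sds, threshold, max_steps, rng):
    """Return the index of the first series whose running total reaches
    +/- threshold, or None if none does within max_steps."""
    totals = [0.0] * len(sds)
    for _ in range(max_steps):
        for i, sd in enumerate(sds):
            totals[i] += rng.gauss(0.0, sd)  # zero mean for every series
        crossed = [i for i, t in enumerate(totals) if abs(t) >= threshold]
        if crossed:
            # Several may cross in the same step; report the most extreme.
            return max(crossed, key=lambda i: abs(totals[i]))
    return None


def count_first_crossings(sds=(1.0, 2.0, 3.0), threshold=10.0,
                          max_steps=200, trials=2000, seed=1):
    rng = random.Random(seed)
    counts = [0] * len(sds)
    for _ in range(trials):
        winner = first_to_cross(sds, threshold, max_steps, rng)
        if winner is not None:
            counts[winner] += 1
    return counts


counts = count_first_crossings()
print(counts)  # how often each series triggered the early stop
```

Although every series has the same mean, the one with the largest standard deviation should trigger the stop far more often than the others.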

Exposure

Susceptibility bias
Clinical susceptibility bias, when one disease predisposes to a second disease, and the treatment for the first disease erroneously appears to predispose to the second disease. For example, postmenopausal syndrome gives a higher likelihood of also developing endometrial cancer, so estrogens given for the postmenopausal syndrome may receive a higher than actual blame for causing endometrial cancer.^[9]

Protopathic bias, when a treatment for the first symptoms of a disease or other outcome appears to cause the outcome. It is a potential bias when there is a lag time between the first symptoms and start of treatment and the actual diagnosis.^[9] It can be mitigated by lagging, that is, exclusion of exposures that occurred in a certain time period before diagnosis.^[10]

Indication bias, a potential mix-up between cause and effect when exposure is dependent on indication, e.g. a treatment is given to people at high risk of acquiring a disease, potentially causing a preponderance of treated people among those acquiring the disease. This may cause an erroneous appearance of the treatment being a cause of the disease.^[11]
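The lagging approach mentioned for protopathic bias amounts to a date filter on exposure records. The following is a minimal sketch assuming exposures are stored as a list of dates per patient. The function name `apply_lag`, the one-year lag window and the sample record are all hypothetical.

```python
from datetime import date, timedelta


def apply_lag(exposure_dates, diagnosis_date, lag_days):
    """Keep only exposures that occurred at least lag_days before diagnosis."""
    cutoff = diagnosis_date - timedelta(days=lag_days)
    return [d for d in exposure_dates if d <= cutoff]


# Hypothetical patient record: three prescriptions and one diagnosis.
exposures = [date(2019, 3, 1), date(2020, 11, 15), date(2021, 1, 10)]
diagnosis = date(2021, 2, 1)

kept = apply_lag(exposures, diagnosis, lag_days=365)
print(kept)  # only the 2019 exposure falls outside the lag window
```

The appropriate lag length depends on how long early symptoms typically precede diagnosis, which is a study-specific judgment rather than something the code can decide.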

Data

Partitioning (dividing) data with knowledge of the contents of the partitions, and then analyzing them with tests designed for blindly chosen partitions.
Post hoc alteration of data inclusion based on arbitrary or subjective reasons, including:
- Cherry picking, which is actually not selection bias but confirmation bias, when specific subsets of data are chosen to support a conclusion (e.g. citing examples of plane crashes as evidence of airline flight being unsafe, while ignoring the far more common example of flights that complete safely; see: availability heuristic).
- Rejection of bad data (1) on arbitrary grounds, instead of according to previously stated or generally agreed criteria, or (2) by discarding "outliers" on statistical grounds that fail to take into account important information that could be derived from "wild" observations.^[12]

Studies

Selection of which studies to include in a meta-analysis (see also combinatorial meta-analysis).
Performing repeated experiments and reporting only the most favorable results, perhaps relabelling lab records of other experiments as "calibration tests", "instrumentation errors" or "preliminary surveys".
Presenting the most significant result of a data dredge as if it were a single experiment (which is logically the same as the previous item, but is seen as much less dishonest).
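The effect of reporting only the best of several results can be quantified. If k independent tests are run when no real effect exists, each at significance level α, the chance that at least one appears significant is 1 − (1 − α)^k. The sketch below computes that figure and checks it by simulation. It relies on the standard fact that p-values are uniformly distributed under a true null hypothesis. The function names and the choice of k = 20 are illustrative.

```python
import random


def prob_any_false_positive(k, alpha=0.05):
    """Chance that at least one of k independent null tests has p < alpha."""
    return 1.0 - (1.0 - alpha) ** k


def simulate_best_of_k(k, alpha=0.05, trials=20_000, seed=2):
    """Fraction of simulated 'studies' whose smallest p-value is below alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Under the null hypothesis each p-value is uniform on [0, 1).
        best_p = min(rng.random() for _ in range(k))
        if best_p < alpha:
            hits += 1
    return hits / trials


analytic = prob_any_false_positive(20)
simulated = simulate_best_of_k(20)
print(f"analytic {analytic:.3f}, simulated {simulated:.3f}")
```

With 20 tests at α = 0.05 the analytic value is about 0.64, so a "significant" best result is more likely than not even when nothing is going on.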

Attrition

Attrition bias is a kind of selection bias caused by attrition (loss of participants).^[13]

Loss to follow-up is another form of attrition bias, occurring mainly in medical studies over a lengthy time period. Non-response or retention bias can be influenced by a number of tangible and intangible factors, such as wealth, education, altruism, and initial understanding of the study and its requirements.^[14] Researchers may also be unable to conduct follow-up contact because of inadequate identifying information and contact details collected during the initial recruitment and research phase.^[15]
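How attrition distorts an estimate can be sketched with a simulation in which dropout depends on the outcome. The example below is hypothetical throughout: it imagines a programme with no true effect on weight change, and assumes participants who gained weight drop out with probability 0.6 while the others drop out with probability 0.1.

```python
import random
import statistics


def simulate_attrition(n=50_000, seed=3):
    rng = random.Random(seed)
    # True weight change for everyone enrolled; the true mean effect is zero.
    changes = [rng.gauss(0.0, 3.0) for _ in range(n)]
    completers = []
    for change in changes:
        # Assumed dropout rule: those who gained weight leave more often.
        dropout_prob = 0.6 if change > 0 else 0.1
        if rng.random() >= dropout_prob:
            completers.append(change)
    return statistics.fmean(changes), statistics.fmean(completers)


all_mean, completer_mean = simulate_attrition()
print(f"all enrolled {all_mean:.2f}, completers only {completer_mean:.2f}")
```

Analysing completers alone makes the programme look effective under these assumptions, even though the mean change among everyone enrolled is close to zero.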

Observer selection

Philosopher

anthropic reasoning is required.^[16]

An example is the past

existential risks might similarly be underestimated due to selection bias, and an anthropic correction has to be introduced.^[18]

Volunteer bias

Self-selection bias or volunteer bias poses a further threat to the validity of a study, as these participants may have intrinsically different characteristics from the target population of the study.^[19] Studies have shown that volunteers tend to come from a higher social standing than from a lower socio-economic background.^[20] Furthermore, another study shows that women are more likely than men to volunteer for studies. Volunteer bias is evident throughout the study life-cycle, from recruitment to follow-ups. More generally, volunteer response can be put down to individual altruism, a desire for approval, personal relation to the study topic and other reasons.^[20]^[14] As with most instances, the mitigation proposed for volunteer bias is an increased sample size.^[citation needed]

Mitigation

In the general case, selection biases cannot be overcome with statistical analysis of existing data alone, though

exogenous (background) variables and a treatment indicator. However, in regression models, it is correlation between unobserved determinants of the outcome and unobserved determinants of selection into the sample that biases estimates, and this correlation between unobservables cannot be directly assessed by the observed determinants of treatment.^[21]

When data are selected for fitting or forecast purposes, a coalitional game can be set up so that a fitting or forecast accuracy function can be defined on all subsets of the data variables.
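One way to make the coalitional-game idea concrete is to treat each data variable as a player and an accuracy function over subsets of variables as the game's value function. The text does not say which solution concept is intended. The sketch below assumes the Shapley value, a standard way to attribute a coalition's total value to its members. The `toy_accuracy` function and the variable names `x1` to `x3` are made up for illustration and stand in for a real fitting or forecast accuracy measure.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley value of each player for a value function on subsets."""
    n = len(players)
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for r in range(n):  # size of the coalition that p joins
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for subset in combinations(others, r):
                s = frozenset(subset)
                # Marginal gain in accuracy from adding p to coalition s.
                total += weight * (value(s | {p}) - value(s))
        result[p] = total
    return result


def toy_accuracy(subset):
    """Hypothetical accuracy score for a model using the given variables."""
    score = 0.0
    if "x1" in subset:
        score += 0.5
    if "x2" in subset:
        score += 0.2
    if {"x1", "x3"} <= subset:  # x3 only helps in combination with x1
        score += 0.1
    return score


phi = shapley_values(["x1", "x2", "x3"], toy_accuracy)
print(phi)
```

The attributions sum to the accuracy of the full variable set, and the interaction bonus between `x1` and `x3` is split equally between those two variables. Exact computation enumerates every subset, so it is only practical for a small number of variables.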

Related issues

Selection bias is closely related to:

publication bias or reporting bias, the distortion produced in community perception or meta-analyses by not publishing uninteresting (usually negative) results, or results which go against the experimenter's prejudices, a sponsor's interests, or community expectations.
confirmation bias, the general tendency of humans to give more attention to whatever confirms our pre-existing perspective; or specifically in experimental science, the distortion produced by experiments that are designed to seek confirmatory evidence instead of trying to disprove the hypothesis.
exclusion bias, which results from applying different criteria to cases and controls with regard to eligibility for participation in a study, or from different variables serving as the basis for exclusion.

References

[1] Dictionary of Cancer Terms → selection bias. Retrieved on September 23, 2009.
[2] Medical Dictionary - 'Sampling Bias'. Retrieved on September 23, 2009.
[3] TheFreeDictionary → biased sample. Retrieved on 2009-09-23. Site in turn cites: Mosby's Medical Dictionary, 8th edition.
[4] Dictionary of Cancer Terms → Selection Bias. Retrieved on September 23, 2009.
[5] PMID 9504213.
[6] S2CID 842488.
[7] doi:10.1016/j.tcs.2013.09.027.
[8] ISBN 978-0-7817-8257-9.
[9] PMID 698947.
[10] S2CID 25648490.
[11] ISBN 978-1-930513-58-7.
[12] doi:10.1080/00401706.1960.10489875.
[13] PMID 15649954.
[14] PMID 23874465.
[15] S2CID 27683727.
[16] ISBN 978-0-415-93858-7.
[17] S2CID 6485564.
[18] S2CID 4390013.
[19] PMID 20407272.
[20] "Volunteer bias". Catalog of Bias. 2017-11-17. Retrieved 2020-10-29.
[21] JSTOR 1912352.

Retrieved from "https://en.wikipedia.org/w/index.php?title=Selection_bias&oldid=1219946110"
