Meta-analysis
Meta-analysis is the statistical combination of the results of multiple studies addressing a similar research question. An important part of this method involves computing a combined effect size across all of the studies, which requires extracting effect sizes and variance measures from the individual studies. Meta-analyses are integral in supporting research grant proposals, shaping treatment guidelines, and influencing health policies. They are also pivotal in summarizing existing research to guide future studies, thereby cementing their role as a fundamental methodology in metascience. Meta-analyses are often, but not always, important components of a systematic review procedure. For instance, a meta-analysis may be conducted on several clinical trials of a medical treatment in an effort to obtain a better understanding of how well the treatment works.
History
The term "meta-analysis" was coined in 1976 by the statistician
The first model meta-analysis was published in 1978 on the effectiveness of psychotherapy outcomes by Mary Lee Smith and Gene Glass.[2][11] After publication of their article there was pushback on the usefulness and validity of meta-analysis as a tool for evidence synthesis. The first example of this was by Hans Eysenck, who in a 1978 article responding to the work of Mary Lee Smith and Gene Glass called meta-analysis an "exercise in mega-silliness".[12][13] Later Eysenck would refer to meta-analysis as "statistical alchemy".[14] Despite these criticisms, the use of meta-analysis has only grown since its modern introduction. By 1991 there were 334 published meta-analyses;[13] this number grew to 9,135 by 2014.[1][15]
The field of meta-analysis has expanded greatly since the 1970s and touches multiple disciplines, including psychology, medicine, and ecology.[1] Furthermore, the more recent creation of evidence synthesis communities has increased the cross-pollination of ideas and methods, and the creation of software tools, across disciplines.[16][17][18]
Steps in a meta-analysis
A meta-analysis is usually preceded by a systematic review, as this allows identification and critical appraisal of all the relevant evidence (thereby limiting the risk of bias in summary estimates). The general steps are then as follows:[19]
- Formulation of the research question, e.g. using the PICO model (Population, Intervention, Comparison, Outcome).
- Search of literature
- Selection of studies ('incorporation criteria')
  - Based on quality criteria, e.g. the requirement of randomization and blinding in a clinical trial
  - Selection of specific studies on a well-specified subject, e.g. the treatment of breast cancer
  - Decide whether unpublished studies are included to avoid publication bias (file drawer problem)
- Decide which dependent variables or summary measures are allowed. For instance, when considering a meta-analysis of published (aggregate) data:
  - Differences (discrete data)
  - Means (continuous data)
- Selection of a meta-analysis model, e.g. fixed effect or random effects meta-analysis.
- Examine sources of between-study heterogeneity, e.g. using subgroup analysis or meta-regression.
Formal guidance for the conduct and reporting of meta-analyses is provided by the Cochrane Handbook.
For reporting guidelines, see the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement.[20]
Literature search
One of the most important steps of a meta-analysis is data collection. For an efficient database search, appropriate keywords and search limits need to be identified.[21] The use of Boolean operators and search limits can assist the literature search.[22][23] A number of databases are available (e.g., PubMed, Embase, PsycINFO); however, it is up to the researcher to choose the most appropriate sources for their research area.[24] Indeed, many scientists use the same search terms in two or more databases to cover multiple sources. The reference lists of eligible studies can also be searched for further eligible studies (i.e., snowballing). The initial search may return a large volume of studies. Quite often, the abstract or the title of the manuscript reveals that the study is not eligible for inclusion, based on the pre-specified criteria. These studies can be discarded. However, if it appears that the study may be eligible (or even if there is some doubt), the full paper can be retained for closer inspection. These search results need to be detailed in a PRISMA flow diagram,[25] which details the flow of information through all stages of the review. Thus, it is important to note how many studies were returned after using the specified search terms, how many of these studies were discarded, and for what reason.[24] The search terms and strategy should be specific enough for a reader to reproduce the search. The date range of studies, along with the date (or date period) the search was conducted, should also be provided.[26]
A data collection form provides a standardized means of collecting data from eligible studies. For a meta-analysis of correlational data, effect size information is usually collected as Pearson's r statistic. Partial correlations are often reported in research; however, these may inflate relationships in comparison to zero-order correlations.[27] Moreover, the partialed-out variables will likely vary from study to study. As a consequence, many meta-analyses exclude partial correlations from their analysis.[24] As a last resort, plot digitizers can be used to scrape data points from scatterplots (if available) for the calculation of Pearson's r.[28][29] Data on important study characteristics that may moderate effects, such as the mean age of participants, should also be collected.[30] A measure of study quality can also be included in these forms to assess the quality of evidence from each study.[31] There are more than 80 tools available to assess the quality and risk of bias in observational studies, reflecting the diversity of research approaches between fields.[31][32][33] These tools usually include an assessment of how dependent variables were measured, appropriate selection of participants, and appropriate control for confounding factors. Other quality measures that may be more relevant for correlational studies include sample size, psychometric properties, and reporting of methods.[24]
A final consideration is whether to include studies from the gray literature, which is defined as research that has not been formally published.[34] This type of literature includes conference abstracts,[35] dissertations,[36] and pre-prints.[37] While the inclusion of gray literature reduces the risk of publication bias, the methodological quality of the work is often (but not always) lower than formally published work.[38][39] Reports from conference proceedings, which are the most common source of gray literature,[40] are poorly reported[41] and data in the subsequent publication is often inconsistent, with differences observed in almost 20% of published studies.[42]
Methods and assumptions
Approaches
In general, two types of evidence can be distinguished when performing a meta-analysis: individual participant data (IPD), and aggregate data (AD). The aggregate data can be direct or indirect.
AD is more commonly available (e.g. from the literature) and typically represents summary estimates such as odds ratios or relative risks. These can be directly synthesized across conceptually similar studies using several approaches (see below). On the other hand, indirect aggregate data measure the effect of two treatments that were each compared against a similar control group in a meta-analysis. For example, if treatment A and treatment B were each compared with placebo in separate meta-analyses, these two pooled results can be used to estimate the effect of A vs B in an indirect comparison as (effect of A vs placebo) minus (effect of B vs placebo).
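A minimal sketch of the indirect comparison just described, assuming both pooled results are on an additive scale (e.g. log odds ratios or mean differences) and are statistically independent, so that their variances add. The function name and the numbers are hypothetical.

```python
import math


def indirect_comparison(d_ap, se_ap, d_bp, se_bp):
    """Indirect estimate of A vs B from pooled A vs placebo and B vs placebo results.

    Assumes an additive effect scale and independent pooled estimates.
    """
    d_ab = d_ap - d_bp                              # (A vs placebo) minus (B vs placebo)
    se_ab = math.sqrt(se_ap ** 2 + se_bp ** 2)      # variances of independent estimates add
    return d_ab, se_ab


# Hypothetical pooled log odds ratios against placebo
d_ab, se_ab = indirect_comparison(-0.50, 0.10, -0.20, 0.15)
low, high = d_ab - 1.96 * se_ab, d_ab + 1.96 * se_ab
print(f"log OR A vs B: {d_ab:.2f} (95% CI {low:.2f} to {high:.2f})")
print(f"OR A vs B: {math.exp(d_ab):.2f}")
```

Note that the indirect estimate is always less precise than either of the two pooled results it is built from.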
IPD evidence represents raw data as collected by the study centers. This distinction has raised the need for different meta-analytic methods when evidence synthesis is desired, and has led to the development of one-stage and two-stage methods.[43] In one-stage methods the IPD from all studies are modeled simultaneously whilst accounting for the clustering of participants within studies. Two-stage methods first compute summary statistics for AD from each study and then calculate overall statistics as a weighted average of the study statistics. By reducing IPD to AD, two-stage methods can also be applied when IPD is available; this makes them an appealing choice when performing a meta-analysis. Although it is conventionally believed that one-stage and two-stage methods yield similar results, recent studies have shown that they may occasionally lead to different conclusions.[44][45]
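The two-stage idea can be sketched as follows; this is an illustration under stated assumptions, not a reference implementation. Each study's IPD is a hypothetical pair of outcome lists (treated, control); stage one reduces it to a mean difference with an unequal-variances standard error, and stage two takes an inverse variance weighted average. The function names are hypothetical.

```python
import math
import statistics


def study_mean_difference(treated, control):
    """Stage 1: reduce one study's raw outcomes to a mean difference and its variance."""
    diff = statistics.mean(treated) - statistics.mean(control)
    var = (statistics.variance(treated) / len(treated)
           + statistics.variance(control) / len(control))
    return diff, var


def two_stage_pool(ipd_studies):
    """Stage 2: inverse variance weighted average of the per-study summaries."""
    summaries = [study_mean_difference(t, c) for t, c in ipd_studies]
    weights = [1.0 / v for _, v in summaries]
    pooled = sum(w * d for w, (d, _) in zip(weights, summaries)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))


# Hypothetical raw outcomes (treated, control) from three study centers
ipd = [
    ([5.1, 6.0, 5.5, 6.2], [4.8, 5.0, 4.6, 5.2]),
    ([6.3, 5.9, 6.8, 6.1, 6.4], [5.5, 5.8, 5.1, 5.9, 5.6]),
    ([5.0, 5.6, 5.3], [5.1, 4.9, 5.0]),
]
pooled, se = two_stage_pool(ipd)
print(f"pooled mean difference {pooled:.3f} (SE {se:.3f})")
```

A one-stage analysis would instead fit a single model to all participants at once, with a term for the study each participant belongs to.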
Statistical models for aggregate data
Direct evidence: Models incorporating study effects only
Fixed effect model
The fixed effect model provides a weighted average of a series of study estimates. The inverse of the estimates' variance is commonly used as the study weight, so that larger studies tend to contribute more than smaller studies to the weighted average. Consequently, when studies within a meta-analysis are dominated by a very large study, the findings from smaller studies are practically ignored.[46] Most importantly, the fixed effect model assumes that all included studies investigate the same population, use the same variable and outcome definitions, etc. This assumption is typically unrealistic, as research is often prone to several sources of heterogeneity.[47]
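The inverse variance weighting can be sketched in a few lines. The effect sizes and variances below are hypothetical; the output shows how the most precise study (variance 0.01) takes the largest share of the total weight.

```python
import math


def fixed_effect(effects, variances):
    """Fixed effect (inverse variance weighted) pooled estimate and its standard error."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    se = math.sqrt(1.0 / total)
    shares = [w / total for w in weights]  # each study's share of the total weight
    return pooled, se, shares


# Hypothetical effect sizes (e.g. standardized mean differences) and their variances
effects = [0.10, 0.30, 0.35, 0.65, 0.45, 0.15]
variances = [0.03, 0.03, 0.05, 0.01, 0.05, 0.02]
pooled, se, shares = fixed_effect(effects, variances)
print(f"pooled {pooled:.3f} (SE {se:.3f})")
print("weight shares:", [f"{s:.1%}" for s in shares])
```

Here the single most precise study receives about 39% of the total weight on its own.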
Random effects model
A common model used to synthesize heterogeneous research is the random effects model of meta-analysis. This is simply the weighted average of the effect sizes of a group of studies. The weight that is applied in this process of weighted averaging with a random effects meta-analysis is achieved in two steps:[48]
- Step 1: Inverse variance weighting
- Step 2: Un-weighting of this inverse variance weighting by applying a random effects variance component (REVC) that is simply derived from the extent of variability of the effect sizes of the underlying studies.
This means that the greater this variability in effect sizes (otherwise known as heterogeneity), the greater the un-weighting, and this can reach a point where the random effects meta-analysis result becomes simply the un-weighted average effect size across the studies. At the other extreme, when all effect sizes are similar (or variability does not exceed sampling error), no REVC is applied and the random effects meta-analysis defaults to simply a fixed effect meta-analysis (only inverse variance weighting).
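The two-step weighting can be illustrated with the DerSimonian-Laird moment estimator for the REVC. This is one common choice, assumed here for simplicity, and not the only estimator of the between-study variance. The data are hypothetical.

```python
import math


def random_effects_dl(effects, variances):
    """Random effects pooling with a DerSimonian-Laird estimate of the REVC (tau^2)."""
    k = len(effects)
    # Step 1: inverse variance weights and the fixed effect estimate
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: variability of effect sizes beyond what sampling error predicts
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # REVC, truncated at zero: if Q <= k - 1 the model falls back to fixed effect
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Step 2: "un-weighting" by adding tau^2 to every study's variance
    w_re = [1.0 / (v + tau2) for v in variances]
    sw_re = sum(w_re)
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sw_re
    shares = [wi / sw_re for wi in w_re]
    return pooled, math.sqrt(1.0 / sw_re), tau2, shares


effects = [0.10, 0.30, 0.35, 0.65, 0.45, 0.15]
variances = [0.03, 0.03, 0.05, 0.01, 0.05, 0.02]
pooled, se, tau2, shares = random_effects_dl(effects, variances)
print(f"tau^2 = {tau2:.4f}, pooled {pooled:.3f} (SE {se:.3f})")
print("weight shares:", [f"{s:.1%}" for s in shares])
# Identical effect sizes give Q = 0, so tau^2 = 0 and plain inverse variance weighting
print(random_effects_dl([0.2, 0.2, 0.2], [0.01, 0.02, 0.04])[2])
```

With these illustrative data the most precise study's share of the weight drops from about 39% (inverse variance weights alone) to about 23%, while identical effect sizes give a zero REVC.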
The extent of this reversal is solely dependent on two factors:[49]
- Heterogeneity of precision
- Heterogeneity of effect size
Since neither of these factors automatically indicates a faulty larger study or more reliable smaller studies, the redistribution of weights under this model will not bear a relationship to what these studies actually might offer. Indeed, it has been demonstrated that redistribution of weights is simply in one direction, from larger to smaller studies, as heterogeneity increases, until eventually all studies have equal weight and no more redistribution is possible.[49] Another issue with the random effects model is that the most commonly used confidence intervals generally do not retain their coverage probability above the specified nominal level; they thus substantially underestimate the statistical error and are potentially overconfident in their conclusions.[50][51] Several fixes have been suggested,[52][53] but the debate continues.[51][54] A further concern is that the average treatment effect can sometimes be even less conservative than under the fixed effect model[55] and therefore misleading in practice. One interpretational fix that has been suggested is to create a prediction interval around the random effects estimate to portray the range of possible effects in practice.[56] However, an assumption behind the calculation of such a prediction interval is that trials are considered more or less homogeneous entities and that included patient populations and comparator treatments should be considered exchangeable,[57] and this is usually unattainable in practice.
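A commonly used form of the prediction interval mentioned above, assuming normally distributed true effects across studies, is shown below. Here $k$ is the number of studies, $\hat{\mu}$ the random effects estimate, $\widehat{\mathrm{SE}}(\hat{\mu})$ its standard error, $\hat{\tau}^2$ the estimated between-study variance, and $t_{k-2,\,0.975}$ the 97.5th percentile of a t distribution with $k-2$ degrees of freedom; these symbols are introduced here for illustration.

```latex
\hat{\mu} \pm t_{k-2,\,0.975}\,\sqrt{\hat{\tau}^2 + \widehat{\mathrm{SE}}(\hat{\mu})^2}
```

Because $\hat{\tau}^2$ enters directly, the interval stays wide under heterogeneity even when $\widehat{\mathrm{SE}}(\hat{\mu})$ is small.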
There are many methods used to estimate the between-study variance, with the restricted maximum likelihood (REML) estimator being the least prone to bias and one of the most commonly used.[58] Several advanced iterative techniques for computing the between-study variance exist, including both maximum likelihood and restricted maximum likelihood methods, and random effects models using these methods can be run with multiple software platforms including Excel,[59] Stata,[60] SPSS,[61] and R.[62]
Most meta-analyses include between 2 and 4 studies, and such a sample is more often than not inadequate to accurately estimate heterogeneity. Thus it appears that small meta-analyses often yield an incorrect zero estimate of the between-study variance, leading to a false assumption of homogeneity. Overall, it appears that heterogeneity is being consistently underestimated in meta-analyses, and sensitivity analyses in which high heterogeneity levels are assumed could be informative.[63] These random effects models and the software packages mentioned above relate to study-aggregate meta-analyses; researchers wishing to conduct individual patient data (IPD) meta-analyses need to consider mixed-effects modelling approaches.[64]
IVhet model
Doi & Barendregt, working in collaboration with Khan, Thalib and Williams (from the University of Queensland, University of Southern Queensland and Kuwait University), have created an inverse variance quasi-likelihood-based alternative (IVhet) to the random effects (RE) model, for which details are available online.[59] This was incorporated into MetaXL version 2.0,[65] a free Microsoft Excel add-in for meta-analysis produced by Epigear International Pty Ltd, and made available on 5 April 2014. The authors state that a clear advantage of this model is that it resolves the two main problems of the random effects model. The first advantage of the IVhet model is that coverage remains at the nominal (usually 95%) level for the confidence interval, unlike the random effects model, which drops in coverage with increasing heterogeneity.[50][51] The second advantage is that the IVhet model maintains the inverse variance weights of individual studies, unlike the RE model, which gives small studies more weight (and therefore larger studies less) with increasing heterogeneity. When heterogeneity becomes large, the individual study weights under the RE model become equal and thus the RE model returns an arithmetic mean rather than a weighted average. This side-effect of the RE model does not occur with the IVhet model, which thus differs from the RE model estimate in two respects:[59] pooled estimates will favor larger trials (as opposed to penalizing larger trials in the RE model) and will have a confidence interval that remains within the nominal coverage under uncertainty (heterogeneity). Doi & Barendregt suggest that while the RE model provides an alternative method of pooling the study data, their simulation results[66] demonstrate that using a more specified probability model with untenable assumptions, as with the RE model, does not necessarily provide better results.
The latter study also reports that the IVhet model resolves the problems related to underestimation of the statistical error, poor coverage of the confidence interval and increased MSE seen with the random effects model, and the authors conclude that researchers should henceforth abandon use of the random effects model in meta-analysis. While their data are compelling, the ramifications (in terms of the magnitude of spuriously positive results within the Cochrane database) are huge, and thus accepting this conclusion requires careful independent confirmation. The availability of free software (MetaXL)[65] that runs the IVhet model (and all other models for comparison) facilitates this for the research community.
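A minimal sketch of the IVhet idea as described above: the point estimate keeps the fixed effect (inverse variance) weights, and heterogeneity enters only through the variance, taken here as the sum over studies of the squared normalized weights times (study variance + tau^2). The use of the DerSimonian-Laird tau^2, the function names and the data are assumptions for illustration; MetaXL's own implementation may differ in details.

```python
import math


def dl_tau2(effects, variances):
    """DerSimonian-Laird moment estimate of the between-study variance (assumed input)."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    mu = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    return max(0.0, (q - (len(effects) - 1)) / c)


def ivhet(effects, variances):
    """IVhet-style estimate: inverse variance weights for the point estimate,
    heterogeneity (tau^2) added only in the variance of that estimate."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    tau2 = dl_tau2(effects, variances)
    var = sum((wi / sw) ** 2 * (vi + tau2) for wi, vi in zip(w, variances))
    return pooled, math.sqrt(var)


effects = [0.10, 0.30, 0.35, 0.65, 0.45, 0.15]
variances = [0.03, 0.03, 0.05, 0.01, 0.05, 0.02]
pooled, se = ivhet(effects, variances)
fixed_se = math.sqrt(1.0 / sum(1.0 / v for v in variances))
print(f"IVhet pooled {pooled:.3f} (SE {se:.3f}); fixed effect SE {fixed_se:.3f}")
```

The point estimate is identical to the fixed effect estimate; only the standard error widens to reflect heterogeneity.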
Direct evidence: Models incorporating additional information
Quality effects model
Doi and Thalib originally introduced the quality effects model.[67] They[68] introduced a new approach to adjustment for inter-study variability by incorporating the contribution of variance due to a relevant component (quality) in addition to the contribution of variance due to random error that is used in any fixed effect meta-analysis model to generate weights for each study. The strength of the quality effects meta-analysis is that it allows available methodological evidence to be used over subjective random effects, and thereby helps to close the damaging gap which has opened up between methodology and statistics in clinical research. To do this, a synthetic bias variance is computed based on quality information to adjust inverse variance weights, and the quality adjusted weight of the ith study is introduced.[67] These adjusted weights are then used in meta-analysis. In other words, if study i is of good quality and other studies are of poor quality, a proportion of their quality adjusted weights is mathematically redistributed to study i, giving it more weight towards the overall effect size. As studies become increasingly similar in terms of quality, redistribution becomes progressively less and ceases when all studies are of equal quality (in the case of equal quality, the quality effects model defaults to the IVhet model; see previous section). A recent evaluation of the quality effects model (with some updates) demonstrates that despite the subjectivity of quality assessment, the performance (MSE and true variance under simulation) is superior to that achievable with the random effects model.[69][70] This model thus replaces the untenable interpretations that abound in the literature, and software is available to explore this method further.[65]
Indirect evidence: Network meta-analysis methods
Indirect comparison meta-analysis methods (also called network meta-analyses, in particular when multiple treatments are assessed simultaneously) generally use two main methodologies. The first is the Bucher method,[71] which is a single or repeated comparison of a closed loop of three treatments such that one of them is common to the two studies and forms the node where the loop begins and ends. Therefore, multiple two-by-two comparisons (three-treatment loops) are needed to compare multiple treatments. This methodology requires that only two arms be selected from trials with more than two arms, as independent pairwise comparisons are required. The alternative methodology uses complex statistical modelling to include the multiple-arm trials and comparisons simultaneously between all competing treatments. These have been executed using Bayesian methods, mixed linear models and meta-regression approaches.[citation needed]
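Within a single closed loop of three treatments A, B and C, the Bucher indirect estimate of A vs B through the common node C can be compared with a direct A vs B estimate. The sketch below assumes independent estimates on a log odds ratio scale and uses a simple z-test for their difference; this consistency check is a common companion to the method rather than something stated above, and the function name and numbers are hypothetical.

```python
import math


def loop_inconsistency(d_ac, se_ac, d_bc, se_bc, d_ab_direct, se_ab_direct):
    """Compare the direct A vs B estimate with the indirect one through common node C."""
    # Bucher indirect estimate: (A vs C) minus (B vs C), variances add
    indirect = d_ac - d_bc
    se_indirect = math.sqrt(se_ac ** 2 + se_bc ** 2)
    # z-test for disagreement between direct and indirect evidence
    diff = d_ab_direct - indirect
    se_diff = math.sqrt(se_ab_direct ** 2 + se_indirect ** 2)
    z = diff / se_diff
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return indirect, se_indirect, diff, z, p


# Hypothetical log odds ratios: A vs C, B vs C (indirect route) and A vs B (direct)
ind, se_ind, diff, z, p = loop_inconsistency(-0.50, 0.10, -0.20, 0.15, -0.25, 0.12)
print(f"indirect A vs B {ind:.2f} (SE {se_ind:.3f}); "
      f"direct minus indirect {diff:.2f}, z = {z:.2f}, p = {p:.2f}")
```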
Bayesian framework
Specifying a Bayesian network meta-analysis model involves writing a directed acyclic graph (DAG) model for general-purpose Markov chain Monte Carlo (MCMC) software such as WinBUGS.[72] In addition, prior distributions have to be specified for a number of the parameters, and the data have to be supplied in a specific format.[72] Together, the DAG, priors, and data form a Bayesian hierarchical model. To complicate matters further, because of the nature of MCMC estimation, overdispersed starting values have to be chosen for a number of independent chains so that convergence can be assessed.[73] Recently, multiple R software packages have been developed to simplify model fitting (e.g., metaBMA[74] and RoBMA[75]), and these methods have even been implemented in JASP, statistical software with a graphical user interface (GUI). Although the complexity of the Bayesian approach limits usage of this methodology, recent tutorial papers are trying to increase accessibility of the methods.[76][77] Methodology for automation of this method has been suggested,[72] but it requires that arm-level outcome data are available, and these are usually unavailable. Great claims are sometimes made for the inherent ability of the Bayesian framework to handle network meta-analysis and its greater flexibility. However, the choice of inferential framework, Bayesian or frequentist, may be less important than other choices regarding the modeling of effects[78] (see discussion on models above).
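To make the MCMC steps concrete, the sketch below runs a hand-written Gibbs sampler for a single pairwise random effects model (not a full network model, and not WinBUGS code) in four chains started from overdispersed values, then computes the Gelman-Rubin R-hat statistic to assess convergence. The flat prior on mu, the inverse-gamma(1, 0.01) prior on tau^2, the chain lengths, the function names and the data are all assumptions chosen for illustration.

```python
import random
import statistics


def gibbs_chain(effects, variances, mu0, tau2_0, n_iter, a=1.0, b=0.01, seed=None):
    """Run one Gibbs chain for y_i ~ N(theta_i, v_i), theta_i ~ N(mu, tau2).

    Assumed priors: flat on mu, inverse-gamma(a, b) on tau2. Returns draws of mu.
    """
    rng = random.Random(seed)
    k = len(effects)
    mu, tau2 = mu0, tau2_0
    mu_draws = []
    for _ in range(n_iter):
        # theta_i | mu, tau2, y_i: normal, precision-weighted between y_i and mu
        theta = []
        for y, v in zip(effects, variances):
            prec = 1.0 / v + 1.0 / tau2
            mean = (y / v + mu / tau2) / prec
            theta.append(rng.gauss(mean, (1.0 / prec) ** 0.5))
        # mu | theta, tau2 (flat prior): normal around the mean of the thetas
        mu = rng.gauss(sum(theta) / k, (tau2 / k) ** 0.5)
        # tau2 | theta, mu: inverse-gamma, drawn as 1 / Gamma(shape, scale)
        ss = sum((t - mu) ** 2 for t in theta)
        tau2 = 1.0 / rng.gammavariate(a + k / 2.0, 1.0 / (b + ss / 2.0))
        mu_draws.append(mu)
    return mu_draws


def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for equal-length chains."""
    n = len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(means)
    pooled_var = (n - 1) / n * within + between / n
    return (pooled_var / within) ** 0.5


effects = [0.10, 0.30, 0.35, 0.65, 0.45, 0.15]
variances = [0.03, 0.03, 0.05, 0.01, 0.05, 0.02]
# Overdispersed starting values (mu, tau2) for four independent chains
starts = [(-2.0, 1.0), (2.0, 1.0), (0.0, 0.001), (0.0, 5.0)]
burn_in, n_iter = 1000, 5000
chains = [gibbs_chain(effects, variances, m, t, n_iter, seed=i)[burn_in:]
          for i, (m, t) in enumerate(starts)]
posterior_mean = statistics.fmean([x for c in chains for x in c])
r_hat = gelman_rubin(chains)
print(f"posterior mean of mu: {posterior_mean:.3f}")
print(f"R-hat for mu: {r_hat:.3f}")
```

An R-hat close to 1 indicates that chains started far apart have settled into the same distribution; values well above 1 would call for longer runs.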
Frequentist multivariate framework
On the other hand, the frequentist multivariate methods involve approximations and assumptions that are not stated explicitly or verified when the methods are applied (see discussion on meta-analysis models above). For example, the mvmeta package for Stata enables network meta-analysis in a frequentist framework.[79] However, if there is no common comparator in the network, then this has to be handled by augmenting the dataset with fictional arms with high variance, which is not very objective and requires a decision as to what constitutes a sufficiently high variance.[72] The other issue is use of the random effects model in both this frequentist framework and the Bayesian framework. Senn advises analysts to be cautious about interpreting the 'random effects' analysis, since only one random effect is allowed for but one could envisage many.[78] Senn goes on to say that it is rather naïve, even in the case where only two treatments are being compared, to assume that random-effects analysis accounts for all uncertainty about the way effects can vary from trial to trial. Newer models of meta-analysis such as those discussed above could help alleviate this situation and have been implemented in the next framework.
Generalized pairwise modelling framework
An approach that has been tried since the late 1990s is the implementation of the multiple three-treatment closed-loop analysis. This has not been popular because the process rapidly becomes overwhelming as network complexity increases. Development in this area was then abandoned in favor of the Bayesian and multivariate frequentist methods, which emerged as alternatives. Very recently, some researchers have automated the three-treatment closed-loop method for complex networks[59] as a way to make this methodology available to the mainstream research community. This proposal does restrict each trial to two interventions, but also introduces a workaround for multiple-arm trials: a different fixed control node can be selected in different runs. It also utilizes robust meta-analysis methods so that many of the problems highlighted above are avoided. Further research around this framework is required to determine whether it is indeed superior to the Bayesian or multivariate frequentist frameworks. Researchers willing to try this out have access to this framework through free software.[65]
Tailored meta-analysis
Another form of additional information comes from the intended setting. If the target setting for applying the meta-analysis results is known, then it may be possible to use data from the setting to tailor the results, thus producing a 'tailored meta-analysis'.
Aggregating IPD and AD
Meta-analysis can also be applied to combine IPD and AD. This is convenient when the researchers who conduct the analysis have their own raw data while collecting aggregate or summary data from the literature. The generalized integration model (GIM)[82] is a generalization of meta-analysis. It allows the model fitted to the individual participant data (IPD) to differ from the models used to compute the aggregate data (AD). GIM can be viewed as a model calibration method for integrating information with more flexibility.
Validation of meta-analysis results
The meta-analysis estimate represents a weighted average across studies, and when there is heterogeneity this may result in the summary estimate not being representative of individual studies. Qualitative appraisal of the primary studies using established tools can uncover potential biases,[83][84] but does not quantify the aggregate effect of these biases on the summary estimate. Although the meta-analysis result could be compared with an independent prospective primary study, such external validation is often impractical. This has led to the development of methods that exploit a form of leave-one-out cross validation, sometimes referred to as internal-external cross validation (IOCV).[85] Here each of the k included studies in turn is omitted and compared with the summary estimate derived from aggregating the remaining k - 1 studies. A general validation statistic, Vn, based on IOCV has been developed to measure the statistical validity of meta-analysis results.[86] For test accuracy and prediction, particularly when there are multivariate effects, other approaches which seek to estimate the prediction error have also been proposed.[87]
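The leave-one-out idea can be sketched as follows. The Vn statistic itself is not reproduced here; instead each omitted study is compared with the fixed effect pool of the remaining k - 1 studies using a simple standardized difference. The pooling method, the z-score, the function names and the data are assumptions for illustration.

```python
import math


def pool_fixed(effects, variances):
    """Inverse variance pooled estimate and its variance."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    return sum(wi * yi for wi, yi in zip(w, effects)) / sw, 1.0 / sw


def leave_one_out(effects, variances):
    """Omit each study in turn and compare it with the pool of the other k - 1 studies."""
    results = []
    for i, (y_i, v_i) in enumerate(zip(effects, variances)):
        rest_y = effects[:i] + effects[i + 1:]
        rest_v = variances[:i] + variances[i + 1:]
        pooled, pooled_var = pool_fixed(rest_y, rest_v)
        # Standardized discrepancy between the omitted study and the remaining pool
        z = (y_i - pooled) / math.sqrt(v_i + pooled_var)
        results.append((i, pooled, z))
    return results


effects = [0.10, 0.30, 0.35, 0.65, 0.45, 0.15]
variances = [0.03, 0.03, 0.05, 0.01, 0.05, 0.02]
for i, pooled, z in leave_one_out(effects, variances):
    print(f"omit study {i + 1}: pool of others {pooled:.3f}, z = {z:+.2f}")
```

Large absolute z-values flag studies that the remaining evidence predicts poorly, which is the kind of discrepancy such validation aims to expose.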
Challenges
A meta-analysis of several small studies does not always predict the results of a single large study.[88] Some have argued that a weakness of the method is that sources of bias are not controlled by the method: a good meta-analysis cannot correct for poor design or bias in the original studies.[89] This would mean that only methodologically sound studies should be included in a meta-analysis, a practice called 'best evidence synthesis'.[89] Other meta-analysts would include weaker studies, and add a study-level predictor variable that reflects the methodological quality of the studies to examine the effect of study quality on the effect size.[90] However, others have argued that a better approach is to preserve information about the variance in the study sample, casting as wide a net as possible, and that methodological selection criteria introduce unwanted subjectivity, defeating the purpose of the approach.[91]
Publication bias: the file drawer problem
Another potential pitfall is the reliance on the available body of published studies, which may create exaggerated outcomes due to
This
The distribution of effect sizes can be visualized with a funnel plot which (in its most common version) is a scatter plot of standard error versus the effect size.[95] It makes use of the fact that the smaller studies (thus larger standard errors) have more scatter of the magnitude of effect (being less precise) while the larger studies have less scatter and form the tip of the funnel. If many negative studies were not published, the remaining positive studies give rise to a funnel plot in which the base is skewed to one side (asymmetry of the funnel plot). In contrast, when there is no publication bias, the effect of the smaller studies has no reason to be skewed to one side and so a symmetric funnel plot results. This also means that if no publication bias is present, there would be no relationship between standard error and effect size.[96] A negative or positive relation between standard error and effect size would imply that smaller studies that found effects in one direction only were more likely to be published and/or to be submitted for publication.
Apart from the visual funnel plot, statistical methods for detecting publication bias have also been proposed.[97] These are controversial because they typically have low power for detection of bias, but may also produce false positives under some circumstances.[98] For instance, small-study effects (biased smaller studies), wherein methodological differences between smaller and larger studies exist, may cause asymmetry in effect sizes that resembles publication bias. However, small-study effects may be just as problematic for the interpretation of meta-analyses, and the onus is on meta-analytic authors to investigate potential sources of bias.[99]
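Egger's regression test is one widely used statistical method of this kind: it regresses each study's standardized effect (effect divided by its standard error) on its precision (one over the standard error), and an intercept far from zero suggests funnel plot asymmetry. A minimal sketch with hypothetical data in which the least precise studies report the largest effects; the p-value, which would use a t distribution with k - 2 degrees of freedom, is omitted to keep to the standard library.

```python
import math


def egger_test(effects, std_errors):
    """Egger's regression: standardized effect (y/se) regressed on precision (1/se).

    Returns the intercept, its standard error, and the t statistic (k - 2 df).
    """
    x = [1.0 / s for s in std_errors]
    y = [e / s for e, s in zip(effects, std_errors)]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual variance and the ordinary least squares standard error of the intercept
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r ** 2 for r in resid) / (n - 2)
    se_int = math.sqrt(s2 * (1.0 / n + mx ** 2 / sxx))
    return intercept, se_int, intercept / se_int


# Hypothetical data: the least precise studies report the largest effects
effects = [0.80, 0.65, 0.55, 0.40, 0.32, 0.30]
std_errors = [0.40, 0.30, 0.25, 0.15, 0.10, 0.05]
intercept, se_int, t = egger_test(effects, std_errors)
print(f"Egger intercept {intercept:.2f} (SE {se_int:.2f}), t = {t:.2f} on {len(effects) - 2} df")
```

As noted above, with only a handful of studies such a test has low power, and a non-zero intercept can also arise from small-study effects rather than publication bias.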
The problem of publication bias is not trivial, as it has been suggested that 25% of meta-analyses in the psychological sciences may have suffered from publication bias.[100] However, the low power of existing tests and problems with the visual appearance of the funnel plot remain an issue, and estimates of publication bias may remain lower than what truly exists.
Most discussions of publication bias focus on journal practices favoring publication of statistically significant findings. However, questionable research practices, such as reworking statistical models until significance is achieved, may also favor statistically significant findings in support of researchers' hypotheses.[101][102]
Studies often do not report the effects when they do not reach statistical significance.[103] For example, they may simply say that the groups did not show statistically significant differences, without reporting any other information (e.g. a statistic or p-value).[104] Exclusion of these studies would lead to a situation similar to publication bias, but their inclusion (assuming null effects) would also bias the meta-analysis.
Other weaknesses are that it has not been determined whether the statistically most accurate method for combining results is the fixed, IVhet, random or quality effect model, though criticism of the random effects model is mounting because of the perception that the new random effects (used in meta-analysis) are essentially formal devices to facilitate smoothing or shrinkage, and prediction may be impossible or ill-advised.[105] The main problem with the random effects approach is that it uses the classic statistical thought of generating a "compromise estimator" that makes the weights close to the naturally weighted estimator if heterogeneity across studies is large but close to the inverse variance weighted estimator if the between-study heterogeneity is small. However, what has been ignored is the distinction between the model we choose to analyze a given dataset and the mechanism by which the data came into being.[106] A random effect can be present in either of these roles, but the two roles are quite distinct. There is no reason to think the analysis model and data-generation mechanism (model) are similar in form, but many sub-fields of statistics have developed the habit of assuming, for theory and simulations, that the data-generation mechanism (model) is identical to the analysis model we choose (or would like others to choose). As a hypothesized mechanism for producing the data, the random effect model for meta-analysis is silly, and it is more appropriate to think of this model as a superficial description and something we choose as an analytical tool; but this choice for meta-analysis may not work because the study effects are a fixed feature of the respective meta-analysis and the probability distribution is only a descriptive tool.[106]
Problems arising from agenda-driven bias
The most severe fault in meta-analysis often occurs when the person or persons doing the meta-analysis have an
A 2011 study examining possible conflicts of interest in the underlying research studies used for medical meta-analyses reviewed 29 meta-analyses and found that conflicts of interest in the studies underlying the meta-analyses were rarely disclosed. The 29 meta-analyses included 11 from general medicine journals, 15 from specialty medicine journals, and three from the
For example, in 1998, a US federal judge found that the United States
EPA's study selection is disturbing. First, there is evidence in the record supporting the accusation that EPA "cherry picked" its data. Without criteria for pooling studies into a meta-analysis, the court cannot determine whether the exclusion of studies likely to disprove EPA's a priori hypothesis was coincidence or intentional. Second, EPA's excluding nearly half of the available studies directly conflicts with EPA's purported purpose for analyzing the epidemiological studies and conflicts with EPA's Risk Assessment Guidelines. See ETS Risk Assessment at 4-29 ("These data should also be examined in the interest of weighing all the available evidence, as recommended by EPA's carcinogen risk assessment guidelines (U.S. EPA, 1986a) (emphasis added)). Third, EPA's selective use of data conflicts with the Radon Research Act. The Act states EPA's program shall "gather data and information on all aspects of indoor air quality" (Radon Research Act § 403(a)(1)) (emphasis added).[109]
As a result of the abuse, the court vacated Chapters 1–6 of and the Appendices to EPA's "Respiratory Health Effects of Passive Smoking: Lung Cancer and other Disorders".[109]
Comparability and validity of included studies
Meta-analysis is often not a substitute for an adequately powered primary study.[110]
Heterogeneity of methods used may lead to faulty conclusions.[111] For instance, differences in the forms of an intervention or in the cohorts that are thought to be minor, or that are unknown to the scientists, could lead to substantially different results, including results that distort the meta-analysis's results or are not adequately considered in its data. Conversely, results from meta-analyses may also make certain hypotheses or interventions seem nonviable and preempt further research or approvals, even though certain modifications (such as intermittent administration, personalized criteria and combination measures) can lead to substantially different results, including in cases where such modifications have been successfully identified and applied in small-scale studies that were considered in the meta-analysis.[citation needed] Standardization, reproduction of experiments, open data and open protocols may often fail to mitigate such problems, for instance because relevant factors and criteria may be unknown or unrecorded.[citation needed]
There is a debate about the appropriate balance between testing with as few animals or humans as possible and the need to obtain robust, reliable findings. It has been argued that unreliable research is inefficient and wasteful, and that studies are wasteful not only when they stop too late but also when they stop too early. In large clinical trials, planned sequential analyses are sometimes used if there is considerable expense or potential harm associated with testing participants.[112] In applied behavioural science, "megastudies" have been proposed to investigate the efficacy of many different interventions designed in an interdisciplinary manner by separate teams.[113] One such study used a fitness chain to recruit a large number of participants. It has been suggested that behavioural interventions are often hard to compare [in meta-analyses and reviews], as "different scientists test different intervention ideas in different samples using different outcomes over different time intervals", causing a lack of comparability among such individual investigations that limits "their potential to inform policy".[113]
Weak inclusion standards lead to misleading conclusions
Meta-analyses in education are often not restrictive enough with regard to the methodological quality of the studies they include. For example, studies that include small samples or researcher-made measures lead to inflated effect size estimates.[114] The problem is not limited to education, however; it also affects meta-analyses of clinical trials. The use of different quality assessment tools (QATs) leads to the inclusion of different studies and to conflicting estimates of average treatment effects.[115][116]
Applications in modern science
Modern statistical meta-analysis does more than just combine the effect sizes of a set of studies using a weighted average. It can test if the outcomes of studies show more variation than the variation that is expected because of the sampling of different numbers of research participants. Additionally, study characteristics such as measurement instrument used, population sampled, or aspects of the studies' design can be coded and used to reduce variance of the estimator (see statistical models above). Thus some methodological weaknesses in studies can be corrected statistically. Other uses of meta-analytic methods include the development and validation of clinical prediction models, where meta-analysis may be used to combine individual participant data from different research centers and to assess the model's generalisability,[117][118] or even to aggregate existing prediction models.[119]
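One common way to carry out the heterogeneity test mentioned above is Cochran's Q, usually reported together with the I² statistic; naming these particular statistics is an assumption, since the paragraph does not specify a test. The minimal Python sketch below computes Q from inverse variance weights, refers it to a chi-square distribution with k - 1 degrees of freedom, and reports I² = max(0, (Q - df)/Q), the approximate share of total variation attributed to between-study heterogeneity rather than sampling error. Because the standard library has no chi-square distribution, the hypothetical helper `chi2_sf` uses the power series for the regularized incomplete gamma function; it is adequate for moderate Q values but is not a production-grade routine.

```python
import math
from typing import Dict, Sequence


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with df d.f.

    Sums the series for the regularized lower incomplete gamma function
    P(df/2, x/2); intended only for moderate values of x.
    """
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # n = 0 term: z^a * exp(-z) / Gamma(a + 1), computed in log space
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
    if term == 0.0:
        return 0.0  # underflow only happens when x is far into the upper tail
    total, n = term, 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return max(0.0, 1.0 - total)


def heterogeneity(effects: Sequence[float],
                  variances: Sequence[float]) -> Dict[str, float]:
    """Cochran's Q test and the I^2 statistic for a set of study effects."""
    w = [1.0 / v for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"Q": q, "df": df, "p": chi2_sf(q, df), "I2": i2}
```

With three equally precise studies whose effects are 0.1, 0.5 and 0.9 (each with variance 0.01), Q is 32 on 2 degrees of freedom, so the spread is far larger than sampling error alone would explain, and I² is 0.9375.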
Meta-analysis can be done with single-subject designs as well as group research designs.[122] This is important because much research has been done with single-subject research designs.[123] There is considerable dispute over the most appropriate meta-analytic technique for single-subject research.[124]
Meta-analysis leads to a shift of emphasis from single studies to multiple studies. It emphasizes the practical importance of the effect size instead of the statistical significance of individual studies. This shift in thinking has been termed "meta-analytic thinking". The results of a meta-analysis are often shown in a forest plot.
Results from studies are combined using different approaches. One approach frequently used in meta-analyses in health care research is termed the 'inverse variance method'. The average effect size across all studies is computed as a weighted mean, in which each study's weight is the inverse of the variance of its effect estimator. Larger studies and studies with less random variation are given greater weight than smaller studies. Other common approaches include the Mantel–Haenszel method[125] and the Peto method.[126]
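Two of the pooling rules named above can be sketched briefly in Python. `inverse_variance` follows the description in the text (each weight is 1/variance) and adds the usual standard error, the square root of 1/Σw, with a normal-approximation 95% interval; `mantel_haenszel_or` computes the Mantel–Haenszel pooled odds ratio, Σ(a·d/n) / Σ(b·c/n), over a set of 2×2 tables. The table layout, field names and function names are hypothetical choices for illustration, and the sketch omits a confidence interval for the Mantel–Haenszel estimate as well as the Peto method.

```python
import math
from typing import NamedTuple, Sequence, Tuple


class Table2x2(NamedTuple):
    """Hypothetical per-study 2x2 table of event counts by arm."""
    a: int  # events in the treatment group
    b: int  # non-events in the treatment group
    c: int  # events in the control group
    d: int  # non-events in the control group


def inverse_variance(effects: Sequence[float], variances: Sequence[float]
                     ) -> Tuple[float, float, Tuple[float, float]]:
    """Fixed effect pooled estimate with weights w_i = 1 / variance_i."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    se = math.sqrt(1.0 / sw)
    # Normal-approximation 95% confidence interval
    return pooled, se, (pooled - 1.96 * se, pooled + 1.96 * se)


def mantel_haenszel_or(tables: Sequence[Table2x2]) -> float:
    """Mantel-Haenszel pooled odds ratio: sum(a*d/n) / sum(b*c/n)."""
    num = 0.0
    den = 0.0
    for t in tables:
        n = t.a + t.b + t.c + t.d
        num += t.a * t.d / n
        den += t.b * t.c / n
    if den == 0.0:
        raise ValueError("odds ratio undefined: every table has b*c == 0")
    return num / den
```

In `inverse_variance([0.0, 3.0], [1.0, 2.0])` the first study has twice the weight of the second, so the pooled estimate is 1.0 rather than the unweighted mean of 1.5. For a single table the Mantel–Haenszel estimate reduces to the ordinary odds ratio a·d/(b·c).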
Seed-based d mapping (formerly signed differential mapping, SDM) is a statistical technique for meta-analyzing studies on differences in brain activity or structure that used neuroimaging techniques such as fMRI, VBM or PET.
Different high-throughput techniques such as microarrays have been used to understand gene expression. MicroRNA expression profiles have been used to identify differentially expressed microRNAs in particular cell or tissue types or disease conditions, or to check the effect of a treatment. A meta-analysis of such expression profiles was performed to derive novel conclusions and to validate known findings.[127]
Meta-analysis of whole genome sequencing studies provides an attractive solution to the problem of collecting large sample sizes for discovering rare variants associated with complex phenotypes. Some methods have been developed to enable functionally informed rare variant association meta-analysis in biobank-scale cohorts using efficient approaches for summary statistic storage.[128]
Sweeping meta-analyses can also be used to estimate a network of effects. This allows researchers to examine patterns in the fuller panorama of more accurately estimated results and draw conclusions that consider the broader context (e.g., how personality-intelligence relations vary by trait family).[129]
See also
Sources
This article incorporates text by Daniel S. Quintana available under the CC BY 4.0 license.
References
- S2CID 5416879.
- S2CID 30083129.
- S2CID 3185455.
- Hunt, Morton (1997). How Science Takes Stock: The Story of Meta-Analysis (1st ed.). New York: Russell Sage Foundation.
- PMID 20761760.
- PMID 22407741.
- PMID 18065712.
- Ghiselli, E. E. (1955). The measurement of occupational aptitude. University of California Publications in Psychology, 8, 101–216.
- ISSN 0031-5826.
- S2CID 86619593.
- S2CID 43326263.
- ISSN 1935-990X.
- S2CID 225384392.
- PMID 9238555.
- PMID 27620683.
- S2CID 212629387.
- S2CID 73415319.
- PMID 35659294.
- ISSN 2378-1890.
- "The PRISMA statement". Prisma-statement.org. 2 February 2012. Archived from the original on 27 July 2011. Retrieved 2 February 2012.
- ISSN 2041-210X.
- PMID 15473412.
- PMID 16549808.
- PMID 26500598.
- PMID 17388659.
- ISSN 2832-9023.
- S2CID 37557674.
- PMID 24965054.
- S2CID 9543956. Retrieved 26 December 2023.
- S2CID 221619510.
- PMID 32336025.
- PMID 17470488.
- ISSN 2047-2382.
- S2CID 27109643.
- PMID 31699124.
- PMID 28420349.
- S2CID 20624428.
- PMID 12583822.
- S2CID 204603849. Retrieved 26 December 2023.
- S2CID 33777183.
- S2CID 3601317.
- S2CID 8807106.
- PMID 26287812.
- PMID 23585842.
- PMID 27747915.
- PMID 11884693.
- S2CID 119814256.
- S2CID 17764847.
- PMID 20409685.
- S2CID 16932514.
- S2CID 6556986.
- S2CID 887098.
- S2CID 21384942.
- PMID 19016302.
- PMID 10472946.
- S2CID 32994689.
- PMID 23494781.
- S2CID 51890354.
- "MetaXL User Guide" (PDF). Retrieved 18 September 2018.
- – via ResearchGate.
- S2CID 22688261.
- S2CID 15798713.
- PMID 23922860.
- – via ResearchGate.
- "MetaXL software page". Epigear.com. 3 June 2017. Retrieved 18 September 2018.
- S2CID 10792959.
- S2CID 29723291.
- PMID 21147265.
- PMID 25872162.
- PMID 26003432.
- PMID 9250266.
- S2CID 33613631.
- S2CID 7300890.
- Heck DW, Gronau QF, Wagenmakers EJ, Patil I (17 March 2021). "metaBMA: Bayesian model averaging for random and fixed effects meta-analysis". CRAN. Retrieved 9 May 2022.
- Bartoš F, Maier M, Wagenmakers EJ, Goosen J, Denwood M, Plummer M (20 April 2022). "RoBMA: An R Package for Robust Bayesian Meta-Analyses". Retrieved 9 May 2022.
- S2CID 237699937.
- S2CID 236826939.
- S2CID 10860031.
- .
- PMID 24447592.
- S2CID 205844216.
- .
- PMID 22008217.
- PMID 22007046.
- S2CID 23397142.
- PMID 28620945.
- PMID 25800943.
- PMID 9262498.
- S2CID 146457142.
- ISBN 978-0-8039-1864-1.
- ISBN 978-0-8039-1633-3.
- S2CID 145513046.
- S2CID 36070395.
- SAGE Publications.
- S2CID 241159497.
- ISBN 978-0-674-85431-4.
- PMID 16392998.
- PMID 17420491.
- S2CID 123680599.
- PMID 21787082.
- PMID 22006061.
- S2CID 51686730. Archived from the original (PDF) on 24 November 2012.
- PMID 29337724.
- PMID 31501094.
- )
- ISBN 978-1-4398-6683-2.
- PMID 22035723.
- S2CID 11270323.
- "The Osteen Decision". The United States District Court for the Middle District of North Carolina. 17 July 1998. Retrieved 18 March 2017.
- PMID 15313553.
- ISSN 1053-4822.
- S2CID 455476.
- S2CID 245047340.
- S2CID 148531062.
- PMID 10493204.
- PMID 24044807.
- PMID 26461078.
- S2CID 25308961.
- S2CID 39439611.
- S2CID 25308961.
- S2CID 39439611.
- PMID 24606971.
- S2CID 20442353.
- doi:10.1037/h0100613.
- S2CID 17698270.
- Deeks JJ, Higgins JP, Altman DG, et al. (Cochrane Statistical Methods Group) (2021). "Chapter 10: Analysing data and undertaking meta-analyses: 10.4.2 Peto odds ratio method". In Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, Welch V (eds.). Cochrane Handbook for Systematic Reviews of Interventions (Version 6.2 ed.). The Cochrane Collaboration.
- PMID 19948767.
- S2CID 255084231.
- S2CID 265335858.
Further reading
- Cornell JE, Mulrow CD (1999). "Meta-analysis". In ISBN 978-0-7619-5883-3.
- Ellis PD (2010). The Essential Guide to Effect Sizes: An Introduction to Statistical Power, Meta-Analysis and the Interpretation of Research Results. Cambridge: Cambridge University Press. ISBN 978-0-521-14246-5.
- Sutton AJ, Jones DR, Abrams KR, Sheldon TA, Song F (2000). Methods for meta-analysis in medical research. London: John Wiley. ISBN 978-0-471-49066-1.
- Wilson DB, Lipsey MW (2001). Practical meta-analysis. Thousand Oaks: Sage publications. ISBN 978-0-7619-2168-4.
- Cooper H, Hedges LV, eds. (1994). The Handbook of Research Synthesis. New York: Russell Sage Foundation. ISBN 978-0-87154-226-7.
- Bonett DG (December 2010). "Varying coefficient meta-analytic methods for alpha reliability". Psychological Methods. 15 (4): 368–385. S2CID 207710319.
- Bonett DG, Price RM (November 2014). "Meta-analysis methods for risk differences". The British Journal of Mathematical and Statistical Psychology. 67 (3): 371–387. PMID 23962020.
- Bonett DG (September 2008). "Meta-analytic interval estimation for bivariate correlations". Psychological Methods. 13 (3): 173–181. S2CID 5690835.
- Bonett DG (September 2009). "Meta-analytic interval estimation for standardized and unstandardized mean differences". Psychological Methods. 14 (3): 225–238. PMID 19719359.
- Bonett DG, Price RM (September 2015). "Varying coefficient meta-analysis methods for odds ratios and risk ratios". Psychological Methods. 20 (3): 394–406. PMID 25751513.
- Bonett DG (November 2020). "Point-biserial correlation: Interval estimation, hypothesis testing, meta-analysis, and sample size determination". The British Journal of Mathematical and Statistical Psychology. 73 (Suppl 1): 113–144. S2CID 203607297.
- PMID 10070677.
- Owen AB (December 2009). "Karl Pearson's meta-analysis revisited" (PDF). The Annals of Statistics. 37 (6B): 3867–3892. S2CID 7632667. Archived from the original (PDF) on 26 July 2011.
- Slough, Tara; Tyson, Scott A. (2022). "External Validity and Meta‐Analysis". American Journal of Political Science. doi:10.1111/ajps.12742. ISSN 0092-5853.
- Thompson SG, Pocock SJ (November 1991). "Can meta-analyses be trusted?" (PDF). Lancet. 338 (8775): 1127–1130. S2CID 29743240. Archived from the original (PDF) on 22 November 2011. Retrieved 17 June 2011. Explores two contrasting views: does meta-analysis provide "objective, quantitative methods for combining evidence from separate but similar studies" or merely "statistical tricks which make unjustified assumptions in producing oversimplified generalisations out of a complex of disparate studies"?
- O'Rourke K (2007). "Just the history from the combining of information: investigating and synthesizing what is possibly common in clinical observations or studies via likelihood" (PDF). Oxford: University of Oxford, Department of Statistics. Archived from the original (PDF) on 2 November 2011. Gives technical background material and details on the "An historical perspective on meta-analysis" paper cited in the references.