How to Interpret Effect Sizes



An effect size is a measure that describes the magnitude of the difference between two groups. Effect sizes are particularly valuable in best practices research because they represent a standard measure by which all outcomes can be assessed. For example, we can compare effect sizes of dropout, graduation, and academic outcomes on the same scale.

An effect size is typically calculated by taking the difference in means between two groups and dividing that number by their combined (pooled) standard deviation. Intuitively, this tells us how many standard deviations’ difference there is between the means of the intervention (treatment) and comparison conditions; for example, an effect size of .25 indicates that the treatment group outperformed the comparison group by a quarter of a standard deviation. 


Many researchers use the concept of statistical significance to determine whether a particular study had an effect; however, this is not necessarily a good idea since statistical significance is heavily dependent upon sample size. For example, very small effects can be statistically significant with a study sample of 10,000 students, while relatively large effects are often not statistically significant with a study sample of 30 students. Ultimately, what matters most is not statistical significance, but rather, whether the size of an effect is meaningful in a practical sense. Researchers use the concept of effect size to determine this.
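
The dependence on sample size can be illustrated with a short sketch. It is not part of the BPC method: it assumes equal-sized groups, uses a normal (z) approximation instead of an exact t-test, and the function name `two_sided_p_value` and the example effect sizes are hypothetical.

```python
import math

def two_sided_p_value(effect_size, n_treatment, n_comparison):
    """Approximate two-sided p-value for a standardized mean difference.

    Assumption: normal (z) approximation rather than the exact t
    distribution, so results for small samples are only indicative.
    """
    # z statistic: effect size divided by its approximate standard error,
    # sqrt((n1 + n2) / (n1 * n2)).
    z = effect_size * math.sqrt(
        n_treatment * n_comparison / (n_treatment + n_comparison)
    )
    # P(|Z| >= |z|) for a standard normal variable.
    return math.erfc(abs(z) / math.sqrt(2))

# A very small effect in a study of 10,000 students (5,000 per group).
small_effect_large_study = two_sided_p_value(0.05, 5000, 5000)
# A medium-sized effect in a study of 30 students (15 per group).
large_effect_small_study = two_sided_p_value(0.50, 15, 15)

print(f"effect size .05, N = 10,000: p = {small_effect_large_study:.3f}")
print(f"effect size .50, N = 30:     p = {large_effect_small_study:.3f}")
```

Under these assumptions, the very small effect falls below the conventional .05 significance level while the much larger effect does not.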

Cohen (1988) proposed rules of thumb for interpreting effect sizes: a “small” effect size is .20, a “medium” effect size is .50, and a “large” effect size is .80. As Cohen warned, however, these benchmarks vary by field of study. For example, effect sizes of dropout prevention programs are often much smaller than effect sizes for reading programs. For the TEA BPC, we use an effect size of .25 as the threshold to meet the evidence type, “Practice with Rigorous Scientific Evidence.” We chose this threshold because it represents a conservative estimate of effects and because the U.S. Department of Education’s What Works Clearinghouse defines it as a “substantively important” effect.


Information on effect sizes can be found in the Supporting Evidence section of those Best Practice Summaries for which these calculations were appropriate. The BPC calculates effect sizes on the main outcome of interest for interventions meeting the “Practice with Rigorous Scientific Evidence” and “Practice with Quantitative Evidence” designations. Through the presentation of effect sizes, the BPC provides practitioners, policymakers, and researchers with a standard measure of comparison across Best Practice Summaries.


Ultimately, effect sizes provide one important piece of information that should be considered in the decision to adopt or enhance programs. Other information to consider includes the rigor of the research design (how believable are these effects?), the cost of the intervention to adopt, the generalizability of findings (e.g., could these findings be replicated in your setting?), as well as political considerations.


TEA BPC Methods for Calculating Effect Sizes


The TEA BPC selected two standard approaches for calculating effect sizes. The Hedges’ G formula is used to calculate effect sizes of continuous outcomes and the Cox Index formula is used to calculate effect sizes of binary outcomes. The formulas and key points for consideration are described below. 


Continuous Outcomes


Hedges’ G Formula




   X1 = (Adjusted) Mean of Treatment Group  


   X2 = (Adjusted) Mean of Comparison Group  


   Spooled = Pooled (Combined) Standard Deviation


   N1 = Sample Size of Treatment Group


   N2 = Sample Size of Comparison Group  


   S1 = Standard Deviation of Treatment Group


   S2 = Standard Deviation of Comparison Group
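
   Written out with the symbols defined above, the standard Hedges' g computation takes roughly the following form. This is a reconstruction from the definitions, not a copy of the BPC's own formula. The symbol ω for the small-sample correction is an assumption, written here in the commonly used approximation 1 − 3/(4N − 9) with N = N1 + N2.

```latex
g = \omega \cdot \frac{X_1 - X_2}{S_{pooled}}, \qquad
S_{pooled} = \sqrt{\frac{(N_1 - 1)\,S_1^2 + (N_2 - 1)\,S_2^2}{N_1 + N_2 - 2}}, \qquad
\omega = 1 - \frac{3}{4\,(N_1 + N_2) - 9}
```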


  Key points for calculating the effect sizes of continuous variables


  • Effect sizes are calculated by expressing how many standard deviations separate two groups.
  • By focusing on standard deviations, we can provide a standardized measure of effect.
  • Sample sizes enter the calculation only through the pooled standard deviation formula and the small-sample correction; unlike statistical significance calculations, effect size calculations are not particularly sensitive to changes in sample size.
  • The uncorrected estimate has been shown to be upwardly biased in small samples, so a small-sample correction is applied as outlined by Hedges (1981). This correction is also used in the U.S. Department of Education’s What Works Clearinghouse (WWC) effect size calculations.
  • Other effect size calculations for continuous variables include Cohen’s D (which uses a slightly different formula for pooled SD) and Glass’s Delta (which uses the comparison group’s SD instead of the pooled SD). 
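
The key points above can be sketched in a few lines of Python. This is a minimal illustration, not the BPC's own implementation: the function names `pooled_sd` and `hedges_g` and the example scores are hypothetical, and the correction uses the common approximation 1 − 3/(4N − 9).

```python
import math

def pooled_sd(n1, s1, n2, s2):
    """Pooled standard deviation of two groups, weighted by degrees of freedom."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def hedges_g(x1, x2, n1, s1, n2, s2):
    """Standardized mean difference with a small-sample correction.

    x1, x2: (adjusted) means of the treatment and comparison groups
    n1, n2: sample sizes; s1, s2: standard deviations
    """
    uncorrected = (x1 - x2) / pooled_sd(n1, s1, n2, s2)
    # Assumed form of the correction: 1 - 3 / (4N - 9), with N = n1 + n2.
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return correction * uncorrected

# Hypothetical test scores: same means and SDs, two different study sizes.
g_small_study = hedges_g(x1=78.0, x2=75.0, n1=40, s1=12.0, n2=40, s2=12.0)
g_large_study = hedges_g(x1=78.0, x2=75.0, n1=4000, s1=12.0, n2=4000, s2=12.0)

print(f"g with 80 students:    {g_small_study:.3f}")
print(f"g with 8,000 students: {g_large_study:.3f}")
```

Both calls give a value just under .25: a hundredfold increase in sample size changes the estimate only through the small correction factor.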




Binary Outcomes  


Cox Index Formula




LORcox = Logged Odds Ratio (Cox Index)  


Logged Odds Ratio = Difference between the natural logs of the odds in the treatment and comparison groups
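
In symbols, the Cox Index as described by Sánchez-Meca et al. (2003) divides the logged odds ratio by 1.65. The notation below is a reconstruction: p1 and p2 are assumed names for the proportions of the treatment and comparison groups experiencing the event, and any small-sample correction is omitted.

```latex
LOR_{cox} = \frac{LOR}{1.65}, \qquad
LOR = \ln\!\left(\frac{p_1}{1 - p_1}\right) - \ln\!\left(\frac{p_2}{1 - p_2}\right)
```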


Key points for calculating the effect sizes of binary variables


  • Effect sizes for binary variables are based on odds ratios (i.e., the odds of an event occurring in the treatment group [e.g., completing school] divided by the odds of that event occurring in the comparison group, where the odds are the probability of the event divided by the probability of its absence).
  • Researchers have found that the Cox Index:
    • Provides the most unbiased effect sizes and, more importantly,
    • Provides estimates closest in line with effect size calculations of continuous variables.
  • The Cox Index does not require a separately reported standard deviation (for a binary outcome, it can be derived from the proportion).
  • The Cox Index approaches infinity as a group’s proportion approaches 0% or 100%; therefore, effect sizes may seem implausibly large in some cases. 
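
The behavior near 0% and 100% can be seen in a minimal sketch. The function names `log_odds` and `cox_index` and the example completion rates are hypothetical, the 1.65 divisor follows Sánchez-Meca et al. (2003), and no small-sample correction is applied.

```python
import math

def log_odds(p):
    """Natural log of the odds p / (1 - p) for a proportion p."""
    if not 0.0 < p < 1.0:
        raise ValueError("proportion must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def cox_index(p_treatment, p_comparison):
    """Logged odds ratio divided by 1.65 (no small-sample correction)."""
    return (log_odds(p_treatment) - log_odds(p_comparison)) / 1.65

# Hypothetical school completion rates.
mid_range = cox_index(0.60, 0.50)     # 10-point gain in the middle of the scale
near_ceiling = cox_index(0.99, 0.90)  # 9-point gain close to 100%

print(f"50% -> 60%: {mid_range:.2f}")
print(f"90% -> 99%: {near_ceiling:.2f}")
```

In this sketch, the gain from 50% to 60% yields an effect size of about .25, while the slightly smaller gain from 90% to 99% yields about 1.45, which is why effect sizes near the extremes deserve extra scrutiny.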


Literature References


  • Hedges’ G 


Hedges, L.V. (1981). Distribution theory for Glass's estimator of effect size and related estimators. Journal of Educational Statistics, 6(2), 107–128. 


  • Cox Index


Sánchez-Meca, J., Marín-Martínez, F., & Chacón-Moscoso, S. (2003). Effect-size indices for dichotomized outcomes in meta-analysis. Psychological Methods, 8(4), 448–467.


  • Benchmarks: 


Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum. 


  • What Works Clearinghouse Guide for Calculating Effect Sizes: