An effect size is a measure of how much an experimental group exceeds a control group, controlling for pretests. As every quantitative researcher knows, the formula is (X̄_T − X̄_C)/SD: the adjusted treatment mean minus the adjusted control mean, divided by the unadjusted standard deviation. If this is all gobbledygook to you, I apologize, but sometimes we research types just have to let our inner nerd run free.
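The formula above can be sketched in a few lines of Python. The numbers in the usage example are hypothetical, chosen only to show the arithmetic:

```python
def effect_size(mean_treatment: float, mean_control: float, sd: float) -> float:
    """Standardized mean difference: (adjusted treatment mean - adjusted
    control mean) divided by the unadjusted standard deviation."""
    return (mean_treatment - mean_control) / sd

# Hypothetical posttest means (adjusted for pretests) and SD:
d = effect_size(mean_treatment=52.0, mean_control=50.0, sd=10.0)
print(round(d, 2))  # 0.2
```

Note that a 2-point gain looks small or large depending entirely on the spread: with SD = 10 it yields +0.20, but with SD = 5 the same gain would yield +0.40.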
Effect sizes have come to be accepted as a standard indicator of the impact an experimental treatment had on a posttest. As research becomes more important in policy and practice, understanding them is becoming increasingly important.
One constant question is how important a given effect size is. How big is big? Many researchers still use a rule of thumb from Cohen to the effect that +0.20 is “small,” +0.50 is “moderate,” and +0.80 or more is “large.” Yet Cohen himself disavowed these standards long ago.
High-quality experimental-control comparison research in schools rarely produces effect sizes as large as +0.20, and only one-to-one tutoring studies routinely reach +0.50. So Cohen’s rule of thumb demanded effect sizes far larger than those rigorous school research typically reports.
An article by Hill, Bloom, Black, and Lipsey (2008) considered several ways to determine the importance of effect sizes. They noted that students gain more each year (in effect size terms) in the early elementary grades than high school students do, and suggested that a given effect size for an experimental treatment may therefore be more important in secondary school than the same effect size would be in elementary school. However, in four additional tables in the same article, they show that actual effect sizes from randomized studies are relatively consistent across the grades. They also found that effect sizes vary greatly depending on methodology and the nature of the measures. They conclude that it is most reasonable to judge the importance of an effect size by comparing it to effect sizes from other studies with similar measures and designs.
A study by Alan Cheung and me (2016) reinforces the importance of methodology in determining what counts as an important effect size. We analyzed all findings from 645 high-quality studies included in all reviews in our Best Evidence Encyclopedia (www.bestevidence.org). We found that the most important factors in effect sizes were sample size and design (randomized vs. matched). Here is the key table.
Effects of Sample Size and Design on Effect Sizes

| Design     | Small Sample | Large Sample |
|------------|--------------|--------------|
| Matched    | +0.33        | +0.17        |
| Randomized | +0.23        | +0.12        |
This table shows that matched studies with small sample sizes (fewer than 250 students) have much higher effect sizes, on average, than, say, large randomized studies (+0.33 vs. +0.12). These differences say nothing about the impact on children; they are due entirely to differences in study design.
If effect sizes differ so much due to study design, then we cannot have a single standard that tells us when an effect size is large or small. All we can do is note when an effect size is large compared to those of similar studies. For example, imagine that a study finds an effect size of +0.20. Is that big or small? If it were a matched study with a small sample size, +0.20 would be a rather small impact. If it were a randomized study with a large sample size, it might be considered quite a large impact.
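The comparison logic above can be sketched as a simple lookup against the averages in the table. This is only an illustration of the reasoning, not the formula my colleagues and I are developing; the labels and function names are my own:

```python
# Average effect sizes by design and sample size, taken from the table above.
AVERAGE_ES = {
    ("matched", "small"): 0.33,
    ("matched", "large"): 0.17,
    ("randomized", "small"): 0.23,
    ("randomized", "large"): 0.12,
}

def relative_size(es: float, design: str, sample: str) -> str:
    """Label an effect size relative to the average for similar studies."""
    avg = AVERAGE_ES[(design, sample)]
    if es > avg:
        return "above average for similar studies"
    if es < avg:
        return "below average for similar studies"
    return "about average for similar studies"

# The same +0.20 reads differently depending on design and sample size:
print(relative_size(0.20, "matched", "small"))     # below average for similar studies
print(relative_size(0.20, "randomized", "large"))  # above average for similar studies
```

The point of the sketch is that the benchmark, not the effect size itself, does the interpretive work: +0.20 falls below the +0.33 average for small matched studies but well above the +0.12 average for large randomized ones.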
Beyond study methods, a good general principle is to compare like with like. Some treatments have very small effect sizes but are so inexpensive, or affect so many students, that even a small effect may be important. For example, principal or superintendent training may reach a great many students, and benchmark assessments may be so inexpensive that a small effect size is worthwhile and compares favorably with equally inexpensive means of solving the same problem.
My colleagues and I will be developing a formula that lets researchers and readers enter the features of a study and produce an “expected effect size,” to determine more accurately whether a given effect size should be considered large or small.
Not long ago, it would not have mattered much how large effect sizes were considered, but now it does. That’s an indication of the progress we have made in recent years. Big indeed!
This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.