On Meta-Analysis: Eight Great Tomatoes

I remember a long-ago advertisement for Contadina tomato paste. It went something like this:

Eight great tomatoes in an itsy bitsy can!

This ad creates an appealing image, or at least a provocative one, that I suppose sold a lot of tomato paste.

In educational research, we do something a lot like “eight great tomatoes.” It’s called meta-analysis, or systematic review.  I am particularly interested in meta-analyses of experimental studies of educational programs.  For example, there are meta-analyses of reading and math and science programs.  I’ve written them myself, as have many others.  In each, some number of relevant studies are identified.  From each study, one or more “effect sizes” are computed to represent the impact of the program on important outcomes, such as scores on achievement tests. These are then averaged to get an overall impact for each program or type of program.  Think of the effect size as boiling down tomatoes to make concentrated paste, to fit into an itsy bitsy can.

But here is the problem.  The Contadina ad specifies eight great tomatoes. If even one tomato is instead a really lousy one, the contents of the itsy bitsy can will be lousy.  Ultimately, lousy tomato pastes would bankrupt the company.

The same is true of meta-analyses.  Some meta-analyses include a broad range of studies – good, mediocre, and bad.  They may try to statistically control for various factors, but this does not do the job.  Bad studies lead to bad outcomes.  Years ago, I critiqued a study of “class size.”  The studies of class size in ordinary classrooms found small effects.  But there was one study that involved teaching tennis.  In small classes, the kids got a lot more court time than did kids in large classes.  This study, and only this study, found substantial effects of class size, significantly affecting the average.  There were not eight great tomatoes, there was at least one lousy tomato, which made the itsy bitsy can worthless.

The point I am making here is that when doing meta-analysis, the studies must be pre-screened for quality, and then carefully scrubbed.  Specifically, there are many factors that greatly (and falsely) inflate effect size.  Examples include use of assessments made by the researchers and ones that assess what was taught in the experimental group but not the control group, use of small samples, and provision of excessive assistance to the teachers.

Some meta-analyses just shovel all the studies onto a computer and report an average effect size.  More responsible ones shovel the studies into a computer and then test for and control for various factors that might affect outcomes. This is better, but you just can’t control for lousy studies, because they are often lousy in many ways.

Instead, high-quality meta-analyses set specific criteria for inclusion intended to minimize bias.  Studies often use both valid measures and crummy measures (such as those biased toward the experimental group).  Good meta-analyses use the good measures but not the (defined in advance) crummy ones.  Studies that only used crummy measures are excluded.  And so on.

With systematic standards, systematically applied, meta-analyses can be of great value.  Call it the Contadina method.  In order to get great tomato paste, start with great tomatoes. The rest takes care of itself.