In researching John Hattie’s meta-meta-analyses and digging into the original studies, I discovered one underlying factor that, more than anything else, explains why he consistently comes up with greatly inflated effect sizes: most of the studies in the meta-analyses he synthesizes are brief, small, artificial lab studies. And lab studies produce very large effect sizes that have little if any relevance to classroom practice.
This discovery reminds me of one of the oldest science jokes in existence: (One scientist to another): “Your treatment worked very well in practice, but how will it work in the lab?” (Or “…in theory?”)
The point of the joke, of course, is to poke fun at scientists more interested in theory than in practical impacts on real problems. Personally, I have great respect for theory and lab studies. My very first publication as a psychology undergraduate involved an experiment on rats.
Now, however, I work in a rapidly growing field that applies scientific methods to the study and improvement of classroom practice. In our field, theory also has an important role. But lab studies? Not so much.
A lab study in education is, in my view, any experiment that tests a treatment so brief, so small, or so artificial that it could never be used all year. An evaluation of a treatment that could never be replicated might also be considered a lab study, even if it went on for several months. Examples include a technology program in which a graduate student stands by for every four students every day of the experiment, or a tutoring program in which the study author or his or her students provide the tutoring.
Our field exists to try to find practical solutions to practical problems in an applied discipline. Lab studies have little importance in this process, because they are designed to eliminate all factors other than the variables of interest. A one-hour study in which children are asked to do some task under very constrained circumstances may produce very interesting findings, but it cannot justify recommending practices for real teachers in real classrooms. Findings of lab studies may suggest practical treatments, but by themselves they never, ever validate practices for classroom use.
Lab studies are almost invariably doomed to success. Their conditions are carefully set up to support a given theory. Because they are small, brief, and highly controlled, they produce huge effect sizes. (Because they are relatively easy and inexpensive to do, it is also very easy to discard them if they do not work out, which contributes to the universally reported tendency of studies in published sources to report much higher effects than studies in unpublished sources.) Lab studies are so common not only because researchers believe in them, but also because they are easy and inexpensive to do, while meaningful field experiments are difficult and expensive. Need a publication? Randomly assign your college sophomores to two artificial treatments and set up an experiment that cannot fail to show significant differences. Need a dissertation topic? Do the same in your third-grade class, or in your friend’s tenth-grade English class. Working with some undergraduates, we once did three lab studies in a single day. All were published. As with my own sophomore rat study, lab experiments are a good opportunity to learn to do research. But that does not make them relevant to practice, even if they happen to take place in a school building.
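For readers who like to see the arithmetic, here is a rough sketch of just one of these mechanisms: small studies that are discarded when they do not work out. It is an illustration under made-up assumptions, not an analysis of any real data. The true effect size of 0.1, the sample sizes of 15 and 500 per group, the 1.96 cutoff, and the rule that only significant positive results get "published" are all hypothetical choices, and the function names are mine.

```python
import random


def cohens_d(treatment, control):
    """Standardized mean difference for two equal-sized groups."""
    n = len(treatment)
    mean_t = sum(treatment) / n
    mean_c = sum(control) / n
    var_t = sum((x - mean_t) ** 2 for x in treatment) / (n - 1)
    var_c = sum((x - mean_c) ** 2 for x in control) / (n - 1)
    pooled_sd = ((var_t + var_c) / 2) ** 0.5
    return (mean_t - mean_c) / pooled_sd


def mean_published_effect(n_per_group, true_d=0.1, n_studies=1000, seed=0):
    """Average effect size among simulated studies that get 'published'.

    Assumption: a study is published only if its effect is positive and
    passes an approximate z-test at the 1.96 cutoff; all others are discarded.
    """
    rng = random.Random(seed)
    published = []
    for _ in range(n_studies):
        treatment = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        d = cohens_d(treatment, control)
        z = d * (n_per_group / 2) ** 0.5  # approximate test statistic
        if z > 1.96:
            published.append(d)
    return sum(published) / len(published)


if __name__ == "__main__":
    print("small studies (15 per group): ", round(mean_published_effect(15), 2))
    print("large studies (500 per group):", round(mean_published_effect(500), 2))
```

Under these assumptions, a study with 15 students per group cannot pass the cutoff unless its observed effect size exceeds roughly 0.7, so every small study that survives reports an effect many times the true value of 0.1. The large studies that survive stay much closer to the truth.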
By doing meta-analyses, or meta-meta-analyses, Hattie and others who do similar reviews obscure the fact that many, and usually most, of the studies they include are very brief, very small, and very artificial, and therefore produce greatly inflated effect sizes. They do this by presenting summary numbers and statistics in place of information on individual studies, and by including such large numbers of studies that no one wants to dig deeper into them. In Hattie’s case, he claims that the Visible Learning meta-meta-analyses contain 52,637 individual studies. Who wants to read 52,637 individual studies, only to find out that most are lab studies and have no direct bearing on classroom practice? It is difficult for readers to do anything but assume that the 52,637 studies must have taken place in real classrooms, and achieved real outcomes over meaningful periods of time. But in fact, the few that did this are overwhelmed by the thousands of lab studies that did not.
Educators have a right to data that are meaningful for the practice of education. Anyone who recommends practices or programs for educators to use needs to be open about where that evidence comes from, so educators can judge for themselves whether or not one-hour or one-week studies under artificial conditions tell them anything about how they should teach. I think the question answers itself.
This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.