Are Proven Educational Innovations Ready for Prime Time?

These are good times for evidence-based reform in education. Due in particular to Investing in Innovation (i3) and the Institute of Education Sciences (IES), the number of proven programs in all subjects and grade levels is increasing, and as i3 programs complete their evaluations, that growth should accelerate (even though there are also sure to be many disappointments).

The increasing number of programs proven effective in rigorous research creates new opportunities for policy and practice in education. Already, School Improvement Grants (SIG) have introduced an option for schools to choose proven, comprehensive reform models. Other areas of policy may also soon begin to encourage or incentivize use of programs with strong evidence.

If these changes in policy take hold, it will begin to matter whether educational programs proven to be effective are in fact ready for prime time, meaning that they are ready to be disseminated and supported in the form in which they existed when they were successfully evaluated. It would be catastrophic if educators and policy makers began searching the What Works Clearinghouse, for example, or looking for programs that meet the EDGAR standards for strong or moderate evidence of effectiveness, and found many programs that were unavailable, unrealistic, or impractical.

Much as providing evidence of effectiveness is an advance in education, there is a real need to determine the degree to which programs are also ready for widespread implementation.

Some indicators of readiness for prime time would be easy to assess. For example, programs that lack a web site, do not offer materials or training, or otherwise do not exist in anything like the form in which they were evaluated cannot be considered ready for implementation. Some programs used procedures in their evaluation that could never be replicated, such as science programs that provide each experimental class with enough graduate students to monitor and assist every lab group. Some proven technology products run on hardware that no longer exists.

Many studies use measures of learning outcomes that are closely aligned with what was taught in the experimental but not the control group. Such studies might be excluded on the basis that the overaligned measure does not have meaning beyond the experiment itself.

Educators who choose to use proven programs have a right to be confident that the programs they have selected are, if implemented well, likely to result in enhanced performance on measures they care about. Finding a lot of programs that cannot be implemented under ordinary circumstances and with meaningful measures will diminish interest in evidence-based reform.

Evidence-based reform itself is ready for prime time in education, but its future depends on whether it is perceived to produce genuine benefits for children. We need to make sure that the proven programs we offer to educators meet their needs and those of the students, not just scientific standards.

Promoting Proven Programs in Title I: The Bully Pulpit


Title I, the 800-pound gorilla of federal education policy, spends $15 billion a year to help high-poverty schools enhance outcomes for their students. The greatest victory for evidence-based reform would be for the roughly 51,500 Title I schools to make far greater use of programs known to enhance student learning, benefitting millions of at-risk children throughout the U.S. Yet because Title I is a formula grant, it is difficult for federal policy to increase use of proven approaches. Competitive grants can provide preference points for using proven models, as I’ve argued before, but in formula grants, it is more difficult to nudge educational leaders toward use of proven programs, since they will receive their money as long as they follow the rules.

One solution to this problem might be to borrow President Theodore Roosevelt’s conception of the presidency as a “bully pulpit.” In other words, even during a time of congressional gridlock, it is possible for the administration to promote the use of proven approaches, even in formula grants, at little or no cost.

The first thing the U.S. Department of Education would have to do is to review all the programs in the What Works Clearinghouse according to the simpler, clearer standards in the EDGAR regulations. Someone would then have to prune the resulting list, identifying programs that meet the EDGAR standards for "strong" and "moderate" levels of evidence and removing programs that no longer exist or that have no one providing training and materials similar to those provided in the successful studies. The remaining programs would represent a good starting list of programs that, if implemented well, would be likely to have positive impacts on student achievement.

Department officials could then publicize this list in many ways. Certainly, they could create a web site showing the programs and the evidence behind them and linking to the programs’ web sites. They might sponsor “effective methods fairs” around the U.S. to demonstrate programs available for schools and districts to choose. They might distribute certificates to schools that adopt proven programs and then implement them with fidelity, as certified by the developers.

These strategies and others could arouse widespread interest in proven programs and help school leaders choose from a wide array of programs appropriate to their needs.

If funds became available, the Department might provide modest incentive grants to help schools supplement the start-up costs of proven programs. But even without special incentive funding, schools should be able to make choices from among programs known to be likely to help them succeed with their children, using their existing Title I funds.

A creative “bully pulpit” policy might begin a process of expanding use of existing proven programs, encouraging creation and evaluation of new ones, and increasing sophistication in choosing how to spend federal resources. All of this could be accomplished for nearly nothing, while gradually moving the $15 billion in Title I toward more effective uses. Over time, such a policy would also encourage developers and researchers to create and evaluate programs likely to meet EDGAR standards, and it could help build political support for investments in R&D that ultimately result in better outcomes for children on a broad scale.

A "bully pulpit" strategy would still need to be accompanied by policies providing incentives to adopt proven programs in competitive grants, and by continued support for the R&D pipeline, such as that provided by Investing in Innovation (i3), the Institute of Education Sciences (IES), and the National Science Foundation (NSF). However, development and research in education have to go beyond dedicated R&D programs; they need to be seen as a routine, growing part of the world of educational practice and innovation.


Thank You, Jim Shelton


Everyone who works to advance evidence-based reform in education was saddened to learn that Jim Shelton will be leaving the U.S. Department of Education. Jim is currently the Department’s Deputy Secretary, and before that he was the Assistant Deputy Secretary for the Office of Innovation and Improvement (OII), the home of the Investing in Innovation (i3) program. Under his leadership, i3 created a unique approach to speeding up innovation in education, supporting development of new programs, evaluation of promising programs, and scale-up of already proven programs. Jim has provided the intellectual leadership for i3, understanding better than anyone the process by which proven and replicable programs could gradually infuse government-funded programs in education, starting with competitive programs (such as School Improvement Grants) and then moving on to formula programs (such as Title I).

Jim has been a tireless advocate for evidence, but even more so for kids. He understands that the children who need the best schools need the best programs. Better teachers? Smaller classes? Better parent support? Sure, they need these too, but until core classroom practices are better in every class, millions of children will continue to fail each year.

Jim is a powerful speaker and a powerful advocate for his ideas. I don’t know where he’s going next, but I’m certain that wherever it is, he will carry on the fight for children and for evidence so that the programs we provide children actually work.

Where’s the Science in Science Education?


A constant refrain in articles about education and the economy highlights the need for more of a focus on STEM: Science, Technology, Engineering, and Mathematics. In fact, the National Science Foundation and many other public and private entities spend billions each year to advance STEM education. STEM is indeed critical for American economic competitiveness and progress. So naturally you’d expect that STEM subjects would be among the best researched of all, right?

Wrong. My colleagues and I just published a review of research on elementary science programs in the most prestigious science education journal, the Journal of Research in Science Teaching (JRST). I'll get to the substantive conclusions in a moment. What I want to focus on first is the most important finding of the review: that we found only 23 studies that met our inclusion standards. Our standards were not that tough. We required that studies compare experimental to well-matched or randomly assigned control groups on measures that fairly assessed what was taught in both groups. Studies had to last at least four weeks (less than the 12 weeks we've required in every other subject). Our 23 studies were the product of all qualifying research published in English throughout the world over a period of more than 30 years. That's fewer than one study per year. Had we required random assignment and analysis at the level of random assignment, only seven studies would have qualified.

Of course, there are thousands of studies of elementary science teaching. Why did so few meet our standards? A lot of them had no control group, or no measure of science learning. Many were very brief lab studies lasting from an hour to a few days.

Among the few studies that did compare experimental and control classes over at least four weeks, most had obvious problems that made it impossible to include them. Many used measures made to register the gains in the experimental group but unrelated to what was taught in the control group. For example, many studies taught a unit on, say, electricity, to the experimental group and compared their gains on an electricity test to those of a group that was not taught electricity at all during the experiment.

Among the studies we could include, the outcomes favored inquiry approaches that emphasized professional development for teachers, using methods such as cooperative learning and reading-science integration. Inquiry methods using science kits did no better than control groups, and disturbingly, these were the highest-quality studies. There were positive effects for approaches emphasizing technology, but there were very few studies in this category.

The larger question posed by our review, however, is why there were so few qualifying studies. How could the entire field of science education produce fewer than one methodologically adequate experimental study of practical elementary science approaches per year?

At first, my colleagues and I thought that this problem must simply reflect science educators' greater focus on secondary schools than on elementary schools. However, we are now working on a review of secondary science programs, under a grant from the Spencer Foundation. We are not finding markedly more qualifying studies at that level, either.

The number of studies that meet similar inclusion standards in elementary and secondary reading and math is much higher than in science. What is it about science education that makes such research rare? Of course, there is a poignant irony in the observation that among all major branches of educational research, science education is least likely to use rigorous scientific evidence to evaluate its own programs. Science educators should be, and could still become, leaders in evidence-based reform, but this will require a serious change in direction in the field.

Test-Based Accountability, Inspectors, and Evidence


Recently, Marc Tucker of the National Center on Education and the Economy wrote a thoughtful critique of education policy in the U.S., questioning the heavy reliance on test-based accountability and suggesting that the U.S. adopt a system like those used in most of our peer countries, with less frequent and less consequential testing and a reliance on inspectors to visit all schools, especially those lagging on national measures. In the New York Times, Joe Nocera heaped praise on Tucker's analysis.

Personally, I agree with Tucker’s (and Nocera’s) enthusiasm for an assessment and accountability system that uses testing in a more thoughtful, less draconian way. There is certainly little evidence to support test-based accountability with substantial consequences for schools and teachers as it is being used today. I’d also be glad to see U.S. schools try the kinds of independent school inspectors used in most of our peer countries.

However, as I've noted in earlier blogs, it's fun to consider what other countries do, but there are too many factors involved to infer that adopting the policies of other countries will work here. I work part-time in England, which uses exactly the policies Tucker espouses. Its accountability measures are used only at the end of primary school (6th grade) and secondary school (11th grade). An independent and respected corps of inspectors visits schools, making more frequent visits when schools' scores are low or declining. All well and good, but England's PISA scores are nearly identical to ours. England's gentler accountability policies make teaching and school leadership less unpleasant than it is here, and the hysteria and pressure-induced cheating often seen here are unknown in England, so their policies may be better in many ways. But the different U.S. policies are not the main cause of the modest rankings of U.S. students.

As another example, consider Canada. Provinces vary, but no Canadian school system tests as often as we do. However, they do not have inspectors, and Canada always scores well above both the U.S. and England. So, should we emulate nearby Canada rather than faraway Finland or Shanghai? Perhaps, but before getting too excited about our neighbor to the north, it’s important to note that the U.S. states nearest to Canada, such as Massachusetts, Minnesota, and Washington, are among the highest U.S. achievers. Is Canada’s success due to policies or demography?

My point is just that while international comparisons might suggest policies or practices worth piloting and evaluating in the U.S., the main focus should be on evaluations of policies, practices, and programs within the U.S. If inspectors, for example, seem like a good idea, let’s try them in U.S. schools, randomly assigning some schools to receive inspectors and some not.

We can all make our best guesses about what might work to improve U.S. schools, but let’s put our guesses to the test.