Is Now the Time to Reauthorize ESEA?


The Elementary and Secondary Education Act (ESEA), currently also known as No Child Left Behind (NCLB), the giant centerpiece of educational policy, is up for reauthorization. Again. What that means is that it’s time to revisit the act in order to make changes and improvements to the law. Of course, it was supposed to be reauthorized in 2007, but what with partisan politics, outside influences and the lack of any general consensus around the various efforts, Congress has yet to successfully reauthorize the legislation. As a result, national educational policy has been a patchwork of waivers, dodges, and weaves unworthy of a great nation. ESEA is the Eeyore of legislation: “I’ll probably never be reauthorized.” Or the Rodney Dangerfield: “I get no respect.” Or the Godot, for which we’ll be forever waiting.

This year, Congress is taking up ESEA reauthorization again, but the road ahead remains long and fraught with obstacles. The House version, introduced by Reps. John Kline (R-Minnesota) and Todd Rokita (R-Indiana), made it through the Education and Workforce Committee along strict party lines, yet in February it was pulled right before a vote by the full House, with many surmising that it just wasn’t conservative enough to garner the votes it would need to pass. This week, Sens. Lamar Alexander (R-Tennessee) and Patty Murray (D-Washington) released a bipartisan compromise bill that they hope will make it through the Senate. But the draft is still open to amendments by the members of the HELP Committee and then the full Senate, and whether a single bill can satisfy the demands and desires of the broad political spectrum entrenched in Washington right now is unclear. Even if ESEA does not get reauthorized this Congress, the process is a necessary step toward eventually creating a better bill. Each Congress, when ESEA is debated, progress is made, and sometimes that progress leads to positive changes even without a comprehensive agreement. But it would be nice to have a well-considered, widely supported law at the center of education policy.

On the other hand, there are several reasons that it may not be so awful to delay reauthorization until after the next presidential election. Beyond the hope that things might be less partisan by then, there are several positive developments underway that are not yet far enough along to be central to ESEA but could be given two more years.

The first, of course, is the evidence movement. Recent investments, such as Investing in Innovation and IES, have produced a broad set of proven and promising programs for schools. Schools are just starting to be encouraged to use proven programs with their federal funds, as in the new evidence-proven, whole-school reform option in School Improvement Grants. Title II (professional development) has begun requiring grantees to have at least moderate evidence of effectiveness and gives a lot of competitive preference points for programs with strong evidence. President Obama’s budget proposal contained a provision called “Leveraging What Works,” providing schools with incentive funds if they use their formula funding to adopt proven programs. These changes are just happening now, too recently to shape this reauthorization. If they continue for two more years, they may have profound impacts on ESEA.

Another development is Common Core. These standards, and the computerized tests sometimes associated with them, are too new to be fully understood. In two years, their potential role in ESEA will be better known.

Finally, technology is headed into our schools at an astonishing pace, yet we still are not clear about how to use it or what it will do. I’d be reluctant to build technology policies into ESEA before we really know more about what universal access to digital devices could accomplish.

Given how long No Child Left Behind has overstayed its welcome, it may be especially important to get the next reauthorization right. It could be with us for a very long time!

Leveraging What Works


In my blog from two weeks ago, I discussed several exciting proposals in President Obama’s recent budget relating to increasing the role of evidence in education policy and practice. Today, I want to say more about one of these proposals, Leveraging What Works (LWW).

Leveraging What Works is deceptively simple. It offers grants totaling $100 million nationwide to school districts willing to use the grant, along with a portion of their formula funds — such as Title I and IDEA — to adopt proven programs that meet the “strong” or “moderate” level of evidence of effectiveness as defined in EDGAR.

Simple though it appears, Leveraging What Works would be revolutionary. Here’s why.

First, the program would generate a huge amount of interest. LWW funding would be sought after avidly, not only for the money itself but as a feather in the cap of innovative, thought-leader districts. These districts would be eager to win the money and tell their stories. The whole process would create a positive “buzz” around the use of proven programs.

Because of the money and the positive buzz, many more districts would apply for LWW funding than could be funded. Yet having looked at the range of proven programs available to them, many of these districts would choose to adopt proven programs using their formula funding even without the LWW grant. This is exactly what happened with the Obey-Porter Comprehensive School Reform Demonstration Act (CSR) of the late 1990s. Thousands of schools applied for modest grants to help them adopt whole-school models, and each year, hundreds of schools that were turned down for grant funding adopted CSR models anyway, using other funding.

Leveraging What Works could revive the idea that formula funding can be the fuel for innovation rather than just a mainstay of the status quo. Let’s be honest: It’s been a long time since Title I has been considered sexy. LWW could energize Title I advocates and those who want schools to have the freedom to choose what works to improve outcomes for children. Title I needs to move from a compliance mindset to an innovation mindset, and LWW could help make this happen. It could help establish Title I schools as the places where up-and-coming teachers and administrators want to be, because those are the schools that get the first crack at the latest proven innovations.

Leveraging What Works would also energize the world of research and development, and the funders of R&D within and outside government. They would see programs proven in rigorous research being eagerly adopted by schools nationwide, and seeing the clear connection between research, development, and practice, they would redouble their efforts to create and evaluate promising, replicable programs of all kinds.

Until recently, it would have been difficult to justify an initiative like Leveraging What Works, but thanks to Investing in Innovation (i3), IES, NSF, and other funders, the number of proven programs is growing. For example, I recently counted 28 elementary reading approaches, from tutoring to whole-school reform, that should meet the EDGAR standards, and more are qualifying every year. Every one of these is actively disseminating its methods and is ready to grow.

One curious aspect of the Leveraging What Works proposal is that it provides incentives for the use of formula funding to adopt proven programs but does not provide similar incentives for adopting proven programs using competitive grants. When competitive grants are offered to schools, districts, or states, it would be easy to incentivize the use of proven programs by giving preference points to proposals that commit to using them. For example, proposals might get four extra points for choosing a program that meets the EDGAR “strong” definition, and two points for choosing a program meeting the EDGAR “moderate” definition, as I’ve argued before. It may be that this strategy was left out of the budget proposal because it does not really cost anything, so I hope it will be part of the administration’s plans whatever happens with LWW.
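
To make the arithmetic of such a preference concrete, here is a minimal sketch of the kind of scoring rule described above. The point values (four for “strong,” two for “moderate”) come from the example in this paragraph; the function and field names are hypothetical and do not describe any actual grant-review rubric.

```python
# Hypothetical sketch: award competitive-preference points based on the
# EDGAR evidence tier of the program a proposal commits to using.
# Point values mirror the example in the text; this is not a real
# Department of Education scoring system.

EVIDENCE_POINTS = {
    "strong": 4,    # program meets the EDGAR "strong" evidence definition
    "moderate": 2,  # program meets the EDGAR "moderate" evidence definition
    "none": 0,      # no qualifying evidence
}

def total_score(base_score: float, evidence_tier: str) -> float:
    """Add preference points for proven programs to a proposal's base score."""
    return base_score + EVIDENCE_POINTS.get(evidence_tier, 0)

# Two otherwise-equal proposals, one committing to a proven program:
print(total_score(85.0, "strong"))  # 89.0
print(total_score(85.0, "none"))    # 85.0
```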

The Greek mathematician Archimedes said, “Give me a lever long enough and a fulcrum on which to place it and I’ll move the Earth.” Leveraging What Works could be such a lever, a modest investment with potential to make a meaningful difference in the lives of millions of children.

America’s Strength: An Innovation Economy


In a 2012 article in The New York Times called “China’s Rise Isn’t Our Demise,” Vice President Joe Biden wrote a cogent summary of America’s advantage in the world economy that has enormous implications for innovation in education.

“The United States is hard-wired for innovation. Competition is in the very fabric of our society. It has enabled each generation of Americans to give life to world-changing ideas – from the cotton gin to the airplane, the microchip, the Internet.

“We owe our strength to our political and economic system and to the way we educate our children – not merely to accept established orthodoxy but to challenge and improve it… Our universities remain the ultimate destination for the world’s students and scholars.”

Nothing in Biden’s article was new or surprising. Every American understands that our success in the world economy depends on education and innovation.

So why do we devote so little attention to innovation in education? The very orientations and investments Vice President Biden cited as the basis of our success in other fields are rarely applied to improving education itself. Instead of inventing our way to success, as we do in so many other fields, we keep trying to improve education through changes in governance, regulations, and rules, which never produce change in core classroom practices and outcomes. Every state’s textbook adoption requirements specify paper weight, but never mention the weight of evidence behind the use of the book. Special education regulations specify that children be placed in the “least restrictive environment” but never the “most effective environment.” Title I has reams of regulations about how funds can or can’t be spent, but hardly a word suggesting that they be spent on programs proven to work.

The shelf of proven programs is steadily growing, due to investments at the Institute of Education Sciences, Investing in Innovation (i3), the National Science Foundation, and other government funders, as well as private foundation funders. Yet evidence and innovation continue to play an extremely small role in Title I, Title II, special education, and other federal programs, much less in state and local programs. A movement toward giving schools and districts more freedom in choosing how to use federal funding is a positive development, but local educators will need reliable information about proven, replicable programs to translate their new freedom into solid benefits for their children. Innovation based on research and development is what America does best. Isn’t it time to dedicate ourselves to innovating our way to solutions to our longstanding educational problems?

How Biased Measures Lead to False Conclusions


One hopeful development in evidence-based reform in education is the improvement in the quality of evaluations of educational programs. Because of policies and funding provided by the Institute of Education Sciences (IES) and Investing in Innovation (i3) in the U.S. and by the Education Endowment Foundation (EEF) in the U.K., most evaluations of educational programs today use far better procedures than was true as recently as five years ago. Experiments are likely to be large, to use random assignment or careful matching, and to be carried out by third-party evaluators, all of which give (or should give) educators and policy makers greater confidence that evaluations are unbiased and that their findings are meaningful.

Despite these positive developments, there remain serious problems in some evaluations. One of these relates to measures that give the experimental group an unfair advantage.

There are several ways in which measures can unfairly favor the experimental group. The most common is where measures are made by the creator of the program and are precisely aligned with the curriculum taught in the experimental group but not the control group. For example, a developer might reason that a new curriculum represents what students should be taught in, say, science or math, so it’s all right to use a measure aligned with the experimental program. However, use of such measures gives a huge advantage to the experimental group. In an article published in the Journal of Research on Educational Effectiveness, Nancy Madden and I looked at effect sizes for such over-aligned measures among studies accepted by the What Works Clearinghouse (WWC). In reading, we found an average effect size of +0.51 for over-aligned measures, compared to an average of +0.06 for measures that were fair to the content taught in experimental and control groups. In math, the difference was +0.45 for over-aligned measures, -0.03 for fair ones. These are huge differences.
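
For readers less used to the metric, the effect sizes quoted here are standardized mean differences: the experimental-group mean minus the control-group mean, divided by a pooled standard deviation. The sketch below uses invented numbers purely to show how a roughly +0.5 versus a roughly +0.07 result arises; it does not reproduce any data from the article.

```python
import math

def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled standard deviation of two groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

def effect_size(mean_exp, sd_exp, n_exp, mean_ctl, sd_ctl, n_ctl):
    """Standardized mean difference favoring the experimental group."""
    return (mean_exp - mean_ctl) / pooled_sd(sd_exp, n_exp, sd_ctl, n_ctl)

# Invented posttest scores: the same program looks far more effective on a
# measure aligned only with what the experimental group was taught.
print(round(effect_size(72, 14, 300, 65, 14, 300), 2))  # ~ +0.50, over-aligned test
print(round(effect_size(52, 15, 300, 51, 15, 300), 2))  # ~ +0.07, fair test
```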

A special case of over-aligned measures takes place when content is introduced earlier than usual in students’ progression through school in the experimental group, but not the control group. For example, if students are taught first-grade math skills in kindergarten, they will of course do better on a first-grade test (in kindergarten) than will students not taught these skills in kindergarten. But will the students still be better off by the end of first grade, when all have been taught first-grade skills? It’s unlikely.

One more special case of over-alignment takes place in relatively brief studies when students are pre-tested, taught a given topic, and then post-tested, say, eight weeks later. The control group, however, might have been taught that topic earlier or later than that eight-week period, or might have spent much less than eight weeks on it. In a recent review of elementary science programs, we found many examples of this, including situations in which experimental groups were taught a topic such as electricity during an experiment, while the control group was not taught about electricity at all during that period. Not surprisingly, these studies produce very large but meaningless effect sizes.

As evidence becomes more important in educational policy and practice, we researchers need to get our own house in order. Insisting on the use of measures that are not biased in favor of experimental groups is a major necessity in building a body of evidence that educators can rely on.

Are Proven Educational Innovations Ready for Prime Time?

These are good times for evidence-based reform in education. Due in particular to Investing in Innovation (i3) and the Institute of Education Sciences (IES), the number of proven programs in all subjects and grade levels is increasing, and as i3 programs come to the ends of their evaluations, that number should grow even faster (even though there are also sure to be many disappointments).

The increasing number of programs proven effective in rigorous research creates new opportunities for policy and practice in education. Already, School Improvement Grants (SIG) offer a new option for schools to choose proven, comprehensive reform models. Other areas of policy may also soon begin to encourage or incentivize use of programs with strong evidence.

If these changes in policy take hold, it will begin to matter whether educational programs proven to be effective are in fact ready for prime time, meaning that they are ready to be disseminated and supported in the form in which they existed when they were successfully evaluated. It would be catastrophic if educators and policy makers began looking to the What Works Clearinghouse, for example, or looking for programs that meet the EDGAR standards for strong or moderate evidence of effectiveness, and found many programs that were unavailable, unrealistic, or impractical.

As much as providing evidence of effectiveness is an advance in education, there is a real need to determine the degree to which proven programs are also ready for widespread implementation.

Some indicators of readiness for prime time would be easy to assess. For example, programs that lack a web site, do not offer materials or training, or otherwise do not exist in anything like the form in which they were evaluated cannot be considered ready for implementation. Some programs used procedures in their evaluation that could never be replicated, such as science programs that provide each experimental class with enough graduate students to monitor and assist every lab group. Some proven technology products run on hardware that no longer exists.

Many studies use measures of learning outcomes that are closely aligned with what was taught in the experimental but not the control group. Such studies might be excluded on the basis that the over-aligned measure does not have meaning beyond the experiment itself.

Educators who choose to use proven programs have a right to be confident that the programs they have selected are, if implemented well, likely to result in enhanced performance on measures they care about. Finding a lot of programs that cannot be implemented under ordinary circumstances and with meaningful measures will diminish interest in evidence-based reform.

Evidence-based reform itself is ready for prime time in education, but its future depends on whether it is perceived to produce genuine benefits for children. We need to make sure that the proven programs we offer to educators meet their needs and those of the students, not just scientific standards.

Promoting Proven Programs in Title I: The Bully Pulpit

[Photo: Theodore Roosevelt]

Title I, the 800-pound gorilla of federal education policy, spends $15 billion a year to help high-poverty schools enhance outcomes for their students. The greatest victory for evidence-based reform would be for the roughly 51,500 Title I schools to make far greater use of programs known to enhance student learning, benefitting millions of at-risk children throughout the U.S. Yet because Title I is a formula grant, it is difficult for federal policy to increase use of proven approaches. Competitive grants can provide preference points for using proven models, as I’ve argued before, but in formula grants, it is more difficult to nudge educational leaders toward use of proven programs, since they will receive their money as long as they follow the rules.

One solution to this problem might be to borrow President Theodore Roosevelt’s conception of the presidency as a “bully pulpit.” In other words, even during a time of congressional gridlock, it is possible for the administration to promote the use of proven approaches, even in formula grants, at little or no cost.

The first thing the U.S. Department of Education would have to do is review all the programs in the What Works Clearinghouse according to the simpler, clearer standards in the EDGAR regulations. Someone would then have to prune the resulting lists, identifying programs that meet the EDGAR standards for “strong” and “moderate” levels of evidence and removing programs that no longer exist or that do not have anyone providing training and materials similar to those provided in the successful studies. The remaining programs would represent a good starting list of programs that, if implemented well, would be likely to have positive impacts on student achievement.
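
As a rough illustration of the pruning described above, the sketch below filters a hypothetical list of programs down to those meeting the EDGAR “strong” or “moderate” evidence levels and still having an active provider of training and materials. The entries and field names are invented; the real review would of course rest on human judgment, not a one-line filter.

```python
# Hypothetical sketch of pruning a program list: keep only programs with
# "strong" or "moderate" EDGAR evidence that are still actively supported.
# All entries are invented examples, not real WWC ratings.

programs = [
    {"name": "Program A", "edgar_evidence": "strong",    "active_provider": True},
    {"name": "Program B", "edgar_evidence": "moderate",  "active_provider": False},
    {"name": "Program C", "edgar_evidence": "promising", "active_provider": True},
]

QUALIFYING_LEVELS = {"strong", "moderate"}

starting_list = [
    p for p in programs
    if p["edgar_evidence"] in QUALIFYING_LEVELS and p["active_provider"]
]

for p in starting_list:
    print(p["name"])  # only "Program A" survives the pruning in this example
```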

Department officials could then publicize this list in many ways. Certainly, they could create a web site showing the programs and the evidence behind them and linking to the programs’ web sites. They might sponsor “effective methods fairs” around the U.S. to demonstrate programs available for schools and districts to choose. They might distribute certificates to schools that adopt proven programs and then implement them with fidelity, as certified by the developers.

These strategies and others could arouse widespread interest in proven programs and help school leaders choose from a wide array of programs appropriate to their needs.

If funds became available, the Department might provide modest incentive grants to help schools supplement the start-up costs of proven programs. But even without special incentive funding, schools should be able to make choices from among programs known to be likely to help them succeed with their children, using their existing Title I funds.

A creative “bully pulpit” policy might begin a process of expanding use of existing proven programs, encouraging creation and evaluation of new ones, and increasing sophistication in choosing how to spend federal resources. All of this could be accomplished for nearly nothing, while gradually moving the $15 billion in Title I toward more effective uses. Over time, such a policy would also encourage developers and researchers to create and evaluate programs likely to meet EDGAR standards, and it could help build political support for investments in R&D that ultimately result in better outcomes for children on a broad scale.

A “bully pulpit” strategy would still need to be accompanied by policies providing incentives to adopt proven programs in competitive grants, and by continued support for the R&D pipeline, such as that provided by Investing in Innovation (i3), the Institute of Education Sciences (IES), and the National Science Foundation (NSF). However, development and research in education have to go beyond the world of R&D; they need to be seen as a routine, growing part of educational practice and innovation.

*Photo courtesy of the Library of Congress

Use Proven Programs or Manage Using Data? Two Approaches to Evidence-Based Reform


In last week’s blog, I welcomed the good work Results for America (RfA) is doing to promote policy support for evidence-based reform. In reality, RfA is promoting two quite different approaches to evidence-based reform, both equally valid but worth discussing separately.

The first evidence-based strategy is “use proven programs” (UPP), which is what most academics and reformers think of when they think of evidence-based reform. This strategy depends on creation, evaluation, and widespread dissemination of proven programs capable of, for example, teaching beginning reading, algebra, or biology better than current methods do. If you see “programs” as being like the individual baseball players that Oakland A’s general manager Billy Beane chose based on statistics, then RfA’s “Moneyball” campaign could be seen as consistent in part with the “use proven programs” approach.

The second strategy might be called “manage using data” (MUD). The idea is for leaders of complex organizations, such as mayors or superintendents, to use data systematically to identify and understand problems and then to test out solutions, expanding those that work and scaling back or abandoning others. This is the approach used in the “Geek Cities” celebrated by RfA.

“Use proven programs” and “manage using data” have many similarities, of course. Both emphasize hard-headed, sophisticated use of data. Advocates of both approaches would be comfortable with the adage, “In God we trust. All others bring data.”

However, there are key differences between the UPP and MUD approaches that have important consequences for policy and practice. UPP emphasizes the creation of relatively universal solutions to widespread problems. In this, it draws from a deep well of experience in medicine, agriculture, technology, and other fields. When an innovator develops a new heart valve, a cow that produces more milk, or a new cell phone, and proves that it produces better outcomes than current solutions, that solution can be used with confidence in a broad range of circumstances, and may have immediate and profound impacts on practice.

In contrast, when a given school district succeeds with the MUD approach (for example, analyzing where school violence is concentrated, placing additional security guards in those areas, and then noting the changes in violence), this success is likely to be valued and acted upon by district leaders, because the data come from a context they understand and are collected by people they employ and trust. However, the success is unlikely to spread to or be easily replicated by other school districts. The MUD district may tout its success, but district leaders are not particularly motivated to tell outsiders about their successes, and usually lack sufficient staff to even write them up. Further, since MUD approaches are not designed for replication, they may or may not work in other places with different contexts. The difficulty of replicating success in a different context also applies to UPP strategies, but after several program evaluations in different contexts, program developers are likely to be able to say where their approach is most and least likely to work.

From a policy perspective, MUD and UPP approaches can and should work together. A district, city, or state that proactively uses data to analyze all aspects of its own functioning and to test out its own innovations or variations in services should also be eager to adopt programs proven to be effective elsewhere, perhaps doing their own evaluations and/or adaptations to local circumstances. If the bottom line is what’s best for children, for example, then a mix of solutions “made and proven here” and those “made and proven elsewhere and replicated or tested here” seems optimal.

For federal policy, however, the two approaches lead in somewhat different directions. The federal government cannot do a great deal to encourage local governments to use their own data wisely. In areas such as education, federal and state governments use accountability schemes of various kinds as a means of motivating districts to use data-driven management, but judging by NAEP scores since accountability took off in the early 1980s, this strategy has not gone very well. The federal government could identify well-defined, proven, and replicable “manage using data” methods, but if it did, those MUD models would just become a special case of “use proven programs” (and in fact, the great majority of education programs proven to work use data-driven management in some form as part of their approach).

In contrast, the federal government can do a great deal to promote “use proven programs.” In education, of course, it is doing so with Investing in Innovation (i3), the What Works Clearinghouse (WWC), and most of the research at the Institute of Education Sciences (IES). All of these are building up the number of proven programs ready to be used broadly, and some are helping programs to start or accelerate scale-up. The existence of proven programs coming from these and other sources creates enormous potential, but it is not yet having much impact on federal policies relating, for example, to Title I or School Improvement Grants. That could be coming, however.

Ultimately, “use proven programs” and “manage using data” should become a seamless whole, using every tool of policy and practice to see that children are succeeding in school, whatever that takes. But the federal government is wisely taking the lead in building up capacity for replicable “use proven programs” strategies to provide new visions and practical guides to help schools improve themselves.

School Improvement Grants Embrace Evidence of Effectiveness


Despite all of the exciting gains made by evidence-based reform in recent years, all of the progress so far has been limited to development, evaluation, and scale-up of proven programs outside of mainline education policy or funding. Title I, Title II, Race to the Top, School Improvement Grants, and other large, influential funding sources for reform have hardly been touched by the growth of proven, replicable programs sponsored primarily by the Institute of Education Sciences and Investing in Innovation (i3). Until the evidence movement crosses over from R&D to the real world of policy and practice, it will remain the domain of academics and policy wonks, not a real force for change.

In the recently passed Omnibus budget, however, appears a first modest step over the R&D/policy divide. This is a new provision in congressional authorization of School Improvement Grants (SIG). Up until now, SIG schools (ones that have suffered from very low achievement levels for many years) had to choose among four models, all of which require major changes in staffing. Each SIG school is expected to develop its own model of reform, usually with the help of consultants. The problem has been that each of the hundreds of schools receiving (substantial) SIG funding has to create its own never-before-tested path to reform, and then try to implement it with quality in a school that has just experienced a substantial turnover of its leadership and staff.

The “Fifth Option” recently introduced by Congress adds a new alternative. SIG schools can choose to adopt a “proven whole-school reform model” that meets at least a moderate level of evidence of effectiveness, which includes having been tested against a control group in at least one rigorous experiment. The fifth option will let schools keep their leaders and staffs, but adopt a schoolwide approach that has been used in many similar schools and found to be effective.

The Omnibus bill was passed too late in the year to apply this fifth option to the 2014-2015 school year, and the U.S. Department of Education, as well as individual states, has a lot of work to do to prepare new regulations and supports for schools applying for SIG funds under this new option in 2015-2016.

However, the fifth option makes an important statement that has not been made previously. In a major school improvement (not R&D) funding program, the fifth option says “use what works.” Wisely, it does not mandate the use of any specific programs, but by highlighting evidence-proven approaches, it puts the government behind the idea that federal funding should whenever possible be used to help educators use programs with strong evidence of effectiveness. This could be the start of something beautiful.

Success in Evidence-Based Reform: The Importance of Failure

As always, Winston Churchill said it best: “Success consists of going from failure to failure without loss of enthusiasm.” There is a similar Japanese saying: “Success is being knocked down seven times and getting up eight.”

These quotes came to my mind while I was reading a recently released report from the Aspen Institute, “Leveraging Learning: The Evolving Role of Federal Policy in Education Research.” The report is a useful scan of the education research horizon, intended as background for the upcoming reauthorization of the Education Sciences Reform Act (ESRA), the legislation that authorizes the Institute of Education Sciences (IES). However, the report also contains brief chapters by various policy observers (including myself), focusing on how research might better inform and improve practice and outcomes in education. A common point of departure in some of these was that while the randomized experiments (RCTs) emphasized for the past decade by IES and, more recently, Investing in Innovation (i3) are all well and good, the IES experience is that most randomized experiments evaluating educational programs find few achievement effects. Several cited testimony by Jon Baron that “of the 90 interventions evaluated in randomized trials by IES, 90% were found to have weak or no positive effects.” As a response, the chapter authors proposed various ways in which IES could add to its portfolio more research that is not based on RCTs.

Within the next year or two, the problem Baron was reporting will take on a great deal of importance. The results of the first cohort of Investing in Innovation grants will start being released. At the same time, additional IES reports will appear, and the Education Endowment Foundation (EEF) in the U.K., much like i3, will also begin to report outcomes. All four of the first cohort of scale-up programs funded by i3 (our Success for All program, Reading Recovery, Teach for America, and KIPP) have had positive first-year findings in i3 or similar evaluations recently, but this is not surprising, as they had to pass a high evidence bar to get scale-up funding in the first place. The much larger number of validation and development projects were not required to have such strong research bases, and many of these are sure to show no effects on achievement. Kevan Collins, Director of the EEF, has always openly said that he’d be delighted if 10% of the studies EEF has funded find positive impacts. Perhaps in the country of Churchill, Collins is better placed to warn his countrymen that success in evidence-based reform is going to require some blood, sweat, toil, and tears.

In the U.S., I’m not sure if policymakers or educators are ready for what is about to happen. If most i3 validation and development projects fail to produce significant positive effects in rigorous, well-conducted evaluations, will opinion leaders celebrate the programs that do show good outcomes and value the knowledge gained from the whole process, including knowledge about what almost worked and what to avoid doing next time? Will they support additional funding for projects that take these learnings into account? Or will they declare the i3 program a failure and move on to the next set of untried policies and practices?

I very much hope that i3 or successor programs will stay the course, insisting on randomized experiments and building on what has been learned. Even if only 10% of validation and development projects report clear, positive achievement outcomes and capacity to go to scale, there will be many reasons to celebrate and stay on track:

1. There are currently 112 i3 validation and development projects (plus 5 scale-ups). If just 10% of these were found to be effective and scalable, that would be 11 new programs. Added to the scale-up programs and other programs already positively reviewed in the What Works Clearinghouse, this would be a substantial base of proven programs. In medicine, the great majority of treatments initially evaluated are found not to be effective, yet the medical system of innovation works because the few proven approaches make such a big difference. Failure is fine if it leads to success.

2. Among the programs that do not produce statistically significant positive outcomes on achievement measures, there are sure to be many that show promise but do not quite reach significance. For example, any program whose evaluation shows a student-level positive effect size of, say, +0.15 or more should be worthy of additional investment to refine and improve its procedures and its evaluation to reach a higher standard, rather than being considered a bust.

3. The i3 process is producing a great deal of information about what works and what does not, what gets implemented and what does not, and the match between schools’ needs and programs’ approaches. These learnings should contribute to improvements in new programs, to revisions of existing programs, and to the policies applied by i3, IES, and other funders.

4. As the findings of the i3 and IES evaluations become known, program developers, grant reviewers, and government leaders should get smarter about what kinds of approaches are likely to work and to go to scale. Because of this, one might imagine that even if only 10% of validation and development programs succeed in RCTs today, higher and higher proportions will succeed in such studies in the future.

Evidence-based reform, in which promising scalable approaches are ultimately evaluated in RCTs or similarly rigorous evaluations, is the best way to create substantial and lasting improvements in student achievement. Failures of individual evaluations or projects are an expected, even valued part of the process of research-based reform. We need to be prepared for them, and to celebrate the successes and the learnings along the way.

As Churchill also said, “Success is not final, failure is not fatal: it is the courage to continue that counts.”