The Evidence or The Morgue

Many years ago, when I was a special education teacher, I had a summer job at a residential school for emotionally disturbed children. The school happened to be located in a former tuberculosis sanitarium. Later, I heard from other teachers who had worked in schools housed in one-time sanitaria as well.

How did it come about, one might ask, that tuberculosis sanitaria across the country became available for use as schools? The answer is that researchers cured the disease. The sanitaria were no longer needed for their original purpose, so they were turned into schools.

One feature of the former sanitaria is that they all had morgues. We used ours to store curriculum materials, because it had very sturdy and useful sliding horizontal cabinets. This arrangement led to a certain amount of macabre humor, but the morgue reminded us that what the sanitaria had once done was deadly serious indeed.

I was recalling my summer in the sanitarium after reading about the latest developments in the reauthorization of ESEA. Both the House and the Senate have now passed bills that eliminate the Investing in Innovation (i3) program and cut funding for the Institute of Education Sciences. In their place, the bills have a lot of language about state and local control, and about identifying and publicizing individual schools that are doing a particularly good job so their good works can help inspire and influence other schools. None of this would bother me if the legislation contained a clear commitment to rigorous research, development, and dissemination, but this may or may not be the case.

The Senate bill, which passed with bipartisan support last week, does authorize an evidence-based innovation fund. Modeled on the successful Small Business Innovation Research (SBIR) program, which funds innovation and evaluation in 11 different government agencies, this initiative would provide flexible funding for a broad range of field-driven projects and allow schools, districts, non-profits, and small businesses to develop and grow innovative programs to improve student achievement. Grants would be awarded using a tiered evidence framework based on an applicant’s proven effectiveness. The provision was initially offered and accepted as a bipartisan amendment during the Senate HELP Committee markup of its bill. However, the House bill has no comparable provision, and I have to wonder if the Senate provision will survive the grueling conference process and make it into the final bill.

Try to imagine what would have happened if tuberculosis research had been treated the way education research has been treated in the House version of the ESEA reauthorization bill. Individual sanitaria with lower death rates might be recognized. States and localities might try out ideas to make the sanitaria more effective, but few if any states or localities would be large enough to do the necessary sustained R & D. “Best practices” would be constrained by the current system, so they might involve better ways for sanitarium staff to do exercises with patients, for example, rather than experimenting with medications or other treatments. The disease would never have been cured. The morgues would still be used for unfortunate patients, not for curriculum materials.

The U.S. spends hundreds of billions of dollars every year on education. What student, parent, teacher, administrator, or policy maker does not want those billions used to make as much of a difference as possible? The pursuit of knowledge about how to improve educational outcomes is obviously important, but it is rarely very high on anyone’s priority list.

Fortunately, medicine and other fields long ago decided that research was in the national interest, and that investments in research were the most reliable way forward in improving important outcomes. In medicine, the choice is stark: either the evidence prevails or the morgue does. Yet in education, anyone with eyes to see knows what happens when children fail to learn. Most of the children who cannot read end up unemployed. Many end up in prison, and all too many in the morgue. We know enough now to be able to say that the great majority of reading failure, for example, is preventable. Yet we choose not to prevent it. What does this say about us as a people, as a society, as a political system?

I hope our leaders in Congress approve the Senate language on evidence, or something similar, and reinstate and fund programs that have the greatest promise in identifying and disseminating effective approaches to key problems. The lives of a generation of vulnerable children depend on their wisdom and courage at this critical juncture.


Good Failure/Bad Failure

Evidence junkies (like me) are reacting to the disappointing news on the evaluation of the Adolescent Behavioral Learning Experience (ABLE), a program implemented at Rikers Island to reduce recidivism among adolescent prisoners. Bottom line: The rigorous independent evaluation of the program failed to find any benefits. What makes this experiment especially interesting is that it is the first U.S. application of social impact bonds. Goldman Sachs put up a $7.2 million loan, and Bloomberg Philanthropies committed to a $6 million loan guarantee. Since the program did not produce the expected outcomes, Goldman Sachs lost $1.2 million.

Ironically, New York City administrators are delighted about the outcome because they do not have to pay for the program. They think they learned a great deal from the experience, for free.

It’s unclear what this will do to the social impact bond movement, currently in its infancy. However, I wanted to extend from this fascinating case to a broader issue in evidence-based reform.

The developers and advocates for the ABLE program who expected positive outcomes turned out to be wrong, at least in this implementation. The investors were wrong in expecting to make a profit. But I’d argue that they are all better off because of this experience, just as the N.Y.C. administrators said.

The distinction I want to make is between wrong and wrong-headed. Wrong, as I’m defining it in this context, means that a given outcome was not achieved, but it was entirely reasonable to expect that it might have been achieved. In contrast, wrong-headed means that not only was the desired outcome not achieved, but it was extremely unlikely that it could have been achieved. In many cases, a key component of wrong-headed actions is that the actor does not even know whether the action was effective or ineffective, right or wrong, and therefore continues with the same or similar actions indefinitely.

Wrong, I’d argue, is an honorable and useful outcome. In a recent interview, former White House advisor Gene Sperling noted that when a few cancer drugs fail to cure cancer, you don’t close down NIH. Instead, you take that information and use it to continue the research and development process. “Wrong,” in this view, can be defined as “good failure,” because it is a step on the path to progress.

“Wrong-headed,” on the other hand, is “bad failure.” When you do something wrong-headed, you learn nothing, or you learn the wrong lessons. Wrong-headed decisions tend to lead to more wrong-headed decisions, as you have no systematic guide to what is working and what is not.

The issue of wrong vs. wrong-headed comes up in the current discussions in Congress about continuing the Investing in Innovation (i3) program. By now, committees in both the House and the Senate have recommended ending i3. But this would be the very essence of wrong-headed policy. Sure, it is probable that many i3 programs funded so far will fail to make a difference in achievement, or will fail to go to scale. This just means that these programs have not yet found success. Some of these may still have evidence of promise, and some will not. However, all i3 programs are rigorously evaluated, so we will know a lot about which worked, which did not, and which still seem promising even if they did not work this time. That’s huge progress. The programs that are already showing success can have immediate impact in hundreds or thousands of schools while others greatly enrich understanding of what needs to be done.

Abandoning i3, in contrast, would be wrong-headed, a sure path to bad failure. For a tiny slice of education funding, i3 tells us what works and what does not, so we can continually move toward effective strategies and policies. Without i3 and other research and development investments, education policy is just guesswork, and it gets no smarter over time.

No one can honestly argue that American education is as successful as it should be. Our kids, our economy, and our society deserve much better. Policies that seek a mixture of proven success and “good failure” will get us to solid advances in educational practice and policy. Abandoning or cutting programs like i3 is not just wrong. It’s wrong-headed.

Evidence at Risk in House Spending Bill

The House Appropriations Committee marked up its spending bill yesterday for fiscal year 2016 for the Departments of Labor, Health and Human Services, and Education. The spending levels in the bill put forward by the majority reduce Department of Education funding by $2.8 billion, mostly by eliminating approximately two dozen programs and severely cutting back several others, so it is no surprise that the bill passed through the Committee along party lines.

I can’t speak for all of the affected programs, but I do want to address what some of these proposed cuts could do. In a word, they would devastate the movement toward evidence as a basis for policy and practice in education.

First, the House bill would eliminate Investing in Innovation (i3). i3 has been the flagship for “tiered evidence” initiatives, providing large scale-up grants for programs that already have substantial evidence of effectiveness, smaller “validation” grants for programs with some evidence to build up their evidence base, and much smaller “development” grants for programs worth developing, piloting, and evaluating. At $120 million per year, i3 costs about 50¢ per taxpayer. What we get for 50¢ per year is a wide variety of promising programs at all grade levels and in all subjects, serving thousands of mostly high-poverty schools nationwide. We get evidence on the effectiveness of these programs, which tells us which are ready for broader use in our schools. The evidence from i3 informs the whole $630-billion public education enterprise, especially the $15-billion Title I program. That is, i3 costs 2¢ for every $100 spent on public education. Congresswoman Chellie Pingree of Maine offered an amendment to restore i3 and increase its funding level to $300 million, which is what the president had proposed. The process of offering the amendment gave members the opportunity to discuss the importance of i3, but in the end it was withdrawn (a not-uncommon procedural move when the amendment does not have an offset and/or is not expected to pass).
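The arithmetic behind those figures is easy to verify. Here is a quick back-of-the-envelope check in Python; the taxpayer count of roughly 240 million is my own assumption for illustration, since the post does not state its denominator:

```python
# Back-of-the-envelope check of the i3 cost figures cited above.
# The taxpayer count is an illustrative assumption, not from the post.
i3_budget = 120e6        # i3 appropriation, dollars per year
taxpayers = 240e6        # assumed number of taxpayers (illustrative)
k12_spending = 630e9     # total U.S. public education spending, dollars per year

cost_per_taxpayer = i3_budget / taxpayers
cents_per_100_dollars = i3_budget / k12_spending * 100 * 100  # cents per $100

print(f"i3 per taxpayer: ${cost_per_taxpayer:.2f}")          # $0.50
print(f"i3 per $100 of education spending: {cents_per_100_dollars:.1f}¢")  # 1.9¢
```

Under those assumptions, the "50¢ per taxpayer" and "2¢ per $100" figures both check out (the latter is 1.9¢, rounded up in the text).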

Second, the House proposal would significantly reduce funding for the Institute of Education Sciences (IES). IES commissions a wide variety of educational research, data collection, communications about evidence, and standard-setting for evidence, at a very modest cost. In this case, Congressman Mike Honda of California offered and withdrew an amendment to restore IES funding to its FY15 level of $574 million.

Finally, the main target of the proposed cuts was discretionary programs, which provide direct services to students. Districts, states, and other entities have to apply for these pots of money (as distinct from funds such as Title I or IDEA that are distributed by formula). Examples include Striving Readers (for struggling secondary readers); School Improvement Grants, or SIG (for low-performing schools); Preschool Development Grants; Mathematics and Science Partnerships; Ready to Learn (educational television); and several others.

These discretionary funding sources are the programs that could most easily be focused on evidence. One practical example is SIG, which recently added a category of approved expenditures consisting of whole-school reform programs with at least moderate evidence of effectiveness, which includes having been tested against a control group in at least one rigorous experiment. As another example, Title II SEED grants for professional development now require that programs adopted under SEED funding have at least moderate evidence of effectiveness. Congresswoman Rosa DeLauro offered an amendment to reinstate many of these programs, and it failed along party lines.

Adding evidence as a requirement or encouraging use of proven programs is much easier with discretionary programs than with formula grants. Yet if the House bill were to become law, there would be very few discretionary programs left.

The House proposal would greatly reduce national capacity to find out what works and what does not, and to scale up proven programs and practices. I very much hope our leaders in Congress will rethink this strategy and retain funding for the government programs most likely to help all of us learn — policy makers, educators, and kids alike.

Evidence, Brown, and the Civil Rights Act


2014 marks the anniversaries of two great milestones in American history: Brown v. Board of Education (1954) and the Civil Rights Act (1964). I was too young to remember the first, but I remember exactly where I was when I heard that the Civil Rights Act had passed. I was 13, working as a volunteer in a giant orphanage in Washington, DC, called Junior Village. The kids, hundreds of them from babies to teens, were all African American, and so was most of the staff, plus a few liberal whites, so the news was greeted with euphoria. That summer changed my life.

Many people are writing to commemorate these great events, always with a question of how far we’ve really come toward the fairness and equality promised both by Brown and by the Civil Rights Act. Anyone with eyes to see has to acknowledge the progress that has taken place, but also the huge inequities that still remain.

I won’t add to the half-hurrahs being widely offered. During the time since Brown and the Civil Rights Act, we, the greatest nation on Earth, rocketed to the moon, cured many diseases, led astonishing developments in technology, defeated the Soviets, and on and on. And yet we still struggle to solve the most basic issues of equality between racial and ethnic groups: employment, education, health, and more. If inequality were merely a technical problem, we would have solved it. But it’s a problem of will, and therefore we consider the unacceptable acceptable. For shame.

In my own field, education, the “gap” between white students and African American and Hispanic children is always decried but never solved. It has remained about the same since 1980. Could we solve it? Could anyone doubt that the greatest nation on the planet could solve such a problem if it wanted to?

If we were truly committed to solving this problem, here is what we’d do. First, we’d identify all of the problems holding back minority students. Then we’d put in place solutions already known to be effective. We’d then commission research and development on the scale of the Manhattan Project to find effective, replicable solutions to the remaining problems. As approaches are validated in rigorous evaluations, we’d put them into practice in all schools that need them. We’d do the same in public health, mental health, social services, juvenile justice, employment, housing, and every other area that affects children and families. If America decided to do these things, it would succeed. There is no doubt. But did you notice the word “if” at the beginning of this paragraph?

America is an incredibly wealthy and capable country. Just as one example, we spent more than $2 trillion on the Iraq war. It did not even cause taxes to go up. We could have spent that much to combat inequality. We still could, and it would actually cost far less. But we have, thus far at least, chosen not to.

Even in dysfunctional Washington, we can still make progress in learning how to use the funds we already have committed to education and other services more effectively. Progressives and conservatives share an interest in using federal funds efficiently, and bipartisan alliances are coalescing to find out what works and use that information to make good policy choices that may eventually reduce achievement gaps. That’s the realistic grown-up me talking. The hopeful 13-year-old who celebrated the Civil Rights Act has confronted reality.

But can anyone explain to me why we shouldn’t be achieving what everyone knows can be achieved to bring about true equality and opportunity for all?

Wink Wink Nudge Nudge: Use Proven Programs


Discussions about evidence-based reform in education often founder on the question of financing. Who is going to provide the funding necessary to help schools adopt proven programs?

There are some areas in which additional funding might be necessary, which I’ll address in a moment. But the most important federal action to promote adoption of proven programs is to encourage schools and districts to use existing federal funding in this way. In particular, leaders of high-poverty schools constantly sniff the wind to be aware of what the feds want them to do. If the feds say “Wink wink nudge nudge, use proven programs,” high-poverty schools will get a lot more interested in what these are and how they can get ahold of them. Beyond winks and nudges, the feds and state leaders can offer a few competitive preference points to proposals that promise to adopt and effectively implement proven programs. Again, if government is handing out discretionary money based on local proposals, it costs nothing to nudge these proposals toward proven programs.

Of course, government funding is needed to help develop and evaluate promising programs, and sometimes to provide incentive or start-up funding to help schools or districts adopt them. But the really big money, like Title I and Race to the Top and School Improvement Grants, is what should be phasing toward support for proven approaches. With a wink, a nudge, some competitive preference points, and an ever-growing set of proven programs from which schools may choose, our system of schooling can become far more effective over time, especially for the struggling schools that receive the largest amounts of federal funding.

A Purpose-Driven Government: Moneyball and Impact


In 2002, Billy Beane, General Manager of the Oakland A’s, created “Moneyball,” a method of using statistics to evaluate the performance of ball players and hire the most productive players for the least money. John Bridgeland and Peter Orszag wonder whether government can play “moneyball” too; they made similar arguments in an article in The Atlantic over the summer. Bridgeland and Orszag were, respectively, top officials in the G. W. Bush and early Obama administrations’ Office of Management and Budget (OMB), so they write from deep experience. In a nutshell, their point is that evidence of effectiveness matters very little in government funding, but should matter much more. Readers of this column will not be surprised to hear me applaud this position.

However, I am concerned about one aspect of their articles, the notion that evaluations of federal funding streams are needed so that ineffective ones can be terminated. Sometimes this is true, but in reality most federal social funding supports investments in purposes that have a value for most Americans. Specific programs are then funded within these streams to accomplish the goals represented by that value. In a time when many politicians are looking for things to cut to reduce taxes or deficits, it is dangerous to put everything on the evaluation operating table. Individual programs (such as “Scared Straight,” the delinquency prevention program proven to increase delinquency) can and should be evaluated and cut if they are ineffective. However, “reducing delinquency” is a valid purpose that is a worthwhile investment of federal dollars. This purpose should not have to meet an evidence standard in the same way “Scared Straight” or all other specific delinquency prevention programs should.

In an earlier blog, I discussed this distinction, contrasting Title I (a funding stream for the widely shared value of improving achievement in high-poverty schools) and specific uses of Title I funding (such as specific after-school tutoring programs). My argument was that a “moneyball” approach, in which rigorous research is used to determine the impacts of specific Title I expenditures, was necessary, feasible, and in an ideal world acceptable to politicians and taxpayers of all stripes, who share an interest in cost-effective government. However, Title I itself is a funding stream, not a specific program, and it exists because most Americans agree that schools serving children in poverty need extra help. Title I has specific rules and procedures, but at its core it provides funding for a valid purpose that probably will always exist. For this reason, giant evaluations of Title I are not valuable. Instead, we need substantial investment in development, evaluation, and dissemination of specific approaches proven to actually fulfill the good intentions of Title I funding.

The same logic would apply to all government social programs. Each has a set of outcomes that are worthy of federal investment: reducing delinquency, hunger, homelessness, unemployment, teen pregnancy, and so on. Few would argue that these goals are unimportant, and a great, modern nation should surely be investing in them. But in each case, that investment goes into many specific programs, and the effectiveness of these programs ultimately adds up to the effectiveness of the funding stream. Further, improving the evidence base for proven individual programs (and weeding out ineffective ones) is uniquely a federal role, which states and localities are unlikely to do very well. When federal R&D identifies effective social programs, that information potentially magnifies the effectiveness of social spending at all levels of government.

In the case of the original “Moneyball,” evaluating the whole of the Oakland A’s would not have made much difference; they were already being evaluated (harshly) in their league standings. Yet using statistical evaluations of current and potential players to improve the quality of players on their roster gradually improved their overall performance. In the same way, social programs need to build up rosters of proven programs to improve their overall outcomes. A purpose-driven government does not cut funding for valid purposes if they are not being adequately attained. Instead, it improves the programs that help achieve widely supported purposes, using good science and scaling up effective innovations.

How To Do Lots of High-Quality Educational Evaluations for Peanuts


One of the greatest impediments to evidence-based reform in education is the difficulty and expense of doing large-scale randomized experiments. These are essential for several reasons. Large-scale experiments are important because when treatments are at the level of classrooms and schools, you need a lot of classrooms and schools to avoid having just a few unusual sites influence the results too much. Also, research finds that small-scale studies produce inflated effects, particularly because researchers can create special conditions on a small scale that they could not sustain on a large scale. Large experiments simulate the situations likely to exist when programs are used in the real world, not under optimal, hothouse conditions.

Randomized experiments are important because when schools or classrooms are assigned at random to use a program or continue to serve as a control group (doing what they were doing before), we can be confident that there are no special factors that favor the experimental group other than the program itself. Non-randomized, matched studies that are well designed can also be valid, but they have more potential for bias.

Most quantitative researchers would agree that large-scale randomized studies are preferable, but in the real world such studies done well can cost a lot – more than $10 million per study in some cases. That may be chump change in medicine, but in education, we can’t afford many such studies.

How could we do high-quality studies far less expensively? The answer is to attach studies to funding being offered by the U. S. Department of Education. That is, when the Department is about to hand out a lot of money, it should commission large-scale randomized studies to evaluate specific ways of spending these resources.

To understand what I’m proposing, consider what the Department might have done when No Child Left Behind (NCLB) required that low-performing schools offer after-school tutoring to low-achieving students, in its Supplemental Educational Services (SES) initiative. The Department might have invited proposals from established providers of tutoring services, which would have had to participate in research as a condition of special funding. It might have then chosen a set of after-school tutoring providers (I’m making these up):

Program A provides structured one-to-three tutoring.
Program B rotates children through computer, small-group, and individualized activities.
Program C provides computer-assisted instruction.
Program D offers small-group tutoring in which children who make progress get treats or free time for sports.

Now imagine that for each program, 60 qualifying schools were recruited for the studies. For the Program A study, half would get Program A and half would get the same funding to do whatever they wanted (except adopt Programs A to D), consistent with the national rules. The assignment to Program A or its control group would be at random. Programs B, C, and D would be evaluated in the same way.
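The random assignment step itself is mechanically simple. Here is a minimal sketch in Python with hypothetical school names; in a real study, assignment would likely be stratified by district and prior achievement rather than fully unrestricted, and the seed would be fixed so the assignment is auditable:

```python
import random

def assign_schools(schools, seed=42):
    """Randomly split recruited schools into a treatment group (gets the
    program) and a control group (same funding, spends it as it wishes)."""
    rng = random.Random(seed)   # fixed seed makes the assignment reproducible
    shuffled = schools[:]       # copy so the original roster is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]  # (treatment, control)

# 60 hypothetical qualifying schools recruited for the Program A study
schools = [f"School {i:02d}" for i in range(1, 61)]
treatment, control = assign_schools(schools)
print(len(treatment), len(control))  # 30 30
```

Because every recruited school had an equal chance of landing in either group, any systematic posttest difference between the groups can be attributed to the program rather than to pre-existing differences.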

Here’s why such a study would have cost peanuts. The costs of offering the program to the schools that got Programs A, B, C, or D would have been covered by Title I, as was true of all NCLB after-school tutoring programs. Further, state achievement tests, routinely collected in every state in grades 3-8, could have been obtained at pre- and posttest at little cost for data collection. The only costs would be for data management, analysis, and reporting, plus modest costs for questionnaires and/or observations to see what was actually happening in the participating classes. Peanuts.
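Once the state test scores are in hand, the core analysis amounts to comparing mean pretest-to-posttest gains between treatment and control schools. Here is a minimal sketch with made-up school-level scores; a real analysis would use hierarchical models with students nested in schools, plus covariate adjustment:

```python
# Minimal sketch of a school-level analysis: compare mean pretest-to-posttest
# gains between treatment and control schools. All scores are made up.
from statistics import mean

# (pretest mean, posttest mean) per school, on the state test scale
treatment_schools = [(620, 641), (598, 617), (633, 650), (605, 628)]
control_schools   = [(615, 627), (602, 611), (628, 639), (610, 620)]

def mean_gain(schools):
    """Average pre-to-post gain across a group of schools."""
    return mean(post - pre for pre, post in schools)

effect = mean_gain(treatment_schools) - mean_gain(control_schools)
print(f"Estimated effect: {effect:+.1f} points")  # Estimated effect: +9.5 points
```

Since the scores are already being collected by the state, this analysis layer is essentially the only new expense, which is the point of the "peanuts" argument.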

Any time money is going out from the Department, such designs might be used. For example, in recent years a lot of money has been spent on School Improvement Grants (SIG), now called School Turnaround Grants. Imagine that various whole-school reform models were invited to work with many of the very low-achieving schools that received SIG grants. Schools would have been assigned at random to use Programs A, B, C, or D, or to control groups able to use the same amount of money however they wished. Again, various models could be evaluated. The costs of implementing the programs would have been provided by SIG (which was going to spend this money anyway), and the cost of data collection would have been minimal because test scores and graduation rates already being collected could have been used. Again, the costs of this evaluation would have just involved data management, analysis, and reporting. More peanuts.

Note that in such evaluations, no school gets nothing. All of them get the money. Only schools that want to sign up for the studies would be randomly assigned. Modest incentives might be necessary to get schools to participate in the research, such as a few competitive preference points in competitive proposals (such as SIG) or somewhat higher funding levels in formula grants (such as after-school tutoring). Schools that do not want to participate in the research could do what they would have done if the study had never existed.

Against the minimal cost, however, weigh the potential gain. Each U. S. Department of Education program that lends itself to this type of evaluation would produce information about how the funds could best be used. Over time, not only would we learn about specific effective programs, we’d also learn about types of programs most likely to work. Also, additional promising programs could enter into the evaluation over time, ultimately expanding the range of options for schools. Funding from the Institute of Education Sciences (IES) or Investing in Innovation (i3) might be used specifically to build up the set of promising programs for use in such federal programs and evaluations.

Ideally, the Department might continuously commission evaluations of this kind alongside any funding it provides for schools to adopt programs capable of being evaluated on existing measures. Perhaps the Department might designate an evaluation expert to sit in on early meetings to identify such opportunities, or perhaps it might fund an external “Center for Cost-Effective Evaluations in Education.”

There are many circumstances in which expensive evaluations of promising programs still need to be done, but constantly doing inexpensive studies where they are feasible might free up resources to do necessarily expensive research and development. It might also accelerate the trend toward evidence-based reform by adding a lot of evidence quickly to support (or not) programs of immediate importance to educators, to government, and to taxpayers.

Because of the central role government plays in education, and because government routinely collects a lot of data on student achievement, we could be learning a lot more from government initiatives and innovative programs. For just a little more investment, we could learn a lot about how to make the billions we spend on providing educational services a lot more effective. Very important peanuts, if you ask me.