Eliminating Achievement Gaps in England

Imagine that the U.S. government released a report showing that the achievement gap between White students and African American and Hispanic children had been eliminated. (In fact, the gap has hardly budged since 1980.) Such a report would cause dancing in the streets, champagne all around, and a genuine sense of a job well done.

Just such a report was released, but in England, not the U.S. In that country, longstanding achievement gaps have been greatly diminished or eliminated, according to Ofsted, the independent government agency that inspects schools. As recently as 2007, White English students far outscored those who were Black African, Afro-Caribbean, Pakistani, or Bangladeshi. Today, all of these gaps in reading and math have disappeared or become very small. The result is that instead of worrying about the achievement gap, the English are now worrying about the perceived low performance of White students. In fact, White students also made gains over the same period, just not as large as those of minority students.

How did the English accomplish this feat of closing the achievement gap, and what can we in the U.S. learn from their efforts? Such a large shift may have many causes, but the Ofsted report gives the main credit to a massive investment by the previous Labour government called the London Challenge. This initiative provided substantial professional development and assistance to high-poverty schools throughout the London metropolitan area. A Manchester Challenge did the same in that city. Since most minority students in England live in London or Manchester, these two initiatives appear to have been enough to move reading and math gaps for the entire country.

England is very similar to the U.S. in most ways. It has similar income per capita, and its overall scores on international tests such as PISA and TIMSS are similar to ours. However, some aspects of the English context may not transfer to the U.S. For example, the proportion of students in England who are minorities is just 7%, compared to 30% in the U.S. A key structural difference is that although England has districts (local authorities), they are very weak. Principals (head teachers) and boards of governors for each school have great autonomy, but the national government plays a much stronger role than in the U.S. This means that if the national government decides to intervene to improve schools, it can do so, and that is what happened in the London and Manchester Challenges. Also, basic per-pupil funding in the UK is equalized, as it is in all civilized countries, and in fact schools with many poor or minority students receive extra funding. This contrasts with the situation in the U.S., where funding depends largely on local and state tax receipts, so poor areas receive much less than rich ones. A London or Manchester Challenge therefore builds on a level playing field, while in the U.S., just getting to equality would be a major accomplishment.

The most important lesson from the English example is this: achievement gaps are not inexorable. Substantial investment and concerted effort can make a meaningful difference. The English example does not provide a simple road map for the U.S., but it should remove a lot of excuses for the persistence of gaps in academic achievement in our country.

Hearts, Wallets, and Evidence

Everyone knows that effective education is one of the best means of preventing all sorts of social ills. Yet when effective education reduces the costs of special education, dropout, policing, prisons, or unemployment, where do those savings go? Because the “pay me now” costs fall in school budgets while the “pay me later” savings accrue to other budgets, “pay me now or pay me later” rhetoric is usually just that: rhetoric.

There’s something starting that may change this dynamic: social impact bonds. The idea is that investors purchase bonds to enable local governments to invest in services that are likely to save them a lot of money down the road. The services are rigorously evaluated, and if they are found to be effective, the investors get a return. David Bornstein recently covered this topic in a New York Times article. He says, “What’s most noteworthy about this approach is that, if it works, it creates incentives to finance prevention — the smartest and usually the cheapest way to address problems, but also the hardest thing to get governments to pay for. (Program costs are incurred immediately, but savings often accrue on someone else’s watch.)”

I have no idea whether social impact bonds make sense purely on a financial basis, but they make all kinds of sense for public-spirited investors who are willing to help their communities but want to be sure their money will be used effectively. They make all kinds of sense as a way to help government agencies fund needed services, and they make even more sense in building up the evidence base for replicable programs. Most of all, they make sense for kids, who benefit from increasingly effective programs.

So far, social impact bonds are showing up in areas such as delinquency prevention, where very big savings are possible within a few years. Recently, Goldman Sachs invested approximately $10 million in a New York City program intended to reduce recidivism among incarcerated youth. If recidivism drops by more than 10%, Goldman Sachs could make up to $2.1 million in profit, far less than what New York City would save. In the UK, where the idea began, social impact bonds are being used in a broader range of preventive children’s services, including programs to reduce foster care placements, homelessness, and health care needs.
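To make the payout mechanics concrete, here is a toy sketch in Python of the pay-for-success logic described above. The all-or-nothing payout rule and the function name are illustrative assumptions, not the terms of the actual New York contract, which presumably uses a more graduated repayment schedule; only the rough dollar figures and the 10% threshold come from the reporting cited above.

```python
# Toy sketch of a social impact bond's pay-for-success logic.
# The all-or-nothing payout is a simplifying assumption; real
# contracts typically scale repayment with the measured effect.

def sib_payout(principal, measured_drop, target_drop=0.10,
               max_profit=2_100_000):
    """Amount the government repays investors after the evaluation.

    principal     -- up-front investment (~$10M in the NYC deal)
    measured_drop -- evaluated reduction in recidivism
    target_drop   -- contractual success threshold (>10% here)
    max_profit    -- cap on investor profit ($2.1M in the NYC deal)
    """
    if measured_drop > target_drop:
        return principal + max_profit  # success: principal plus capped profit
    return 0  # failure: investors absorb the loss; government pays nothing

print(sib_payout(10_000_000, measured_drop=0.12))  # 12100000
print(sib_payout(10_000_000, measured_drop=0.08))  # 0
```

Whatever the exact contract terms, the design point is the same: the government pays only after an independent evaluation documents the outcome, shifting the risk of ineffective programs from taxpayers to investors.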

I’m sure that involving the private sector in this way will rub many people the wrong way, but if social impact bonds can bring the discipline of investors to social services and accelerate evidence-based reform, I’m all for it. Good hearts, good wallets, and good evidence have to add up to good outcomes for vulnerable children.

Lessons From Innovators: Calibrating Expectations for i3 Evaluation Results

The process of moving an educational innovation from a good idea to widespread effective implementation is far from straightforward, and no one has a magic formula for doing it. The William T. Grant and Spencer Foundations, with help from the Forum for Youth Investment, have created a community of grantees in the federal Investing in Innovation (i3) program to share ideas and best practices. Our Success for All program participates in this community. In this space, in partnership with the Forum for Youth Investment, I highlight observations from the experiences of i3 grantees other than our own, in an attempt to share the thinking and experience of colleagues on the front lines of evidence-based reform. This blog post is from Dr. Donald J. Peurach, Assistant Professor of Educational Studies in the University of Michigan’s School of Education. Since 2012, Dr. Peurach has served as an advisor and contributor to the i3 Learning Community. As a researcher who focuses on large-scale educational reform, he offers his perspective from the front lines.

As a participant-observer in the i3 Learning Community, I have had a front row seat on ambitious efforts by the U.S. Department of Education’s Office of Innovation and Improvement (OII) to revolutionize educational innovation and reform. Others will soon have a glimpse, too, and the fate of the revolution may well rest on how they interpret what they see.

With its Investing in Innovation (i3) program, OII is investing nearly a billion dollars in the development, validation, and scale-up of over one hundred diverse reform initiatives, all subject to rigorous, independent evaluations. In coordination with the U.S. Department of Education’s Institute of Education Sciences (IES), OII will report results in the What Works Clearinghouse so that decision makers have high-quality information on which to base school improvement efforts.

For most people, their best glimpse of the i3-funded initiatives will come via these evaluation results. Preliminary reports from two scale-up grantees, Reading Recovery and Success for All, are largely positive. This is not surprising: both are well-established enterprises that have been refined through more than two decades of use in thousands of schools.

Additional evaluation results will soon follow, from a broad array of initiatives that are not nearly as well established. History predicts that many of these results will show variability in implementation and outcomes that clouds efforts to determine what works (and what doesn’t). But this, too, would not be surprising. Both researchers and reformers (including contributors to this blog) have long reported that efforts to establish and evaluate ambitious improvement initiatives have been challenged by interactions among the complex problems to be solved in schools, the uncertain research base on which to draw, and the turbulent environments of U.S. public education.

If historical precedents hold, the effect could be to leave OII’s efforts politically vulnerable, as promises of revolution and equivocal results are not a good mix. For example, barely five years after finding support in federal policy, the comprehensive school reform movement met a quick and quiet death, as lofty promises of “break-the-mold” school improvement collided with equivocal evaluation results to contribute to a rapid erosion of political support. This was the case despite a small number of positive outliers having met high standards for evidence of effectiveness (including Success for All).

Yet new developments provide reasons for hope. Within the i3 Learning Community, reformers are collaborating to develop and manage their enterprises as learning systems that improve and persist in the face of complexity, uncertainty, and turbulence. Doing so includes critically analyzing implementation and outcomes in order to understand, explain, and respond to both successes and struggles. Similar work is underway in the Hewlett Foundation’s “Deeper Learning” initiative.

Moreover, rather than passing summary judgment based on quick glimpses, researchers and policymakers are increasingly recognizing the struggles of reformers as legitimate, and they are interpreting equivocal evaluation results as a reason to push still deeper into the challenging work of educational innovation and reform. For example, some researchers are working hard to systematically study variation in program effects to determine what works, where, for whom, and why. With new support from IES, other researchers inside and outside the academy are advancing improvement-focused evaluation strategies that have great potential to reduce that variation.

Such efforts mark a great advance beyond a narrow focus on determining what works (and what doesn’t). To be clear: Making that determination is, at some point, absolutely essential. After all, the life chances of many, many students hang in the balance. The advance lies in acknowledging that the road to positive results is far rockier than most realize, and that paving it smooth requires supporting reformers in learning to manage the complexity, uncertainty, and turbulence that have long been their undoing.

Indeed, from my front row seat, the revolution in educational innovation and reform looks to be just beginning, with increasing potential to coordinate new, improvement-focused evaluation strategies with more sophisticated impact evaluation strategies in both supporting and assessing educational innovation. Whether that is, in fact, the case will depend in no small part on what others make of the glimpses provided by forthcoming i3 evaluation results: what they make of outlying successes and failures, certainly; but, more importantly, what they make of (and decide to do about) the great, grey space in the middle.

Success in Evidence-Based Reform: The Importance of Failure

As always, Winston Churchill said it best: “Success consists of going from failure to failure without loss of enthusiasm.” There is a similar Japanese saying: “Success is being knocked down seven times and getting up eight.”

These quotes came to mind while I was reading a recently released report from the Aspen Institute, “Leveraging Learning: The Evolving Role of Federal Policy in Education Research.” The report is a useful scan of the education research horizon, intended as background for the upcoming reauthorization of the Education Sciences Reform Act (ESRA), the legislation that authorizes the Institute of Education Sciences (IES). However, the report also contains brief chapters by various policy observers (including myself), focusing on how research might better inform and improve practice and outcomes in education. A common point of departure in several of these chapters was that while the randomized experiments (RCTs) emphasized for the past decade by IES, and more recently by Investing in Innovation (i3), are all well and good, IES’s experience is that most randomized experiments evaluating educational programs find few achievement effects. Several authors cited testimony by Jon Baron that “of the 90 interventions evaluated in randomized trials by IES, 90% were found to have weak or no positive effects.” In response, the chapter authors proposed various ways in which IES could add more non-RCT research to its portfolio.

Within the next year or two, the problem Baron was reporting will take on a great deal of importance. The results of the first cohort of Investing in Innovation grants will start being released. At the same time, additional IES reports will appear, and the Education Endowment Foundation (EEF) in the U.K., much like i3, will also begin to report outcomes. All four of the first cohort of scale-up programs funded by i3 (our Success for All program, Reading Recovery, Teach for America, and KIPP) have recently had positive first-year findings in i3 or similar evaluations, but this is not surprising, as they had to pass a high evidence bar to get scale-up funding in the first place. The much larger number of validation and development projects were not required to have such strong research bases, and many of these are sure to show no effects on achievement. Kevan Collins, Director of the EEF, has always openly said that he’d be delighted if 10% of the studies EEF has funded find positive impacts. Perhaps in the country of Churchill, Collins is better placed to warn his countrymen that success in evidence-based reform is going to require some blood, sweat, toil, and tears.

In the U.S., I’m not sure if policymakers or educators are ready for what is about to happen. If most i3 validation and development projects fail to produce significant positive effects in rigorous, well-conducted evaluations, will opinion leaders celebrate the programs that do show good outcomes and value the knowledge gained from the whole process, including knowledge about what almost worked and what to avoid doing next time? Will they support additional funding for projects that take these learnings into account? Or will they declare the i3 program a failure and move on to the next set of untried policies and practices?

I very much hope that i3 or successor programs will stay the course, insisting on randomized experiments and building on what has been learned. Even if only 10% of validation and development projects report clear, positive achievement outcomes and capacity to go to scale, there will be many reasons to celebrate and stay on track:

1. There are currently 112 i3 validation and development projects (plus 5 scale-ups). If just 10% of these were found to be effective and scalable, that would be 11 new programs. Added to the scale-up programs and other programs already positively reviewed in the What Works Clearinghouse, these would make a substantial base of proven programs. In medicine, the great majority of treatments initially evaluated are found not to be effective, yet the medical system of innovation works because the few proven approaches make such a big difference. Failure is fine if it leads to success.

2. Among the programs that do not produce statistically significant positive outcomes on achievement measures, there are sure to be many that show promise but do not quite reach significance. For example, any program whose evaluation shows a student-level positive effect size of, say, +0.15 or more should be worthy of additional investment to refine its procedures and strengthen its evaluation to reach a higher standard, rather than being considered a bust. (The brief sketch after this list illustrates why an effect of that size can fall short of significance in a modest sample.)

3. The i3 process is producing a great deal of information about what works and what does not, what gets implemented and what does not, and the match between schools’ needs and programs’ approaches. These learnings should contribute to improvements in new programs, to revisions of existing programs, and to the policies applied by i3, IES, and other funders.

4. As the findings of the i3 and IES evaluations become known, program developers, grant reviewers, and government leaders should get smarter about what kinds of approaches are likely to work and to go to scale. Because of this, one might imagine that even if only 10% of validation and development programs succeed in RCTs today, higher and higher proportions will succeed in such studies in the future.
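To illustrate point 2 above, the minimal sketch below uses the power calculations in Python’s statsmodels package. It assumes a simple two-arm, student-level random assignment, and the sample sizes are hypothetical; real i3 evaluations that randomize whole schools or teachers would need even larger samples. The point is that an effect size of +0.15, though educationally meaningful, is easy to miss statistically.

```python
# Sketch of why a +0.15 effect size can be "promising but not
# significant." Assumes a two-arm, student-level comparison with
# hypothetical sample sizes.
from statsmodels.stats.power import TTestIndPower

power = TTestIndPower()

# Statistical power of a modest evaluation: 150 students per condition.
p_small = power.power(effect_size=0.15, nobs1=150, alpha=0.05)
print(f"power with 150 students per arm: {p_small:.2f}")  # roughly 0.25

# Students per arm needed to detect d = +0.15 with 80% power.
n_needed = power.solve_power(effect_size=0.15, power=0.80, alpha=0.05)
print(f"students per arm for 80% power: {n_needed:.0f}")  # roughly 699
```

In other words, a genuinely effective program evaluated with a few hundred students would more often than not fail to reach statistical significance, which is exactly why a near-miss result should trigger reinvestment and a stronger evaluation rather than abandonment.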

Evidence-based reform, in which promising scalable approaches are ultimately evaluated in RCTs or similarly rigorous evaluations, is the best way to create substantial and lasting improvements in student achievement. Failures of individual evaluations or projects are an expected, even valued, part of the process of evidence-based reform. We need to be prepared for them, and to celebrate the successes and the learnings along the way.

As Churchill also said, “Success is not final, failure is not fatal: it is the courage to continue that counts.”