Make No Small Plans

In recent years, an interest has developed in very low-cost interventions that produce small but statistically significant effects on achievement. The argument for their importance is that their costs are so low that their impacts are obtained very cost-effectively. For example, there is evidence that a brief self-affirmation exercise can produce a small but significant effect on achievement, and that a brief intervention to reduce “social identity threat” can do the same. A study in England found that a system that sent 50 text messages over the course of a school year, announcing upcoming tests and homework assignments, providing feedback on grades, test results, and attendance, and giving updates on topics being studied in school, improved math achievement slightly but significantly, at a cost of about $5 per student per year.

There is nothing wrong with these mini-interventions, and perhaps all schools should use them. Why not? Yet I find myself a bit disturbed by this type of approach.

Step back from the small-cost/small-but-significant outcome and consider the larger picture, the task in which all who read this blog are jointly engaged. We face an educational system that is deeply dysfunctional. Disadvantaged students remain far, far behind middle-class students in educational outcomes, and the gap has not narrowed very much over decades. The U.S. remains well behind peer nations in achievement and is not catching up. Dropout rates in the U.S. are diminishing, but skill levels of American high school graduates from disadvantaged schools are appalling.

For schools with limited budgets to spend on reform, it may be all they can do to adopt a low-cost intervention with small but significant outcomes, on the basis that it’s better than nothing. But again, step back to look at the larger situation. The average American student is educated at a cost of more than $11,000 per year. There are whole-school reform approaches, such as our own Success for All in elementary and middle schools and BARR in secondary schools, that cost around $100 per student per year and have been found to make substantial differences in student achievement. Contrast this with a low-cost program that costs, say, $5 per student per year.

$100 is less than 1% of the ordinary cost of educating a student, on average. $5 is less than 0.05%, of course. But in the larger scheme of things, who cares? Using a proven whole-school reform model might increase the per-student cost from $11,000 to $11,100. Adding the $5 low-cost intervention would increase per-student costs from $11,000 to $11,005. From the perspective of a principal who has a fixed budget and simply does not have $100 per student to spend, the whole-school approach may be infeasible. But from the system perspective, the difference between $11,000 and $11,100 (or $11,005) is meaningless if the program truly increases student achievement. Our goal must be to make meaningful progress in reducing gaps and increasing national achievement, not to make a small difference that happens to be very inexpensive.
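For concreteness, here is the arithmetic behind those percentages, as a minimal sketch in Python (the $11,000, $100, and $5 figures are simply the approximate costs cited above):

```python
# Cost of each intervention as a share of average per-pupil spending.
base = 11_000  # approximate average annual cost per U.S. student
options = {"whole-school reform": 100, "mini-intervention": 5}

for name, cost in options.items():
    print(f"{name}: ${cost} is {cost / base:.2%} of ${base:,}")

# Output:
# whole-school reform: $100 is 0.91% of $11,000
# mini-intervention: $5 is 0.05% of $11,000
```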

I once saw a film in England on the vital role of carrier pigeons in the British army in World War II. I’m sure those pigeons played their part in the victory, and they were very cost-effective. But ultimately, it was expensive tanks and planes and ships and other weapons, and courageous men and women, that won the war, not pigeons; piling up small (even if effective) interventions was just not going to do it.

We should be in a war against inequality, disadvantage, and mediocre outcomes in education. Winning it will require identification and deployment of whole-school, whole-district, and whole-state approaches that can be reliably replicated and intelligently applied to ensure positive, widespread improvements. If we just throw pigeon-sized solutions at huge and tenacious problems, our difficulties are sure to come home to roost.

How Much Difference Does an Education Program Make?

When you use Consumer Reports car repair ratings to choose a reliable car, you are doing something a lot like what evidence-based reform in education is proposing. You look at the evidence and take it into account, but it does not drive you to a particular choice. There are other factors you’d also consider. For example, Consumer Reports might point you to reliable cars you can’t afford, or ones that are too large or too small or too ugly for your purposes and tastes, or ones with dealerships that are too far away. In the same way, there are many factors that school staffs or educational leaders might consider beyond effect size.

An effect size, like statistical significance, is only a starting point for estimating the impact a program or set of programs might have. I’d propose the term “potential impact” to subsume the following factors that a principal or staff might consider, beyond effect size or statistical significance, in adopting a program to improve education outcomes:

  • Cost-effectiveness
  • Evidence from similar schools
  • Immediate and long-term payoffs
  • Sustainability
  • Breadth of impact
  • Low-hanging fruit
  • Comprehensiveness

Cost-Effectiveness
Economists’ favorite criterion for judging programs is cost-effectiveness. Cost-effectiveness is simple in concept (how much gain did the program cause, and at what cost?), but in fact there are two big elements of cost-effectiveness that are very difficult to determine:

1. Cost
2. Effectiveness

Cost should be easy, right? A school buys some service or technology and pays something for it. Well, it’s almost never so clear. When a school uses a given innovation, there are usually costs beyond the purchase price. For example, imagine that a school purchases digital devices for all students, loaded with all the software they will need. Easy, right? Wrong. Should you count the cost of the time teachers spend in professional development? The cost of tech support? Insurance? Security costs? The additional electricity required? Space for storage? Additional loaner units to replace lost or broken ones? The opportunity cost of whatever else the school might have chosen to do?

Here is an even more difficult example. Imagine a school starts a tutoring program for struggling readers, using paraprofessionals as tutors. Easy, right? Wrong. There is the cost of the paraprofessionals’ time, of course, but what if the paraprofessionals were already on the school’s staff? If so, then the tutoring program may be very inexpensive, but if additional people must be hired as tutors, then tutoring is a far more expensive proposition. Also, if paraprofessionals already in the school are no longer doing what they used to do, might this diminish student outcomes? Then there is the problem of outcomes. As I explained in a recent blog, the meaning of effect sizes depends on the nature of the studies that produced them, so comparing apples to apples may be difficult. A principal might look at effect sizes for two programs and decide they look very similar. Yet one effect size might come from large-scale randomized experiments, which tend to produce smaller (and more meaningful) effect sizes, while the other might come from less rigorous studies.

Nevertheless, issues of cost and effectiveness do need to be considered. Somehow.
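To make the trade-off concrete, here is a minimal sketch of the kind of back-of-the-envelope comparison a school might run. The program names, costs, and effect sizes are hypothetical, and, per the caveats above, a real comparison would need to account for where each effect size came from:

```python
# Hypothetical cost-effectiveness comparison.
# "effect" is an effect size in standard deviations;
# "cost" is dollars per student per year.
programs = {
    "tutoring with newly hired paraprofessionals": {"effect": 0.30, "cost": 1500},
    "tutoring with existing staff":                {"effect": 0.30, "cost": 300},
    "texting program":                             {"effect": 0.06, "cost": 5},
}

for name, p in programs.items():
    # One common cost-effectiveness ratio: effect size gained
    # per $100 spent per student per year.
    ratio = p["effect"] / (p["cost"] / 100)
    print(f"{name}: {ratio:.3f} SD per $100 per student")
```

By this measure the cheapest option can look best even when its absolute impact is small, which is exactly why cost-effectiveness alone should not drive the decision.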

Evidence from Similar Schools
Clearly, a school staff would want to know that a given program has been successful in schools like theirs. For example, schools serving many English learners, or schools in rural areas, or schools in inner-city locations, might be particularly interested in data from similar schools. At a minimum, they should want to know that the developers have worked in schools like theirs, even if the evidence only exists from less similar schools.

Immediate and Long-Term Payoffs
Another factor in program impacts is the likelihood that a program will solve a very serious problem that may ultimately have a big effect on individual students and perhaps save a lot of money over time. For example, a very expensive parent training program may make a big difference for students with serious behavior problems. If this program produces lasting effects (documented in the research), its high cost might be justified, especially if it might reduce the need for even more expensive interventions, such as special education placement, expulsion, or incarceration.

Sustainability
Programs that either produce lasting impacts or can be readily maintained over time are clearly preferable to those that have short-term impacts only. In education, long-term impacts are not typically measured, but sustainability can be judged from the cost, effort, and other elements required to maintain an intervention. Most programs get a lot cheaper after the first year, so sustainability can usually be assumed. This means that even programs with modest effect sizes could bring about major changes over time.

Breadth of Impact
Some educational interventions with modest effect sizes might be justified because they apply across entire schools and for many years. For example, effective coaching for principals might have a small effect overall, but if that effect is seen across thousands of students over a period of years, it might be more than worthwhile. Similarly, training teachers in methods that become part of their permanent repertoire, such as cooperative learning, teaching metacognitive skills, or classroom management, might affect hundreds of students per teacher over time.

Low-Hanging Fruit
Some interventions may have either modest impacts on students in general, or strong outcomes for only a subset of students, but be so inexpensive and easy to adopt that it would be foolish not to use them. One example might be making sure that disadvantaged students who need eyeglasses are assessed and given glasses. Not everyone needs glasses, but for those who do, this makes a big difference at low cost. Another example might be implementing a whole-school behavior management approach like Positive Behavioral Interventions and Supports (PBIS), a low-cost, proven approach any school can implement.

Comprehensiveness
Schools have to solve many quite different problems, and they usually do this by pulling various solutions off various shelves. The problem is that this approach can be uncoordinated and inefficient. The different elements may not link up well with each other, may compete for the time and attention of the staff, and may cost a lot more than a unified, comprehensive solution that addresses many objectives in a planful way. A comprehensive approach is likely to have a coherent plan for professional development, materials, software, and assessment across all program elements. It is likely to have a plan for sustaining its effects over time and extending them into additional parts of the school or additional schools.

Potential Impact
Potential impact is the sum of all the factors that make a given program or a coordinated set of programs effective in the short and long term, broad in its impact, focused on preventing serious problems, and cost-effective. There is no numerical standard for potential impact, but the concept is just intended to give educators making important choices for their kids a set of things to consider, beyond effect size and statistical significance alone.

Sorry. I wish this were simple. But kids are complex, organizations are complex, and systems are complex. It’s always a good idea for education leaders to start with the evidence but then think through how programs can be used as tools to transform their particular schools.

OMB to Government: Show Us the Evidence

The words “OMB” and “exciting” rarely go in the same sentence, much less “OMB” and “OMG!” Yet on May 18, Jeffrey Zients, Acting Director of the Office of Management and Budget (OMB), sent out a memo that could change history. In guidance to executive departments and agencies, the memo asks the entire Executive Branch to use every available means to promote the use of rigorous evidence in decision-making, program administration, and planning. Some of the specific strategies urged by OMB were as follows:

• Low-cost evaluations, using routinely collected data. For example, when grants are made to schools to use particular programs, districts could be asked to submit schools in pairs, knowing that one in each pair will be assigned at random to use the program and one to wait. Then routinely collected test scores could be used in the evaluations, to compare experimental and control groups. Such studies could be done for peanuts, greatly expanding the evidence base for all sorts of programs. (A minimal sketch of this pairing design appears after this list.)

• Evaluations linked to waivers. Existing rules often inhibit experimentation with practices or policies that might be used in the future. Agencies can waive those rules specifically for the purpose of testing innovations.

• Expanding evaluation efforts within existing programs. Imagine, for example, encouraging systematic variations in uses of Title I funding to determine better ways to help Title I children succeed.

• Systemic measurement of costs and cost per outcome. If there are more cost-effective ways to achieve better outcomes, we should be finding them, and then allocating resources accordingly.

• Infusing evidence into grant-making. Agencies can increase the use of evidence-based practices in all sorts of grants. In competitive grants, applicants could be offered a few competitive preference points if they propose to implement programs with strong evidence of effectiveness. Investing in Innovation (i3), of course, provides different levels of grants depending on the existing evidence base for promising innovations.
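
As promised above, here is a minimal sketch of the paired random-assignment design from the first bullet. The school names and the coin flip per pair are illustrative assumptions; nothing here is specified in the OMB memo itself:

```python
import random

# Hypothetical pairs of similar schools submitted by a district.
# One school in each pair is randomly assigned to use the program now;
# the other waits and serves as the control.
pairs = [("School A", "School B"),
         ("School C", "School D"),
         ("School E", "School F")]

random.seed(42)  # fixed seed so the assignment is reproducible and auditable
program_group, control_group = [], []
for first, second in pairs:
    treated, waiting = (first, second) if random.random() < 0.5 else (second, first)
    program_group.append(treated)
    control_group.append(waiting)

print("Use program now:", program_group)
print("Wait (control): ", control_group)
```

Routinely collected test scores for the two groups could then be compared at the end of the year, with no new data collection required.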

There is much more in this far-reaching memo, but these are the elements most relevant to education.

I have no idea how the memo will play out in practice, but at a minimum it provides clear and detailed guidance to all federal agencies: show us the evidence. More importantly, show the American people your evidence. It says that government is not about who gets what; it is about conscious and informed stewardship of public funds to produce valued outcomes.

For the latest on evidence-based education, follow me on Twitter: @RobertSlavin

School Turnaround the Wright Way

In 1903, Wilbur and Orville Wright changed the world in the most American of ways, by tinkering in their bicycle shop and then testing their flying machine in the dunes of Kitty Hawk. The basic design principles they followed were the same as those being followed by optimistic airplane designers all over the world. Others used similar airframes, engines, and controls. The Wright brothers did make numerous innovations, but to an observer, there was little that differentiated their model from many others, with one exception: their airplane actually flew.

Now flash forward 109 years, and consider school reform. In turning around persistently low-achieving schools, report after report tells us that we need to emphasize strong leadership, high expectations, extensive professional development, effective use of time, and data-based management. All of these are emphasized in School Improvement Grants (SIG), for example, and all are certainly sensible. But is emphasizing such a list of “design principles” enough to turn around failing schools?

In a recent post I wrote about the importance of developing and disseminating well-structured, well-integrated programs that have been rigorously evaluated and found to be effective. Disseminating proven programs is very different from disseminating lists of variables associated with effective schools. For one thing, proven programs are known to work across a variety of circumstances, and are not limited to a particular set of circumstances unlikely to exist elsewhere. The Wright biplane would have been just another curiosity if it had not turned out to work anywhere there was an airfield. Second, proven programs depend on many more, and more specific, innovations than those captured by the lists. Third, proven programs are provided by organizations that build expertise in supporting their effective use, and are essentially held accountable for the success of their approach. If the Wright brothers had not been able to improve upon and scale up their model, their inventiveness would not have mattered.

Anyone who has tried to turn around a failing school armed with a list of variables and general good advice will know that the chances of takeoff are uncertain. No program guarantees success, but replicating and adapting proven programs offers the best chance of making a difference. It’s better to do it the Wright way.

Image: John T. Daniels, 1903, available via public domain

Put International Lessons to the Test in U.S. Schools

In a November 10 Sputnik post I wrote some cautionary thoughts about what we can and cannot learn from international comparisons to improve educational policies. My old friend Marc Tucker, in a December 20 post on his blog Top Performers, took me to task, saying that by suggesting we try out ideas from abroad in our own schools before adopting them wholesale, I was “looking for my keys where the light was better” rather than where they might actually be.

In my blog I agreed completely with Marc that we can learn a lot from other countries. I work part-time in England and am very familiar with education there and elsewhere in Europe. In fact, we are already learning from abroad: the hot-off-the-press Quality Counts report from Education Week found that “Education officials in 29 states reported that their agency uses international education comparisons to inform their reform strategies or identify ‘best practices.'” Where I take issue with Marc is in his apparent belief that if we study what successful nations do, we can just plunk their policies down in our context and all will be well. Marc seems to think that international comparisons have proven that our main efforts need to be directed toward improving teacher quality. He might very well be right. I’d love to see teacher salaries doubled, teacher education dramatically improved, induction enhanced, and so on. Perhaps these policies would solve our problems by making teaching a more attractive profession, bringing higher-quality students into teaching, and providing excellent professional development and support to help existing and new teachers be effective and want to stay in the profession. Frankly, however, there isn’t a U.S. educator or policy maker who didn’t already know that these would be great ideas long before we ever heard of Finland.

But how do we cause all of these things to happen in our society, with our kids? Which of these policies are not only effective, but most cost-effective? Is it too much to ask that whatever ideas we glean from observing Finland or Singapore or Japan be tested in Minnesota or Massachusetts or Mississippi, so we can learn how they work here? And in the meantime, might we also increase use of programs and practices that have been proven to work in the U.S., and develop and evaluate more of them?

America’s strength in every field, from medicine to agriculture to satellites, lies in its extraordinary capacity in research and development. This is true in education as much as in other areas; the products of U.S. educational R & D are much sought after in other countries. While other countries can give us good ideas and benchmarks to evaluate our students’ performance, let’s also build on our strengths.

Breaking Down Red-Tape Barriers to College

As college application season comes to a close, parents and kids are embarking on a more daunting task: figuring out how to pay for college. Unfortunately, difficulties in navigating the financial aid process can result in many students forgoing college altogether. Could there be a better way to help kids get beyond this single but life-altering barrier?

Stanford researcher Eric Bettinger recently did a study in which H&R Block used data from people’s tax forms to fill out a FAFSA (Free Application for Federal Student Aid) for their high school seniors. The cost of doing this was trivial, yet the benefits were huge. Children of parents randomly assigned to have their FAFSA done by H&R Block were significantly more likely to go to college than kids randomly assigned to a control group. To my knowledge, there is no more effective way of increasing the college attendance of kids who might or might not go, and this one costs almost nothing.

It so happens that my son was going into a Master’s program in Florida and had to fill out a FAFSA. Knowing about the H&R Block study, I suggested he take it to the H&R Block office near him that had just done his tax forms. Needless to say, they didn’t provide the service. I’ve since learned that even though H&R Block paid for Bettinger’s study (with the help of grants from the Bill & Melinda Gates Foundation, the National Science Foundation, and other sources), its offices rarely offer the FAFSA service.

I tell this story because I think it speaks volumes about inequities and idiocies in American education. First, it exposes one of the many enormous benefits kids get if they just have the good sense to be born to middle-class, literate parents (who can help them fill out a FAFSA). Second, why is it that school districts or colleges themselves cannot provide the service H&R Block was experimenting with (but later decided not to offer)? Third, if it does take H&R Block or other tax preparers to do a FAFSA, then why can’t every low income parent of a high school kid hoping to go to college get a voucher to have their local tax preparer help them fill out a FAFSA form?

This is not my field, so perhaps all of these things are being done, BUT I STRONGLY DOUBT IT. Instead, my rather confident guess is that the system is happily cranking along, effectively barring deserving, capable, and promising young people from a brighter future because it’s no one’s job to solve this little FAFSA problem. We spend billions, by the way, in financial aid and elaborate programs to help able, disadvantaged kids go to college. It’s not that we’re unwilling to spend money. It’s just that we’re unwilling to follow the evidence until we find solutions to the core problems of our society.

Gold-Standard Program Evaluations, on a Shoestring Budget

Note: This is a guest post by Jon Baron, President of the Coalition for Evidence-Based Policy, and Chairman of the National Board for Education Sciences

In today’s tough economic climate, quality evaluations of education reforms – to determine which are truly effective in improving student achievement, graduation rates, and other key outcomes – are especially important. They enable us to focus our limited resources on strategies that have been proven to work.

Well-conducted randomized controlled trials are generally recognized as the most reliable method (the “gold standard”) for evaluating a program’s effectiveness. However, widespread misconceptions about what such studies involve – including their cost – have often limited their use by education officials.

In plain language: Randomized controlled trials in education are studies that randomly assign a sample of students, teachers, or schools to a group that participates in the program (“the program group”) or to a group that does not (“the control group”). With a sufficiently large sample, this process helps ensure that the two groups are equivalent, so that any difference in their outcomes over time – such as student achievement – can be attributed to the program, and not to other factors.

Such studies are often perceived as being too costly and administratively burdensome to be practical in most educational settings. In fact, however, it is often possible to conduct such a study at low cost and burden if the study can measure outcomes using state test scores or other administrative data that are already collected for other purposes. Costs are reduced by eliminating what is typically the study’s most labor-intensive and costly component: locating the individual sample members at various points in time after program completion, and administering tests or interviews to obtain their outcome data. In some cases, the only remaining cost is the researcher’s time to analyze the data.
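
As a minimal sketch of why the remaining analysis can be so cheap, here is what it might look like once the outcome data are already sitting in an administrative file (the file name and column layout are hypothetical stand-ins for a state data extract):

```python
# Minimal sketch: estimating a program's effect from test scores that the
# state already collects. The file name and columns ("group", "score")
# are hypothetical.
import csv
from statistics import mean

with open("state_test_scores.csv") as f:
    rows = list(csv.DictReader(f))  # one row per student

program = [float(r["score"]) for r in rows if r["group"] == "program"]
control = [float(r["score"]) for r in rows if r["group"] == "control"]

# With random assignment, the simple difference in mean scores is an
# unbiased estimate of the program's effect (a significance test would
# be added in practice).
print("Estimated effect:", mean(program) - mean(control))
```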

For example, the following are two recent randomized trials that were conducted at low cost, yet produced findings of policy and practical importance:

Roland Fryer, recent winner of the MacArthur “Genius” Award, conducted an evaluation of New York City’s $75 million Teacher Incentive Program, in which 396 of the city’s lowest-performing public schools were randomly assigned either to an incentive group, which could receive an annual bonus of up to $3,000 per teacher if the school improved student achievement and other key outcomes, or to a control group. Three years after random assignment, the study found that the incentives had no effect on student achievement, attendance, graduation rates, behavior, GPA, or other outcomes. Based in part on these results, the city recently ended the program, freeing up resources for other efforts to improve student outcomes.

The study’s cost: Approximately $50,000. The low cost was possible because the study measured all outcomes using state test scores and other administrative records already collected for other purposes.

Eric Bettinger and Rachel Baker conducted an evaluation of InsideTrack college coaching – a widely implemented mentoring program for college students designed to prevent them from dropping out of school. This was a well-conducted trial, which randomized more than 13,000 students at eight colleges. The study found that the program produced a 14 percent increase in college persistence for at least two years, and a 13 percent increase in the likelihood of graduating from college.

The study’s cost: Less than $20,000. The low cost was possible because the study measured its key outcomes using administrative data that the colleges already collected for other purposes – i.e., their enrollment and graduation records – rather than by collecting new data through individual surveys.

In recent years, federal and state policy, as well as improvements in information technology, have greatly increased the availability of high-quality administrative data on student achievement and other key educational outcomes. Thus, it has become more feasible than ever before to conduct gold-standard randomized evaluations on a shoestring budget. Equipped with reliable evidence, education officials can have much greater confidence that their spending decisions will produce important improvements in student outcomes.

-Jon Baron

The Coalition for Evidence-Based Policy is a nonprofit, nonpartisan organization whose mission is to increase government effectiveness through the use of rigorous evidence about “what works.”