How Tutoring Works (Cooking With The Grandkids)

My wife, Nancy, and I have three grandkids: Adaya (4 ½), Leo (3 ½), and Ava (8 months). They all live in Baltimore, so we see quite a lot of them, which is wonderful.

As with most grandparents and grandkids, one of our favorite activities with Adaya and Leo is cooking. We have two folding stepladders in the kitchen, which the kids work from. They help make pancakes, scrambled eggs, spaghetti, and other family classics. We start off giving the kids easy and safe tasks, like measuring and pouring ingredients into bowls and mixing, and as they become proficient, we let them pour ingredients into hot pans, scramble eggs on the stove, and so on. They love every bit of this, and are so proud of their accomplishments.

So here is my question. What are we making when we cook with the grandkids? If you say pancakes and eggs, that’s not wrong, but perhaps these are the least important things we are doing.

What we are really doing is building the thrill of mastery in a loving and supportive context. All children are born into a confusing world. They want to understand their world and to learn to operate effectively in it. They want to do what the big people do. They also want to be loved and valued.

Now consider children who need tutoring because they are behind in reading. These kids are in very big trouble, and they know it. All of them understand what the purpose of school is. It is to learn to read. Yet they know they are not succeeding.

The solution, I believe, is a lot like cooking with people who love you. In other words, it’s tutoring, in small groups or one-to-one.

The effectiveness of tutoring is very well established in rigorous research, as I’ve noted more than once in this series of blogs. No surprise there. But what is surprising is that well-trained, caring tutors without teaching certificates, using well-structured materials, get outcomes just as good as those obtained by certified teachers. How can this be? If tutoring works primarily because it enables teachers to adapt instruction to meet the learning needs of individual students, then you’d expect that students who receive tutoring from certified, experienced teachers would get much better outcomes than those tutored by teaching assistants. But they don’t, on average. Further, a U.K. study of one-to-one tutoring over the internet found an effect size of zero. These and other unexpected findings support a conclusion that while the ability to individualize instruction is important in tutoring, it is not enough.

The additional factor that explains much of the powerful impact of tutoring, I believe, is love. Most tutors, with or without teaching certificates, love the children they tutor in a way that a teacher with 25 or 30 students usually cannot. A tutor with one or just a few children at a time is certain to get to know those children, and to care about them deeply. From the perspective of struggling children, their tutor is not just a teacher. She or he is a lifeline, a new chance to achieve the mastery they crave: someone who knows and cares about them and will stick with them until they can read.

This is why individual or small-group tutoring is a bit like cooking with your grandparents. In both settings, children receive the two things they need and value the most: love and mastery.

My point here is not sentimental or idealistic. It is deadly practical. We already know a lot about how to use tutoring effectively and cost-effectively. Yet there is a great deal more we need to learn to maximize the benefits and minimize the costs of effective tutoring. We need to find out how to extend positive effects to larger numbers of students, how to maintain and build on initial successes in the early grades, how to successfully tutor upper-elementary and secondary students, and how to reach students who still do not succeed despite small-group tutoring. We need to experiment with adaptations of tutoring for English learners.

We know that tutoring is powerful, but we need to make it more cost-effective without reducing its impact, so that many more children can experience the thrill of mastery. To do that, we have a lot of work to do. Let’s get cooking!

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Cost-Effectiveness of Small Solutions

Imagine that you were shopping for a reliable new car, one that is proven to last an average of at least 100,000 miles with routine maintenance and repairs. You are looking at a number of options that fit your needs for around $24,000.

You happen to be talking to your neighbor, an economist, about your plans. “$24,000?” she says. “That’s crazy. You can get a motorcycle that would go at least 100,000 miles for only $12,000, and save a lot on gas as well!”

You point out to your neighbor that motorcycles might be nice for some purposes, but you need a car to go to the grocery store, transport the kids, and commute to work, even in rain or snow. “Sure,” says your neighbor, “but you posed a question of cost-effectiveness, and on that basis a motorcycle is the right choice. Or maybe a bicycle.”

In education, school leaders and policy makers are often faced with choices like this. They want to improve their students’ achievement, and they have limited resources. But the available solutions vary in cost, effectiveness, and many other factors.

To help leaders make good choices, economists have devised measures of cost-effectiveness, which means (when educational achievement is the goal) the amount of achievement gain you might expect from purchasing a given product or service, divided by all costs of making that choice. Cost-effectiveness can be very useful in educational policy and practice, helping decision makers weigh the potential benefits of each of a set of choices available to them. The widespread availability of effect sizes indicating the outcomes of various programs and practices, easily located in sources such as the What Works Clearinghouse and Evidence for ESSA, makes it a lot easier to compare the outcomes and costs of available programs. For example, a district might seek to improve high school math performance by adopting software and professional development for a proven technology program, or by adopting a proven professional development approach. All costs need to be considered as well as all benefits, and the school leaders might make the choice that produces the largest gains at the most affordable cost. Cost-effectiveness might not entirely determine which choice is made, but, one might argue, it should always be a key part of the decision-making process. Quantitative researchers in education and economics would agree. So far, so good.
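The ratio economists use here is easy to state precisely. Below is a minimal Python sketch of the computation; the two programs and all the numbers are hypothetical, invented purely to illustrate the arithmetic, not drawn from any real evaluation.

```python
# Hypothetical programs: (effect size in SD units, all-in cost per student per year).
# These figures are illustrative only.
programs = {
    "technology program + PD": (0.20, 400.0),
    "professional development": (0.15, 250.0),
}

def cost_effectiveness(effect_size, cost_per_student):
    """Achievement gain (in SD units) purchased per dollar spent per student."""
    return effect_size / cost_per_student

for name, (es, cost) in programs.items():
    ratio = cost_effectiveness(es, cost)
    print(f"{name}: {es:+.2f} SD at ${cost:,.0f}/student -> {ratio * 1000:.2f} SD per $1,000")
```

As the rest of the post argues, this ratio is a starting point, not a verdict; it says nothing about whether the gain purchased is large enough to matter.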

But here is where things get a little dodgy. In recent years, a lot of interest has arisen in super-cheap interventions that have super-small impacts, but the ratio between the benefits and the costs makes the super-cheap interventions look cost-effective. Such interventions are sometimes called “nudge strategies,” meaning that simple reminders or minimal actions activate a set of psychological processes that can lead to important impacts. A very popular example right now is Carol Dweck’s Growth Mindset strategy, in which students are asked to write a brief essay stating a belief that intelligence is not a fixed attribute of people, but that learning comes from effort. Her work has found small impacts of this essentially cost-free treatment in several studies, although others have failed to find this effect.

Other examples include sending messages to students or parents on cell phones, or sending postcards to parents on the importance of regular attendance. These strategies can cost next to nothing, yet large-scale experiments often show positive effects in the range of +0.03 to +0.05, averaging across multiple studies.

Approaches of this kind, including Growth Mindset, are notoriously difficult to replicate by others. However, assume for the sake of argument that at least some of them do have reliably positive effects that are very small, but because of their extremely small cost, they appear very cost-effective. Should schools use them?

One might take the view that interventions like Growth Mindset are so inexpensive and so sensible that, what the heck, go ahead. However, other such interventions do take some time and effort on the part of staff.

Schools are charged with a very important responsibility, ensuring the academic success, psychological adjustment, and pro-social character of young people. Their financial resources are always limited, but even more limited is their schoolwide capacity to focus on a small number of essential goals and stick with those goals until they are achieved. The problem is that spending a lot of time on small solutions with small impacts may exhaust a school’s capacity to focus on what truly matters. If a school could achieve an effect size of +0.30 on important achievement measures with one comprehensive program, or (for half the price) could adopt ten small interventions with effect sizes averaging +0.03, which should it do? Any thoughtful educator would say, “Invest in the one program with the big effect.” The little programs are not likely to add up to a big effect, and any collection of unrelated, uncoordinated mini-reforms is likely to deplete the staff’s energy and enthusiasm over a period of time.
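The trap in the paragraph above can be made concrete. Using the hypothetical figures from the text (one program at ES +0.30 for $100 per student, versus ten mini-interventions at ES +0.03 for $5 each), the naive per-dollar ratio actually favors the mini-interventions, which is exactly why the ratio alone misleads:

```python
# Illustrative comparison using the hypothetical figures from the paragraph above,
# not real evaluation results.
big_es, big_cost = 0.30, 100.0             # one comprehensive program
small_es, small_cost, n_small = 0.03, 5.0, 10  # ten mini-interventions

naive_big = big_es / big_cost              # 0.003 SD per dollar
naive_small = small_es / small_cost        # 0.006 SD per dollar: "looks" twice as good

total_small_cost = small_cost * n_small    # $50, half the price of the big program

# Even granting the optimistic assumption that ten unrelated mini-reforms
# add up linearly (the text argues they generally do not), the school buys
# at best the same +0.30, and pays for it in staff focus and energy.
combined_small_es = small_es * n_small
print(naive_big, naive_small, total_small_cost, combined_small_es)
```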

This is where the car-motorcycle analogy comes in. A motorcycle may appear more cost-effective than a car, but it just does not do what a car does. Motorcycles are fine for touring in nice weather, but for most people they do not solve essential problems. In school reform, large programs with large effects may be composed of smaller effective components, but because these components are an integrated part of a well-thought-out plan, they add up to something more likely to work and to keep working over time.

Cost-effectiveness is a useful concept for schools seeking to make big differences in achievement, using serious resources. For small interventions with small impacts, don’t bother to calculate cost-effectiveness, or if you do, don’t compare the results to those of big interventions with big impacts. To do so is like bragging about the gas mileage you get on your motorcycle driving Aunt Sally and the triplets to the grocery store. It just doesn’t make sense.

Achieving Breakthroughs in Education By Transforming Effective But Expensive Approaches to be Affordable at Scale

It’s summer in Baltimore. The temperatures are beastly, the humidity worse. I grew up in Washington, DC, which has the same weather. We had no air conditioning, so summers could be torture. No one could sleep, so we all walked around like zombies, yearning for fall.

Today, however, summers in Baltimore are completely bearable. The reason, of course, is air conditioning. Air conditioning existed when I was a kid, but hardly anyone could afford it.  I think the technology has gradually improved, but there was no scientific or technical breakthrough, as far as I know.  Yet somehow, all but the poorest families can afford air conditioning, so summer in Baltimore can be survived. Families that cannot afford air conditioning need assistance, especially for health reasons, but this number is small.

The story of air conditioning resembles that of much other technology. What happens is that a solution is devised for a very important problem. The solution is too expensive for ordinary people to use, so initially, it is used in circumstances that justify the cost. For example, early automobiles were far too expensive for the general public, but they were used for important applications in which the benefits were particularly obvious, such as delivery trucks and cars for doctors and veterinarians. Also, wealthy individuals and race car drivers could afford the early autos. These applications provided experience with the manufacture, use, and repair of automobiles and encouraged investments in infrastructure, paving the way (so to speak) for mass production of cars (such as the Model T) that could be afforded by a much larger portion of the population and economy. Modest improvements are constantly being made, but the focus is on making the technology less expensive, so it can be more widely used. In medicine, penicillin was discovered in the 1920s, but not until the advent of World War II was it made inexpensive enough for practical use. It saved millions of lives not because it had been discovered, but because the Merck Company was commissioned to find a way to make it practicable (the solution involved growing penicillin on rotting squash).

Innovations in education can work in a similar way.  One obvious example is instructional technology, which existed before the 1970s but is only now becoming universally available, mostly because it is falling in price.  However, what education has rarely done is to create expensive but hugely effective interventions and then figure out how to do them cheaply, without reducing their impact.

Until now.

If you are a regular reader of my blog, you can guess where I am going: tutoring. As everyone knows, one-to-one tutoring by certified teachers is extremely effective. No surprise there. As regular readers will also know, rigorous research over the past 20 years has established that tutoring by well-trained, well-supervised teaching assistants using proven methods routinely produces outcomes just as good as tutoring by certified teachers, at half the cost. Further, one-to-small-group tutoring, up to one-to-four, can be almost as effective as one-to-one tutoring in reading, and equally effective in mathematics (see www.bestevidence.org).

One-to-four tutoring by teaching assistants costs about one-eighth as much as one-to-one tutoring by teachers. The mean outcomes for both types of tutoring are about an effect size of +0.30, but several programs are able to produce effect sizes in excess of +0.50, the national mean difference on NAEP between disadvantaged and middle-class students. (As a point of comparison, effects of technology applications average +0.05 in reading for elementary struggling readers, and +0.07 in math for all elementary students. Urban charter schools average +0.04 in reading and +0.05 in math.)

Reducing the cost of tutoring should not be seen as a way for schools to save money.  Instead, it should be seen as a way to provide the benefits of tutoring to much larger numbers of students.  Because of its cost, tutoring has been largely restricted to the primary grades (especially first), to perhaps a semester of service, and to reading, but not math.  If tutoring is much less expensive but equally effective, then tutoring can be extended to older students and to math.  Students who need more than a semester of tutoring, or need “booster shots” to maintain their gains into later grades, should be able to receive the tutoring they need, for as long as they need it.

Tutoring has been how rich and powerful people educated their children since the beginning of time.  Ancient Romans, Greeks, and Egyptians had their children tutored if they could afford it.  The great Russian educational theorist, Lev Vygotsky, never saw the inside of a classroom as a child, because his parents could afford to have him tutored.  As a slave, Frederick Douglass received one-to-one tutoring (secretly and illegally) from his owner’s wife, right here in Baltimore.  When his master found out and forbade his wife to continue, Douglass sought further tutoring from immigrant boys on the docks where he worked, in exchange for his master’s wife’s fresh-cooked bread.  Helen Keller received tutoring from Anne Sullivan.  Tutoring has long been known to be effective.  The only question is, or should be, how do we maximize tutoring’s effectiveness while minimizing its cost, so that all students who need it can receive it?

If air conditioning had been like education, we might have celebrated its invention, but sadly concluded that it would never be affordable by ordinary people.  If penicillin had been like education, it would have remained a scientific curiosity until today, and millions would have died due to the lack of it.  If cars had been like education, only the rich would have them.

Air conditioning for all?  What a cool idea.  Cost-effective tutoring for all who need it?  Wouldn’t that be smart?

Tutoring Works. But Let’s Learn How It Can Work Better and Cheaper

I was once at a meeting of the British Education Research Association, where I had been invited to participate in a debate about evidence-based reform. We were having what journalists often call “a frank exchange of views” in a room packed to the rafters.

At one point in the proceedings, a woman stood up and, in a furious tone of voice, informed all and sundry that (I’m paraphrasing here) “we don’t need to talk about all this (very bad word). Every child should just get Reading Recovery.” She then stomped out.

I don’t know how widely her view was supported in the room or anywhere else in Britain or elsewhere, but what struck me at the time, and what strikes me even more today, is the degree to which Reading Recovery has long defined, and in many ways limited, discussions about tutoring. Personally, I have nothing against Reading Recovery, and I have always admired the commitment Reading Recovery advocates have had to professional development and to research. I’ve also long known that the evidence for Reading Recovery is very impressive, but you’d be amazed if one-to-one tutoring by well-trained teachers did not produce positive outcomes. On the other hand, Reading Recovery insists on one-to-one instruction by certified teachers, with a lot of cost for all that admirable professional development, so it is very expensive. A British study estimated the cost per child at $5,400 (in 2018 dollars). There are roughly one million Year 1 students in the U.K., so if the angry woman had her way, they’d have to come up with the equivalent of $5.4 billion a year. In the U.S., it would be more like $27 billion a year. I’m not one to shy away from very expensive proposals if they also provide extremely effective services and there are no equally effective alternatives. But shouldn’t we be exploring alternatives?
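The back-of-envelope arithmetic behind those figures is simple enough to check, using the numbers stated in the text ($5,400 per child, about one million Year 1 students in the U.K., and a U.S. cohort roughly five times as large):

```python
# Scaling the per-child Reading Recovery cost figure cited in the text.
cost_per_child = 5_400                         # 2018 dollars, per the British study
uk_year1_students = 1_000_000                  # rough U.K. Year 1 cohort
uk_total = cost_per_child * uk_year1_students  # $5.4 billion per year
us_total = uk_total * 5                        # ~$27 billion per year for the U.S.
print(f"U.K.: ${uk_total / 1e9:.1f}B per year  U.S.: ${us_total / 1e9:.0f}B per year")
```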

If you’ve been following my blogs on tutoring, you’ll be aware that, at least at the level of research, the Reading Recovery monopoly on tutoring has been broken in many ways. Reading Recovery has always insisted on certified teachers, but many studies have now shown that well-trained teaching assistants can do just as well, in mathematics as well as reading. Reading Recovery has insisted that tutoring should just be for first graders, but numerous studies have now shown positive outcomes of tutoring through seventh grade, in both reading and mathematics. Reading Recovery has argued that its cost was justified by the long-lasting impacts of first-grade tutoring, but their own research has not documented long-lasting outcomes. Reading Recovery is always one-to-one, of course, but now there are numerous one-to-small group programs, including a one-to-three adaptation of Reading Recovery itself, that produce very good effects. Reading Recovery has always just been for reading, but there are now more than a dozen studies showing positive effects of tutoring in math, too.

All of this newer evidence opens up new possibilities for tutoring that were unthinkable when Reading Recovery ruled the tutoring roost alone. If tutoring can be effective using teaching assistants and small groups, then it is becoming a practicable solution to a much broader range of learning problems. It also opens up a need for further research and development specific to the affordances and problems of tutoring. For example, tutoring can be done a lot less expensively than $5,400 per child, but it is still expensive. We created and evaluated a one-to-six, computer-assisted tutoring model that produced effect sizes of around +0.40 for $500 per child. Yet I just got a study from the Education Endowment Foundation (EEF) in England evaluating one-to-three math tutoring by college students and recent graduates. They only provided tutoring one hour per week for 12 weeks, to sixth graders. The effect size was much smaller (ES = +0.19), but the cost was only about $150 per child.

I am not advocating this particular solution, but isn’t it interesting? The EEF also evaluated another means of making tutoring inexpensive, using online tutors from India and Sri Lanka, and another, using cross-age peer tutors, both in math. Both failed miserably, but isn’t that interesting?

I can imagine a broad range of approaches to tutoring, designed to enhance outcomes, minimize costs, or both. Out of that research might come a diversity of approaches that might be used for different purposes. For example, students in deep trouble, headed for special education, surely need something different from what is needed by students with less serious problems. But what exactly is it that is needed in each situation?

In educational research, reliable positive effects of any intervention are rare enough that we’re usually happy to celebrate anything that works. We might say, “Great, tutoring works! But we knew that.”  However, if tutoring is to become a key part of every school’s strategies to prevent or remediate learning problems, then knowing that “tutoring works” is not enough. What kind of tutoring works for what purposes?  Can we use technology to make tutors more effective? How effective could tutoring be if it is given all year or for multiple years? Alternatively, how effective could we make small amounts of tutoring? What is the optimal group size for small group tutoring?

We’ll never satisfy the angry woman who stormed out of my long-ago symposium at BERA. But for those who can have an open mind about the possibilities, building on the most reliable intervention we have for struggling learners and creating and evaluating effective and cost-effective tutoring approaches seems like a worthwhile endeavor.

Make No Small Plans

In recent years, an interest has developed in very low-cost interventions that produce small but statistically significant effects on achievement. The argument for their importance is that their costs are so low that their impacts are obtained very cost-effectively. For example, there is evidence that a brief self-affirmation exercise can produce a small but significant effect on achievement, and that a brief intervention to reduce “social identity threat” can do the same. A study in England found that a system that sent parents 50 text messages over the course of a school year (announcing upcoming tests and homework assignments, reporting grades, test results, and attendance, and describing topics being studied in school) improved math achievement slightly but significantly, at a cost of about $5 a year.

There is nothing wrong with these mini-interventions, and perhaps all schools should use them. Why not? Yet I find myself a bit disturbed by this type of approach.

Step back from the small-cost/small-but-significant outcome and consider the larger picture, the task in which all who read this blog are jointly engaged. We face an educational system that is deeply dysfunctional. Disadvantaged students remain far, far behind middle-class students in educational outcomes, and the gap has not narrowed very much over decades. The U.S. remains well behind peer nations in achievement and is not catching up. Dropout rates in the U.S. are diminishing, but skill levels of American high school graduates from disadvantaged schools are appalling.

For schools with limited budgets to spend on reform, it may be all they can do to adopt a low-cost/low-but-significant outcome intervention on the basis that it’s better than nothing. But again, step back to look at the larger situation. The average American student is educated at a cost of more than $11,000 per year. There are whole-school reform approaches, such as our own Success for All in elementary and middle schools and BARR in secondary schools, that cost around $100 per student per year, and have been found to make substantial differences in student achievement. Contrast this to a low-cost program that costs, say, $5 per student per year.

$100 is less than 1% of the ordinary cost of educating a student, on average. $5 is less than .05%, of course. But in the larger scheme of things, who cares? Using a proven whole-school reform model might perhaps increase the per-student cost from $11,000 to $11,100. Adding the $5 low-cost intervention could increase per-student costs from $11,000 to $11,005. From the perspective of a principal who has a fixed budget, and simply does not have $100 per student to spend, the whole-school approach may be infeasible. But from the system perspective, the difference between $11,000 and $11,100 (or $11,005) is meaningless if it truly increases student achievement. Our goal must be to make meaningful progress in reducing gaps and increasing national achievement, not make a small difference that happens to be very inexpensive.
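The budget shares in this paragraph work out as follows, using the text's figures of $11,000 baseline per-pupil spending, $100 for a proven whole-school model, and $5 for a mini-intervention:

```python
# Budget-share arithmetic using the per-student figures from the paragraph above.
baseline = 11_000      # average annual cost of educating a student
whole_school = 100     # proven whole-school reform model, per student per year
mini = 5               # low-cost mini-intervention, per student per year

share_whole_school = whole_school / baseline   # under 1% of per-pupil spending
share_mini = mini / baseline                   # under 0.05%
print(f"Whole-school: {share_whole_school:.1%} of baseline; mini: {share_mini:.3%}")
print(f"Totals: ${baseline + whole_school:,} vs. ${baseline + mini:,}")
```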

I once saw a film in England on the vital role of carrier pigeons in the British army in World War II. I’m sure those pigeons played their part in the victory, and they were very cost-effective. But ultimately, it was expensive tanks and planes and ships and other weapons, and courageous men and women, who won the war, not pigeons, and piling up small (even if effective) interventions was just not going to do it.

We should be in a war against inequality, disadvantage, and mediocre outcomes in education. Winning it will require identification and deployment of whole-school, whole-district, and whole-state approaches that can be reliably replicated and intelligently applied to ensure positive, widespread improvements. If we just throw pigeon-sized solutions at huge and tenacious problems, our difficulties are sure to come home to roost.

How Much Difference Does an Education Program Make?

When you use Consumer Reports car repair ratings to choose a reliable car, you are doing something a lot like what evidence-based reform in education is proposing. You look at the evidence and take it into account, but it does not drive you to a particular choice. There are other factors you’d also consider. For example, Consumer Reports might point you to reliable cars you can’t afford, or ones that are too large or too small or too ugly for your purposes and tastes, or ones with dealerships that are too far away. In the same way, there are many factors that school staffs or educational leaders might consider beyond effect size.

An effect size, or statistical significance, is only a starting point for estimating the impact a program or set of programs might have. I’d propose the term “potential impact” to subsume the following factors that a principal or staff might consider beyond effect size or statistical significance in adopting a program to improve education outcomes:

  • Cost-effectiveness
  • Evidence from similar schools
  • Immediate and long-term payoffs
  • Sustainability
  • Breadth of impact
  • Low-hanging fruit
  • Comprehensiveness

Cost-Effectiveness
Economists’ favorite criterion of effectiveness is cost-effectiveness. Cost-effectiveness is simple in concept (how much gain did the program cause, at what cost?), but in fact there are two big elements of cost-effectiveness that are very difficult to determine:

1. Cost
2. Effectiveness

Cost should be easy, right? A school buys some service or technology and pays something for it. Well, it’s almost never so clear. When a school uses a given innovation, there are usually costs beyond the purchase price. For example, imagine that a school purchases digital devices for all students, loaded with all the software they will need. Easy, right? Wrong. Should you count in the cost of the time the teachers spend in professional development? The cost of tech support? Insurance? Security costs? The additional electricity required? Space for storage? Additional loaner units to replace lost or broken units? The opportunity costs for whatever else the school might have chosen to do?

Here is an even more difficult example. Imagine a school starts a tutoring program for struggling readers using paraprofessionals as tutors. Easy, right? Wrong. There is the cost for the paraprofessionals’ time, of course, but what if the paraprofessionals were already on the school’s staff? If so, then a tutoring program may be very inexpensive, but if additional people must be hired as tutors, then tutoring is a far more expensive proposition. Also, if paraprofessionals already in the school are no longer doing what they used to do, might this diminish student outcomes?

Then there is the problem with outcomes. As I explained in a recent blog, the meaning of effect sizes depends on the nature of the studies that produced them, so comparing apples to apples may be difficult. A principal might look at effect sizes for two programs and decide they look very similar. Yet one effect size might be from large-scale randomized experiments, which tend to produce smaller (and more meaningful) effect sizes, while the other might be from less rigorous studies.
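One way to keep these hidden costs honest is simply to enumerate them. The sketch below totals hypothetical line items for a device rollout like the one described earlier; every figure is invented purely for illustration.

```python
# Total-cost sketch for a hypothetical 500-student device rollout.
# All line-item amounts are invented for illustration, not real prices.
line_items = {
    "devices and software": 150_000,
    "teacher professional development": 20_000,
    "tech support and insurance": 15_000,
    "storage, security, electricity": 5_000,
    "loaner units for loss and breakage": 10_000,
}
students = 500

total = sum(line_items.values())
per_student = total / students
print(f"Total cost: ${total:,}  Per student: ${per_student:,.0f}")
```

Note how the non-purchase items add a third to the sticker price in this made-up example; the point is that "cost" must include everything the choice obligates the school to pay for.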

Nevertheless, issues of cost and effectiveness do need to be considered. Somehow.

Evidence from Similar Schools
Clearly, a school staff would want to know that a given program has been successful in schools like theirs. For example, schools serving many English learners, or schools in rural areas, or schools in inner-city locations, might be particularly interested in data from similar schools. At a minimum, they should want to know that the developers have worked in schools like theirs, even if the evidence only exists from less similar schools.

Immediate and Long-Term Payoffs
Another factor in program impacts is the likelihood that a program will solve a very serious problem that may ultimately have a big effect on individual students and perhaps save a lot of money over time. For example, a very expensive parent training program may make a big difference for students with serious behavior problems. If this program produces lasting effects (documented in the research), its high cost might be justified, especially if it might reduce the need for even more expensive interventions, such as special education placement, expulsion, or incarceration.

Sustainability
Programs that either produce lasting impacts or can be readily maintained over time are clearly preferable to those that have only short-term impacts. In education, long-term impacts are not typically measured, but sustainability can be determined by the cost, effort, and other elements required to maintain an intervention. Most programs get a lot cheaper after the first year, so sustainability can usually be assumed. This means that even programs with modest effect sizes could bring about major changes over time.

Breadth of Impact
Some educational interventions with modest effect sizes might be justified because they apply across entire schools and for many years. For example, effective coaching for principals might have a small effect overall, but if that effect is seen across thousands of students over a period of years, it might be more than worthwhile. Similarly, training teachers in methods that become part of their permanent repertoire, such as cooperative learning, teaching metacognitive skills, or classroom management, might affect hundreds of students per teacher over time.
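The breadth-of-impact argument is just arithmetic, and a small sketch (with hypothetical numbers, not data from any real study) shows how a modest effect reaching many students over many years can outweigh a larger effect reaching only a few:

```python
# Hypothetical illustration: a modest effect reaching many students can
# add up to more total benefit than a large effect reaching only a few.

def total_impact(effect_size, students_reached, years):
    """Crude total benefit: effect size weighted by student-years reached."""
    return effect_size * students_reached * years

# Principal coaching: small effect, but school-wide and long-lasting.
coaching = total_impact(effect_size=0.10, students_reached=500, years=5)

# Intensive pull-out program: big effect, but few students, one year.
pull_out = total_impact(effect_size=0.40, students_reached=30, years=1)

print(coaching)  # 250.0
print(pull_out)  # 12.0
```

The units here are deliberately crude ("effect-size-weighted student-years"), but the point survives any refinement: breadth and duration multiply an effect, so a small effect applied broadly can dominate.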

Low-Hanging Fruit
Some interventions may have either modest impacts on students in general, or strong outcomes for only a subset of students, but be so inexpensive or easy to adopt and implement that it would be foolish not to do so. One example might be making sure that disadvantaged students who need eyeglasses are assessed and given glasses. Not everyone needs glasses, but for those who do, this makes a big difference at low cost. Another example might be implementing a whole-school behavior management approach like Positive Behavioral Interventions and Supports (PBIS), a low-cost, proven approach any school can implement.

Comprehensiveness
Schools have to solve many quite different problems, and they usually do this by pulling various solutions off various shelves. The problem is that this approach can be uncoordinated and inefficient. The different elements may not link up well with each other, may compete for the time and attention of the staff, and may cost a lot more than a unified, comprehensive solution that addresses many objectives in a planful way. A comprehensive approach is likely to have a coherent plan for professional development, materials, software, and assessment across all program elements. It is likely to have a plan for sustaining its effects over time and for extending into additional parts of the school or additional schools.

Potential Impact
Potential impact is the sum of all the factors that make a given program or a coordinated set of programs effective in the short and long term, broad in its impact, focused on preventing serious problems, and cost-effective. There is no numerical standard for potential impact, but the concept is just intended to give educators making important choices for their kids a set of things to consider, beyond effect size and statistical significance alone.

Sorry. I wish this were simple. But kids are complex, organizations are complex, and systems are complex. It’s always a good idea for education leaders to start with the evidence but then think through how programs can be used as tools to transform their particular schools.

OMB to Government: Show Us the Evidence

The words “OMB” and “exciting” rarely go in the same sentence, much less “OMB” and “OMG!” Yet on May 18, Jeffrey Zients, Acting Director of the Office of Management and Budget (OMB), sent out a memo that could change history. In guidance to executive departments and agencies, the memo asks the entire Executive Branch to use every available means to promote the use of rigorous evidence in decision-making, program administration, and planning. Some of the specific strategies urged by OMB were as follows:

• Low-cost evaluations, using routinely collected data. For example, when grants are made to schools to use particular programs, districts could be asked to submit schools in pairs, knowing that one in each pair will be assigned at random to use the program and one to wait. Then routinely collected test scores could be used in the evaluations, to compare experimental and control groups. Such studies could be done for peanuts, greatly expanding the evidence base for all sorts of programs.

• Evaluations linked to waivers. Existing rules often inhibit experimentation with practices or policies that might be used in the future. Agencies can waive those rules specifically for the purpose of testing innovations.

• Expanding evaluation efforts within existing programs. Imagine, for example, encouraging systematic variations in uses of Title I funding to determine better ways to help Title I children succeed.

• Systemic measurement of costs and cost per outcome. If there are more cost-effective ways to achieve better outcomes, we should be finding them, and then allocating resources accordingly.

• Infusing evidence into grant-making. Agencies can increase the use of evidence-based practices in all sorts of grants. In competitive grants, applicants could be offered a few competitive preference points if they propose to implement programs with strong evidence of effectiveness. Investing in Innovation (i3), of course, provides different levels of grants depending on the existing evidence base for promising innovations.
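The paired-assignment idea in the first bullet can be sketched in a few lines of Python (the school names and the `assign_pairs` helper are hypothetical): districts submit schools in matched pairs, and a coin flip within each pair decides which school uses the program and which waits as a control.

```python
# Sketch of the pairing idea described above (hypothetical school names):
# districts submit schools in pairs, and a random draw within each pair
# decides which school gets the program and which serves as the control.

import random

def assign_pairs(pairs, seed=None):
    """For each (school_a, school_b) pair, randomly pick one school for
    the program group; its partner becomes the wait-list control."""
    rng = random.Random(seed)
    program, control = [], []
    for school_a, school_b in pairs:
        if rng.random() < 0.5:
            program.append(school_a)
            control.append(school_b)
        else:
            program.append(school_b)
            control.append(school_a)
    return program, control

pairs = [("Lincoln ES", "Douglass ES"), ("Carver ES", "Banneker ES")]
program_group, control_group = assign_pairs(pairs, seed=42)
print("Program:", program_group)
print("Control:", control_group)
```

Because each pair contributes exactly one school to each group, the two groups start out well matched, and routinely collected test scores are all that is needed to compare them later.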

There is much more in this far-reaching memo, but these are the elements most relevant to education.

I have no idea how the memo will play out in practice, but at a minimum it provides clear and detailed guidance to all federal agencies: show us the evidence. More importantly, show the American people your evidence. It says that government is not about who gets what, it is about conscious and informed stewardship of public funds to produce valued outcomes.

For the latest on evidence-based education, follow me on Twitter: @RobertSlavin

Put International Lessons to the Test in U.S. Schools

In a November 10 Sputnik I wrote some cautionary thoughts about what we can and cannot learn from international comparisons to improve educational policies. My old friend Marc Tucker, in his December 20 blog called Top Performers, took me to task, saying that by suggesting we try out ideas from abroad in our own schools before adopting them wholesale, I was “looking for my keys where the light was better” rather than where they might actually be.

In my blog I was agreeing completely with Marc that we can learn a lot from other countries. I work part-time in England and am very familiar with education there and elsewhere in Europe. There is indeed much we can learn from other countries. In fact, we already are: the hot-off-the-press Quality Counts report from Education Week found that “Education officials in 29 states reported that their agency uses international education comparisons to inform their reform strategies or identify ‘best practices.’” Where I take issue with Marc is in his apparent belief that if we study what successful nations do, we can just plunk their policies down in our context and all will be well. Marc seems to think that international comparisons have proven that our main efforts need to be directed toward improving teacher quality. He might very well be right. I’d love to see teacher salaries doubled, teacher education dramatically improved, induction enhanced, and so on. Perhaps these policies would solve our problems by making teaching a more attractive profession, bringing higher-quality students into teaching, and providing excellent professional development and support to help existing and new teachers be effective and want to stay in the profession. Frankly, however, there isn’t a U.S. educator or policy maker who didn’t already know that these would be great ideas long before we ever heard of Finland.

But how do we cause all of these things to happen in our society, with our kids? Which of these policies are not only effective, but most cost-effective? Is it too much to ask that whatever ideas we glean from observing Finland or Singapore or Japan be tested in Minnesota or Massachusetts or Mississippi, so we can learn how they work here? And in the meantime, might we also increase use of programs and practices that have been proven to work in the U.S., and develop and evaluate more of them?

America’s strength in every field, from medicine to agriculture to satellites, lies in its extraordinary capacity in research and development. This is true in education as much as in other areas; the products of U.S. educational R & D are much sought after in other countries. While other countries can give us good ideas and benchmarks to evaluate our students’ performance, let’s also build on our strengths.

Gold-Standard Program Evaluations, on a Shoestring Budget

Note: This is a guest post by Jon Baron, President of the Coalition for Evidence-Based Policy, and Chairman of the National Board for Education Sciences

In today’s tough economic climate, quality evaluations of education reforms – to determine which are truly effective in improving student achievement, graduation rates, and other key outcomes – are especially important. They enable us to focus our limited resources on strategies that have been proven to work.

Well-conducted randomized controlled trials are generally recognized as the most reliable method (the “gold standard”) for evaluating a program’s effectiveness. However, widespread misconceptions about what such studies involve – including their cost – have often limited their use by education officials.

In plain language: Randomized controlled trials in education are studies that randomly assign a sample of students, teachers, or schools to a group that participates in the program (“the program group”) or to a group that does not (“the control group”). With a sufficiently large sample, this process helps ensure that the two groups are equivalent, so that any difference in their outcomes over time – such as student achievement – can be attributed to the program, and not to other factors.
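The core comparison described above can be sketched in a few lines of Python, with hypothetical scores standing in for the administrative data: once the groups are formed at random, the estimated program effect is simply the difference in mean outcomes.

```python
# Sketch of the core RCT comparison (hypothetical test scores): with
# random assignment, the difference in group means estimates the
# program's effect, since no other systematic difference separates
# the two groups.

def mean(scores):
    return sum(scores) / len(scores)

program_scores = [72, 68, 75, 80, 77, 70]  # program group, end of year
control_scores = [70, 65, 71, 78, 74, 68]  # control group, end of year

estimated_effect = mean(program_scores) - mean(control_scores)
print(f"Estimated program effect: {estimated_effect:.1f} points")
```

A real analysis would also ask whether a difference this size could plausibly arise by chance, but the estimate itself is just this subtraction, which is exactly why administrative data makes such studies so cheap.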

Such studies are often perceived as being too costly and administratively burdensome to be practical in most educational settings. In fact, however, it is often possible to conduct such a study at low cost and burden if the study can measure outcomes using state test scores or other administrative data that are already collected for other purposes. Costs are reduced by eliminating what is typically the study’s most labor-intensive and costly component: locating the individual sample members at various points in time after program completion, and administering tests or interviews to obtain their outcome data. In some cases, the only remaining cost is the researcher’s time to analyze the data.

For example, the following are two recent randomized trials that were conducted at low cost, yet produced findings of policy and practical importance:

Roland Fryer, recent winner of the MacArthur “Genius” Award, conducted an evaluation of New York City’s $75 million Teacher Incentive Program, in which 396 of the city’s lowest-performing public schools were randomly assigned either to an incentive group, which could receive an annual bonus of up to $3,000 per teacher if the school improved student achievement and other key outcomes, or to a control group. Three years after random assignment, the study found that the incentives had no effect on student achievement, attendance, graduation rates, behavior, GPA, or other outcomes. Based in part on these results, the city recently ended the program, freeing up resources for other efforts to improve student outcomes.

The study’s cost: Approximately $50,000. The low cost was possible because the study measured all outcomes using state test scores and other administrative records already collected for other purposes.

Eric Bettinger and Rachel Baker conducted an evaluation of InsideTrack college coaching – a widely implemented mentoring program designed to prevent college students from dropping out. This was a well-conducted trial, which randomized more than 13,000 students at eight colleges. The study found that the program produced a 14 percent increase in college persistence for at least two years, and a 13 percent increase in the likelihood of graduating from college.

The study’s cost: Less than $20,000. The low cost was possible because the study measured its key outcomes using administrative data that the colleges already collected for other purposes – i.e., their enrollment and graduation records – rather than by collecting new data through individual surveys.

In recent years, federal and state policy, as well as improvements in information technology, have greatly increased the availability of high-quality administrative data on student achievement and other key educational outcomes. Thus, it has become more feasible than ever before to conduct gold-standard randomized evaluations on a shoestring budget. Equipped with reliable evidence, education officials can have much greater confidence that their spending decisions will produce important improvements in student outcomes.

-Jon Baron

The Coalition for Evidence-Based Policy is a nonprofit, nonpartisan organization whose mission is to increase government effectiveness through the use of rigorous evidence about “what works.”

Education Innovation: What It Is and Why We Need More of It

NOTE: This is a guest post from Jim Shelton, Assistant Deputy Secretary of the Office of Innovation and Improvement at the U.S. Department of Education.

Whether for reasons of economic growth, competitiveness, social justice or return on tax-payer investment, there is little rational argument over the need for significant improvement in U.S. educational outcomes. Further, it is irrefutable that the country has made limited improvement on most educational outcomes over the last several decades, especially when considered in the context of the increased investment over the same period. In fact, the total cost of producing each successful high school and college graduate has increased substantially over time instead of decreasing – creating what some argue is an inverted learning curve.

This analysis stands in stark contrast to the many anecdotes of teachers, schools and occasionally whole systems “beating the odds” by producing educational outcomes well beyond “reasonable” expectations. And, therein lies the challenge and the rationale for a very specific definition of educational innovation.

Education not only needs new ideas and inventions that shatter the performance expectations of today’s status quo; to make a meaningful impact, these new solutions must also “scale,” that is, grow large enough to serve millions of students and teachers or large portions of specific under-served populations. True educational innovations are those products, processes, strategies, and approaches that improve significantly upon the status quo and reach scale.


Systems and programs at the local, state, and national level, in their quest to improve, should be in the business of identifying and scaling what works. Yet we traditionally have lacked the discipline, infrastructure, and incentives to systematically identify breakthroughs, vet them, and support their broad adoption – a process referred to as field scans. Programs like the Department of Education’s Investing in Innovation Fund (i3) are designed as field scans, but i3 is tiny in comparison to both the need and the opportunity. To achieve our objectives, larger funding streams will need to drive the identification, evaluation, and adoption of effective educational innovations.

Field scans are only one of three connected pathways to education innovation, and they build on the most recognized pathway – basic and applied research. The time to produce usable tools and resources from this pathway can be long – just as in medicine where development and approval of new drugs and devices can take 12-15 years – but, with more and better leveraged resources, more focus, and more discipline, this pathway can accelerate our understanding of teaching and learning and production of performance enhancing practices and tools.

The third pathway focuses specifically on accelerating transformational breakthroughs, which require a different approach – directed development. Directed development processes identify cutting edge research and technology (technology generically, not specifically referring to software or hardware) and use a uniquely focused approach to accelerate the pace at which specific game changing innovations reach learners and teachers. Directed development within the federal government is most associated with DARPA (the Defense Advanced Research Projects Agency), which used this unique and aggressive model of R&D to produce technologies that underlie the Internet, GPS, and the unmanned aircraft (drone). Education presents numerous opportunities for such work. For example: (1) providing teachers with tools that identify each student’s needs and interests and match them to the optimal instructional resources or (2) cost-effectively achieving the 2 standard deviations of improvement that one-to-one human tutors generate. In 2010, the President’s Council of Advisors on Science and Technology recommended the creation of an ARPA for Education to pursue directed development in these and other areas of critical need and opportunity.

Each of these pathways – the field scan, basic and applied research, and directed development – will be essential to improving and ultimately transforming learning from cradle through career. If done well, we will redefine “the possible” and reclaim American educational leadership while addressing inequity at home and abroad. At that point, we may be able to rely on a simpler definition of innovation:

“An innovation is one of those things that society looks at and says, if we make this part of the way we live and work, it will change the way we live and work.”

-Dean Kamen

-Jim Shelton

Note: The Office of Innovation and Improvement at the U.S. Department of Education administers more than 25 discretionary grant programs, including the Investing in Innovation Program, Charter Schools Program, and Technology in Education.