Educational Policies vs. Educational Programs: Evidence from France

Ask any parent what their kids say when they ask them what they did in school today. Invariably, they respond, “Nuffin,” or some equivalent. My four-year-old granddaughter always says, “I played with my fwends.” All well and good.

However, in educational policy, policy makers often give the very same answer when asked, “What did the schools not using the (insert latest policy darling) do?”

“Nuffin’.” Or they say, “Whatever they usually do.” There’s nothing wrong with the latter answer if it’s true. But given the many programs now known to improve student achievement (see www.evidenceforessa.org), why don’t evaluators compare outcomes of new policy initiatives to those of proven educational programs known to improve the same outcomes the policy innovation is supposed to improve, perhaps at far lower cost per student? The evaluations should also compare to “business as usual,” but adding proven programs to evaluations of large policy innovations would help avoid declaring policy innovations to be successful when they are in fact just slightly more effective than “business as usual,” and much less effective or less cost-effective than alternative proven approaches. For example, when evaluating charter schools, why not routinely compare them to whole-school reform models that have similar objectives? When evaluating extending the school day or school year to help high-poverty schools, why not compare these innovations to using the same amount of additional money to hire tutors who use proven tutoring models to help struggling students? In evaluating policies in which students are held back if they do not read at grade level by third grade, why not compare these approaches to intensive phonics instruction and tutoring in grades K-3, which are known to greatly improve student reading achievement?

[Photo: There is nuffin like a good fwend.]

As one example of research comparing a policy intervention to a promising educational intervention, I recently saw a very interesting pair of studies from France. Ecalle, Gomes, Auphan, Cros, & Magnan (2019) compared two interventions applied in special priority areas with high poverty levels. Both interventions focused on reading in first grade.

One of the interventions involved halving class size, from approximately 24 students to 12. The other provided intensive reading instruction in small groups (4-6 children) to students who were struggling in reading, as well as less intensive interventions to larger groups (10-12 students). Low achievers got two 30-minute interventions each day for a year, while the higher-performing readers got one 30-minute intervention each day. In both cases, the focus of instruction was on phonics. In all cases, the additional interventions were provided by the students’ usual teachers.

The students in small classes were compared to students in ordinary-sized classes, while the students in the educational intervention were compared to students in same-sized classes who did not get the group interventions. Similar measures and analyses were used in both comparisons.

The results were nearly identical for the class size policy and the educational intervention. Halving class size had effect sizes of +0.14 for word reading and +0.22 for spelling. Results for the educational intervention were +0.13 for word reading, +0.12 for spelling, +0.14 for a group test of reading comprehension, +0.32 for an individual test of comprehension, and +0.19 for fluency.

These studies are less than perfect in experimental design, but they are nevertheless interesting. Most importantly, the class size policy required an additional teacher for each class of 24. Using Maryland annual teacher salaries and benefits ($84,000), that means the cost in our state would be about $3500 per student. The educational intervention required one day of training and some materials. There was virtually no difference in outcomes, but the differences in cost were staggering.
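To make the arithmetic behind that per-student figure explicit, here is a minimal sketch in Python. The function name is hypothetical, and the sketch assumes the only added cost of halving a class is one extra teacher's salary and benefits, spread across every student in the original class of 24; the $84,000 figure is the Maryland estimate quoted above.

```python
def added_cost_per_student(teacher_cost: float, original_class_size: int) -> float:
    """Cost per student of adding one teacher to split a class in half.

    Assumption (for illustration only): the sole added cost is one extra
    teacher's salary and benefits, shared by all students in the
    original class.
    """
    if original_class_size <= 0:
        raise ValueError("original_class_size must be positive")
    return teacher_cost / original_class_size


# Figures from the text: $84,000 per teacher, classes of about 24 students.
cost = added_cost_per_student(84_000, 24)
print(f"${cost:,.0f} per student")  # prints "$3,500 per student"
```

The same function can be rerun with a different salary figure or class size to see how sensitive the per-student cost is to local conditions.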

The class size policy was mandated by the Ministry of Education. The educational intervention was offered to schools and provided by a university and a non-profit. As is so often the case, the policy intervention was simplistic, easy to describe in the newspaper, and minimally effective. The class size policy reminds me of a Florida program that extended the school schedule by an hour every day in high-poverty schools, mainly to provide more time for reading instruction. The cost per child was about $800 per year. The outcomes were minimal (ES=+0.05).

After many years of watching what schools do and reviewing research on outcomes of innovations, I find it depressing that policies mandated on a substantial scale are so often found to be ineffective. They are usually far more expensive than much more effective, rigorously evaluated programs that are, however, a bit more difficult to describe, and rarely arouse great debate in the political arena. It’s not that anyone is opposed to the educational intervention, but it is a lot easier to carry a placard saying “Reduce Class Size Now!” than to carry one saying “Provide Intensive Phonics in Small Groups with More Supplemental Teaching for the Lowest Achievers Now!” The latter just does not fit on a placard, and though easy to understand if explained, it does not lend itself to easy communication. Actually, there are much more effective first grade interventions than the one evaluated in France (see www.evidenceforessa.org). At a cost much less than $3500 per student, several one-to-one tutoring programs using well-trained teaching assistants as tutors would have been able to produce an effect size of more than +0.50 for all first graders on average. This would even fit on a placard: “Tutoring Now!”

I am all in favor of trying out policy innovations. But when kids in a proven-program comparison group are asked what they did in school today, they shouldn’t say “nuffin’.” They should say, “My tooter taught me to read. And I played with my fwends.”

References

Ecalle, J., Gomes, C., Auphan, P., Cros, L., & Magnan, A. (2019). Effects of policy and educational interventions intended to reduce difficulties in literacy skills in grade 1. Studies in Educational Evaluation, 61, 12-20.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Is ES=+0.50 Achievable?: Schoolwide Approaches That Might Meet This Standard

In a recent blog, “Make No Small Plans,” I proposed a system innovators could use to create very effective schoolwide programs.  I defined these as programs capable of making a difference in student achievement large enough to bring entire schools serving disadvantaged students to the levels typical of middle class schools.  On average, that would mean creating school models that could routinely add an effect size of +0.50 for entire disadvantaged schools.  +0.50, or half a standard deviation, is roughly the average difference between students who qualify for free lunch and those who do not, between African American and White students, and between Hispanic and non-Hispanic White students.
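One rough way to picture what +0.50 means: if test scores are approximately normally distributed (an assumption made here for illustration, not something the studies guarantee), a student who starts at the comparison-group average and gains a given effect size ends up at a higher percentile of that group. The sketch below uses only Python's standard library; the function name is hypothetical.

```python
from statistics import NormalDist


def percentile_after_gain(effect_size: float) -> float:
    """Percentile (0-100) of the comparison group that an average student
    would reach after gaining `effect_size` standard deviations.

    Assumes normally distributed scores; real test scores only
    approximate this.
    """
    return 100 * NormalDist().cdf(effect_size)


# A gain of half a standard deviation moves an average student
# from the 50th percentile to roughly the 69th.
print(round(percentile_after_gain(0.50)))  # prints 69
```

Under that normality assumption, an effect size of +0.50 corresponds to moving a typical student from the middle of the comparison group to about the 69th percentile.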

Today, I wanted to give some examples of approaches intended to meet the +0.50 goal. From prior work, my colleagues and I already have created a successful schoolwide reform model, Success for All, which, with adequate numbers of tutors (as many as six per school), achieved reading effect sizes in high-poverty Baltimore elementary schools of over +0.50 for all students and +0.75 for the lowest-achieving quarter of students (Madden et al., 1993). These outcomes were maintained through eighth grade, along with substantial reductions in grade retentions and special education placements (Borman & Hewes, 2003). Steubenville, in Ohio’s Rust Belt, uses Success for All in all of its Title I elementary schools, providing several tutors in each. Each year, Steubenville schools score among the highest in Ohio on state tests, exceeding most wealthy suburban schools. Other SFA schools with sufficient tutors are also exemplary in achievement gains. Yet these schools face a dilemma. Most cannot afford significant numbers of tutors. They still get excellent results, but smaller ones than those typical of SFA schools that do have sufficient tutors.

We are now planning another approach, also intended to produce schoolwide effect sizes of at least +0.50 in schools serving disadvantaged students.   However, in this case our emphasis is on tutoring, the most effective strategy known for improving the achievement of struggling readers (Inns et al., 2019).  We are calling this approach the Reading Safety Net.  Main components of this plan are as follows:

Tutoring

Like the most successful forms of Success for All, the Reading Safety Net places a substantial emphasis on tutoring.  Tutors will be well-qualified teaching assistants with BAs but not teaching certificates, extensively trained to provide one-to-four tutoring.   Tutors will use a proven computer-assisted model in which students do a lot of pair teaching.  This is what we now call our Tutoring With the Lightning Squad model, which achieved outcomes of +0.40 and +0.46 in two studies in the Baltimore City Public Schools (Madden & Slavin, 2017).  A high-poverty school of 500 students might engage about five tutors, providing extensive tutoring to the majority of students, for as many years as necessary.  One additional tutor or teacher will supervise the tutors and personally work with students having the most serious problems.   We will provide significant training and follow-up coaching to ensure that all tutors are effective.

Attendance and Health

Many students fail in reading or other outcomes because they have attendance problems or certain common health problems. We propose to provide a health aide to help solve these problems.

Attendance

Many students, especially those in high-poverty schools, fail because they do not attend school regularly. Yet there are several proven approaches for increasing attendance, and reducing chronic truancy (Shi, Inns, Lake, and Slavin, 2019).  Health aides will help teachers and other staff organize and manage effective attendance improvement approaches.

Vision Services

My colleagues and I have designed strategies to help ensure that all students who need eyeglasses receive them. A key problem in this work is ensuring that students who receive glasses use them, keep them safe, and replace them if they are lost or broken. Health aides will coordinate use of proven strategies to increase regular use of needed eyeglasses.

Asthma and other health problems

Many students in high-poverty schools suffer from chronic illnesses.  Cures or prevention are known for these, but the cures may not work if medications are not taken daily.   For example, asthma is common in high-poverty schools, where it is the top cause of hospital referrals and a leading cause of death for school-age children.  Inexpensive inhalers can substantially improve children’s health, yet many children do not regularly take their medicine. Studies suggest that having trained staff ensure that students take their medicine, and watch them doing so, can make a meaningful difference.  The same may be true of other chronic, easily treated diseases common among children but often not consistently treated in inner-city schools.  Health aides with special supplemental training may be able to play a key on-the-ground role in helping ensure effective treatment for asthma and other diseases.

Potential Impact

The Reading Safety Net is only a concept at present.  We are seeking funding to support its further development and evaluation.  As we work with front line educators, colleagues, and others to further develop this model, we are sure to find ways to make the approach more effective and cost-effective, and perhaps extend it to solve other key problems.

We cannot yet claim that the Reading Safety Net has been proven effective, although many of its components have been.  But we intend to do a series of pilots and component evaluations to progressively increase the impact, until that impact attains or surpasses the goal of ES=+0.50.  We hope that many other research teams will mobilize and obtain resources to find their own ways to +0.50.  A wide variety of approaches, each of which would be proven to meet this ambitious goal, would provide a range of effective choices for educational leaders and policy makers.  Each would be a powerful, replicable tool, capable of solving the core problems of education.

We know that with sufficient investment and encouragement from funders, this goal is attainable.  If it is in fact attainable, how could we accept anything less?

References

Borman, G., & Hewes, G. (2003).  Long-term effects and cost effectiveness of Success for All.  Educational Evaluation and Policy Analysis, 24 (2), 243-266.

Inns, A., Lake, C., Pellegrini, M., & Slavin, R. (2019). A synthesis of quantitative research on programs for struggling readers in elementary schools. Manuscript submitted for publication.

Madden, N. A., & Slavin, R. E. (2017). Evaluations of Technology-Assisted Small-Group Tutoring for Struggling Readers. Reading & Writing Quarterly, 1-8.

Madden, N. A., Slavin, R. E., Karweit, N. L., Dolan, L., & Wasik, B. (1993). Success for All:  Longitudinal effects of a schoolwide elementary restructuring program. American Educational Research Journal, 30, 123-148.

Shi, C., Inns, A., Lake, C., & Slavin, R. E. (2019). Effective school-based programs for K-12 students’ attendance: A best-evidence synthesis. Baltimore, MD: Center for Research and Reform in Education, Johns Hopkins University.

 

Make No Small Plans

“Make no little plans; they have no magic to stir men’s blood, and probably themselves will not be realized. Make big plans, aim high in hope and work, remembering that a noble, logical diagram, once recorded, will never die…”

-Daniel Burnham, American architect, 1910

More than 100 years ago, architect Daniel Burnham expressed an important insight. “Make no little plans,” he said. Many people have said that, one way or another. But Burnham’s insight was that big plans matter because they “have magic to stir men’s blood.” Small plans do not, and for this reason may never even be implemented. Burnham believed that even if big plans fail, they have influence into the future, as little plans do not.

[Photo: the Great Wall of China. Make no small plans.]

In education, we sometimes have big plans. Examples include comprehensive school reform in the 1990s, charter schools in the 2000s, and evidence-based reform today. None of these have yet produced revolutionary positive outcomes, but all of them have captured the public imagination. Even if you are not an advocate of any of these, you cannot ignore them, as they take on a life of their own. When conditions are right, they will return many times, in many forms, and may eventually lead to substantial impacts. In medicine, it was demonstrated in the mid-1800s that germs caused disease and that medicine could advance through rigorous experimentation (think Lister and Pasteur, for example). Yet sterile procedures in operations and disciplined research on practical treatments took 100 years to prevail. The medical profession resisted sterile procedures and evidence-based medicine for many years. Sterile procedures and evidence-based medicine were big ideas. It took a long time for them to take hold, but they did prevail, and remained big ideas through all that time.

Big Plans in Education

In education, as in medicine long ago, we have thousands of important problems, and good work continues (and needs to continue) on most of them. However, at least in American education, there is one crucial problem that dwarfs all others and lends itself to truly big plans. This is the achievement gap between students from middle class backgrounds and those from disadvantaged backgrounds. As noted in my April 25 blog, the achievement gap between students who qualify for free lunch and those who do not, between African American and White students, and between Hispanic students and non-Hispanic White students, all average an effect size of about 0.50. This presents a serious challenge. However, as I pointed out in that blog, there are several programs in existence today capable of adding an effect size of +0.50 to the reading or math achievement of students at risk. All programs that can do this involve one-to-one or one-to-small group tutoring. Tutoring is expensive, but recent research has found that well-trained and well-supervised tutors with BAs, but not necessarily teaching certificates, can obtain the same outcomes as certified teachers do, at half the cost. Using our own Success for All program with six tutors per school (K-5), high-poverty African American elementary schools in Baltimore obtained effect sizes averaging +0.50 for all students and +0.75 for students in the lowest 25% of their grades (Madden et al., 1993). A follow-up to eighth grade found that achievement outcomes were maintained and both retentions and special education placements were cut in half (Borman & Hewes, 2003). We have not had the opportunity to once again implement Success for All with so much tutoring included, but even with fewer tutors, Success for All has had substantial impacts. Cheung et al. (2019) found an average effect size of +0.27 across 28 randomized and matched studies, a more than respectable outcome for a whole-school intervention. For the lowest-achieving students, the average was +0.56.

Knowing that Success for All can achieve these outcomes is important in itself, but it is also an indication that substantial positive effects can be achieved for whole schools, and with sufficient tutors, can equal the entire achievement gaps according to socio-economic status and race. If one program can do this, why not many others?

Imagine that the federal government or other large funders decided to support the development and evaluation of several different ideas. Funders might establish a goal of increasing reading achievement by an effect size of +0.50, or as close as possible to this level, working with high-poverty schools. Funders would seek organizations that have already demonstrated success at an impressive level, but not yet +0.50, who could describe a compelling strategy to increase their impact to +0.50 or more. Depending on the programs’ accomplishments and needs, they might be funded to experiment with enhancements to their promising model. For example, they might add staff, add time (e.g., continue for multiple years), or add additional program components likely to strengthen the overall model. Once programs could demonstrate substantial outcomes in pilots, they might be funded to do a cluster randomized trial. If this experiment shows positive effects approaching +0.50 or more, the developers might receive funding for scale-up. If the outcomes are substantially positive but significantly less than +0.50, the funders might decide to help the developers make changes leading up to a second randomized experiment.

There are many details to be worked out, but the core idea could capture the imagination and energy of educators and public-spirited citizens alike. This time, we are not looking for marginal changes that can be implemented cheaply. This time, we will not quit until we have many proven, replicable programs, each of which is so powerful that it can, over a period of years, remedy the entire achievement gap. This time, we are not making changes in policy or governance and hoping for the best. This time, we are going directly to the schools where the disadvantaged kids are, and we are not declaring victory until we can guarantee such students gains that will give them the same outcomes as those of the middle class kids in the suburbs.

Perhaps the biggest idea of all is the idea that we need big ideas with big outcomes!

Anyway, this is my big plan. What’s yours?

————

Note: Just as I was starting on this blog, I got an email from Ulrich Boser at the Center for American Progress. CAP and the Thomas Fordham Foundation are jointly sponsoring an “Education Moonshot,” including a competition with a grand prize of $10,000 for a “moonshot idea that will revolutionize schooling and dramatically improve student outcomes.” For more on this, please visit the announcement site. Submissions are due August 1st at this online portal and involve telling them in 500 words your, well, big plan.

 

References

Borman, G., & Hewes, G. (2003).  Long-term effects and cost effectiveness of Success for All.  Educational Evaluation and Policy Analysis, 24 (2), 243-266.

Cheung, A., Xie, C., Zhuang, T., & Slavin, R. E. (2019). Success for All: A quantitative synthesis of evaluations. Manuscript submitted for publication.

Madden, N.A., Slavin, R.E., Karweit, N.L., Dolan, L.J., & Wasik, B.A. (1993).  Success for All:  Longitudinal effects of a restructuring program for inner-city elementary schools.  American Educational Research Journal, 30, 123-148.

 

 

Evidence For Revolution

In the 1973 movie classic “Sleeper,” Woody Allen plays a New York health food store owner who wakes up 200 years in the future, in a desolate environment.

“What happened to New York?” he asks the character played by Diane Keaton.  She replies, “It was destroyed.  Some guy named Al Shanker got hold of a nuclear weapon.”

I think every member of the American Federation of Teachers knows this line.  Firebrand educator Al Shanker, founder of the AFT, would never have hurt anyone.  But short of that, he would do whatever it took to fight for teachers’ rights, and most importantly, for the rights of students to receive a great education.  In fact, he saw that the only way for teachers to receive the respect, fair treatment, and adequate compensation they deserved, and still deserve, was to demonstrate that they had skills not possessed by the general public that could have powerful impacts on students’ learning.  Physicians are much respected and well paid because they have special knowledge of how to prevent and cure disease, and to do this they have available a vast armamentarium of drugs, devices, and procedures, all proven to work in rigorous research.

Shanker was a huge fan of evidence in education, first because evidence-based practice helps students succeed, but also because teachers using proven programs and practices show that they deserve respect and fair compensation because they have specialized knowledge backed by proven methods able to ensure the success of students.

The Revolutionary Potential of Evidence in Education

The reality is that in most school districts, especially large ones, most power resides in the central office, not in individual schools.  The district chooses textbooks, computer technology, benchmark assessments, and much more.  There are probably principals and teachers on the committees that make these decisions, but once the decisions are made, the building-level staff is supposed to fall in line and do as they are told.  When I speak to principals and teachers, they are astonished to learn that they can easily look up on www.evidenceforessa.org just about any program their district is using and find out what the evidence base for that program is.  Most of the time, the programs they have been required to use by their school administrations either have no valid evidence of effectiveness, or they have concrete evidence that they do not work.  Further, in almost all categories, effective programs or approaches do exist, and could have been selected as practical alternatives to the ones that were adopted.  Individual schools could have been allowed to choose proven programs, instead of being required to use programs they know not to be proven effective.

Perhaps schools should always be given the freedom to select and implement programs other than those mandated by the district, as long as the programs they want to implement have stronger evidence of effectiveness than the district’s programs.

How the Revolution Might Happen

Imagine that principals, teachers, parent activists, enlightened school board members, and others in a given district were all encouraged to use Evidence for ESSA or other reviews of evaluations of educational programs.  Imagine that many of these people wrote letters to the editor, to district leaders, or to education reporters, or perhaps, if these were not sufficient, marched on the district offices with placards reading something like “Use What Works” or “Our Children Deserve Proven Programs.”  Who could be against that?

One of three things might happen.  First, the district might allow individual schools to use proven programs in place of the standard programs, and encourage any school to come forward with evidence from a reliable source if its staff or leadership wants to use a proven program not already in use.  That would be a great outcome.  Second, the district leadership might start using proven programs districtwide, and working with school leaders and teachers to ensure successful implementation.  This retains the top-down structure, but it could greatly improve student outcomes.  Third, the district might ignore the protesters and the evidence, or relegate the issue to a very slow study committee, which may be the same thing.  That would be a distressing outcome, though no worse than what probably happens now in most places.  It could still be the start of a positive process, if principals, teachers, school board members, and parent activists keep up the pressure, helpfully informing the district leaders about proven programs they could select when they are considering a change.

If this process took place around the country, it could have a substantial positive impact beyond the individual districts involved, because it could scare the bejabbers out of publishers, who would immediately see that if they are going to succeed in the long run, they need to design programs that will likely work in rigorous evaluations, and then market them based on real evidence.  That would be revolutionary indeed.  Until the publishers get firmly on board, the evidence movement is just tapping at the foundations of a giant fortress with a few ball peen hammers.  But there will come a day when that fortress will fall, and all will be beautiful. It will not require a nuclear weapon, just a lot of committed and courageous educators and advocates, with a lot of persistence, a lot of information on what works in education, and a lot of ball peen hammers.

Picture Credit: Liberty Leading the People, Eugène Delacroix [Public domain]

On Progress

My grandfather (pictured below with my son Ben around 1985) was born in 1900, and grew up in Argentina. The world he lived in as a child had no cars, no airplanes, few cures for common diseases, and inefficient agriculture that bound the great majority of the world to farming. By the time he died, in 1996, think of all the astonishing progress he’d seen in technology, medicine, agriculture, and much else.

[Photo: Bob Slavin’s grandfather and son, both of whom became American citizens: one born before the invention of airplanes, the other born before the exploration of Mars.]

I was born in 1950. The progress in technology, medicine, and agriculture, and many other fields, continues to be extraordinary.

In most of our society and economy, we confidently expect progress. When my father needed a heart valve, his doctor suggested that he wait as long as possible because new, much better heart valves were coming out soon. He could, and did, bet his life on progress, and it paid off.

But now consider education. My grandfather attended school in Argentina, where he was taught in rows by teachers who did most of the talking. My father went to school in New York City, where he was taught in rows by teachers who did most of the talking. I went to school in Washington, DC, where I was taught in rows by teachers who did most of the talking. My children went to school in Baltimore, where they mostly sat at tables, and did use some technology, but still, the teachers did most of the talking.

 

My grandchildren are now headed toward school (the oldest is four). They will use a lot of technology, and will sit at tables more than my own children did. But the basic structure of the classroom is not so different from Argentina, 1906. All who eagerly await the technology revolution are certainly seeing many devices in classroom use. But are these devices improving outcomes on, for example, reading and math? Our reviews of research on all types of approaches used in elementary and secondary schools are not finding strong benefits of technology. Across all subjects and grade levels, the average effect size is similar, ranging from +0.07 (elementary math) to +0.09 (elementary reading). If you like “additional months of learning,” these effects equate to one month in a year. Ok, better than zero, but not the revolution we’ve been waiting for.

There are other approaches much more effective than technology, such as tutoring, forms of cooperative learning, and classroom management strategies. At www.evidenceforessa.org, you can see descriptions and outcomes of more than 100 proven programs. But these are not widely used. Your children or grandchildren, or other children you care about, may go 13 years from kindergarten to 12th grade without ever experiencing a proven program. In our field, progress is slow, and dissemination of proven programs is slower.

Education is the linchpin for our economy and society. Everything else depends on it. In all of the developed world, education is richly funded, yet very, very little of this largesse is invested in innovation, evaluations of innovative methods, or dissemination of proven programs. Other fields have shown how innovation, evaluation, and dissemination of proven strategies can become the engine of progress. There is absolutely nothing inevitable about the slow pace of progress in education. That slow pace is a choice we have made, and keep making, year after year, generation after generation. I hope we will make a different choice in time to benefit my grandchildren, and the children of every family in the world. It could happen, and there are many improvements in educational research and development to celebrate. But how long must it take before the best of educational innovation becomes standard practice?

The Fabulous 20%: Programs Proven Effective in Rigorous Research

Photo courtesy of Allison Shelley/The Verbatim Agency for American Education: Images of Teachers and Students in Action

Over the past 15 years, governments in the U.S. and U.K. have put quite a lot of money (by education standards) into rigorous research on promising programs in PK-12 instruction. Rigorous research usually means studies in which schools, teachers, or students are assigned at random to experimental or control conditions and then pre- and posttested on valid measures independent of the developers. In the U.S., the Institute of Education Sciences (IES) and Investing in Innovation (i3), now called Education Innovation Research (EIR), have led this strategy, and in the U.K., it’s the Education Endowment Foundation (EEF). Enough research has now been done to enable us to begin to see important patterns in the findings.

One finding that is causing some distress is that the number of studies showing significant positive effects is modest. Across all funding programs, the proportion of studies reporting positive, significant findings averages around 20%. It is important to note that most funded projects evaluate programs that have been newly developed and not previously evaluated. The “early phase” or “development” category of i3/EIR is a good example; it provides small grants intended to fund creation or refinement of new programs, so it is not so surprising that these studies are less likely to find positive outcomes. However, even programs that have been successfully evaluated in the past often do not replicate their positive findings in the large, rigorous evaluations required at the higher levels of i3/EIR and IES, and in all full-scale EEF studies. The problem is that positive outcomes may have been found in smaller studies in which hard-to-replicate levels of training or monitoring by program developers may have been possible, or in which measures made by developers or researchers were used, or where other study features made it easier to find positive outcomes.

The modest percentage of positive findings has caused some observers to question the value of all these rigorous studies. They wonder if this is a worthwhile investment of tax dollars.

One answer to this concern is to point out that while the percentage of all studies finding positive outcomes is modest, so many have been funded that the number of proven programs is growing rapidly. In our Evidence for ESSA website (www.evidenceforessa.org), we have found 111 programs that meet ESSA’s Strong, Moderate, or Promising standards in elementary and secondary reading or math. That’s a lot of proven programs, especially in elementary reading, where there are 62.

The situation is a bit like that in medicine. A very small percentage of rigorous studies of medicines or other treatments show positive effects. Yet so many are done that each year, new proven treatments for all sorts of diseases enter widespread use in medical practice. This dynamic is one explanation for the steady increases in life expectancy taking place throughout the world.

Further, high-quality studies that fail to find positive outcomes also contribute to the science and practice of education. Some programs do not meet standards for statistical significance, but nevertheless they show promise overall or with particular subgroups. Programs whose evaluations do not find clear positive outcomes but that closely resemble other programs that do are another category worth further attention. Funders can take this into account in deciding whether to fund another study of programs that “just missed.”

On the other hand, there are programs that show profoundly zero impact, in categories that never or almost never find positive outcomes. I reported recently on benchmark assessments, with an overall effect size of -0.01 across 10 studies. This might be a good candidate for giving up, unless someone has a markedly different approach unlike those that have failed so often. Another unpromising category is textbooks. Textbooks may be necessary, but the idea that replacing one textbook with another will improve achievement has failed many, many times. This set of negative results can be helpful to schools, enabling them to focus their resources on programs that do work. Beyond that, giving up on categories of studies that hardly ever work would significantly reduce the 80% failure rate, and save money better spent on evaluating more promising approaches.

The findings of many studies of replicable programs can also reveal patterns that should help current or future developers create programs that meet modern standards of evidence. There are a few patterns I’ve seen across many programs and studies:

  1. I think developers (and funders) vastly underestimate the amount and quality of professional development needed to bring about significant change in teacher behaviors and student outcomes. Strong professional development requires top-quality initial training, including simulations and/or videos to show teachers how a program works, not just tell them. Effective PD almost always includes coaching visits to classrooms to give teachers feedback and new ideas. If teachers fall back into their usual routines due to insufficient training and follow-up coaching, why would anyone expect their students’ learning to improve in comparison to the outcomes they’ve always gotten? Adequate professional development can be expensive, but this cost is highly worthwhile if it improves outcomes.
  2. In successful programs, professional development focuses on classroom practices, not solely on improving teachers’ knowledge of curriculum or curriculum-specific pedagogy. Teachers standing at the front of the class using the same forms of teaching they’ve always used but doing it with more up-to-date or better-aligned content are not likely to significantly improve student learning. In contrast, professional development focused on tutoring, cooperative learning, and classroom management has a far better track record.
  3. Programs that focus on motivation and relationships between teachers and students and among students are more likely to enhance achievement than programs that focus on cognitive growth alone. Successful teaching focuses on students’ hearts and spirits, not just their minds.
  4. You can’t beat tutoring. Few approaches other than one-to-one or one-to-small group tutoring have consistent powerful impacts. There is much to learn about how to make tutoring maximally effective and cost-effective, but let’s start with the most effective and cost-effective tutoring models we have now and build out from there.
  5. Many, perhaps most, failed program evaluations involve approaches with great potential (or great success) in commercial applications. This is one reason that so many evaluations fail; they assess textbooks or benchmark assessments or ordinary computer assisted instruction approaches. These often involve little professional development or follow-up, and they may not make important changes in what teachers do. Real progress in evidence-based reform will begin when publishers and software developers come to believe that only proven programs will succeed in the marketplace. When that happens, vast non-governmental resources will be devoted to development, evaluation, and dissemination of well-implemented forms of proven programs. Medicine was once dominated by the equivalent of Dr. Good’s Universal Elixir (mostly good-tasting alcohol and sugar). Very cheap, widely marketed, and popular, but utterly useless. However, as government began to demand evidence for medical claims, Dr. Good gave way to Dr. Proven.

Because of long-established policies and practices that have transformed medicine, agriculture, technology, and other fields, we know exactly what has to be done. IES, i3/EIR, and EEF are doing it, and showing great progress. This is not the time to get cold feet over the 80% failure rate. Instead, it is time to celebrate the fabulous 20% – programs that have succeeded in rigorous evaluations. Then we need to increase investments in evaluations of the most promising approaches.


Moneyball for Education

When I was a kid, growing up in the Maryland suburbs of Washington, DC, everyone I knew rooted for the hapless Washington Senators, one of the worst baseball teams ever. At that time, however, the Baltimore Orioles were one of the best teams in baseball, and every once in a while a classmate would snap. He (always “he”) would decide to become an Orioles fan. This would cause him to be shamed and ostracized for the rest of his life by all true Senators fans.

I’ve now lived in Baltimore for most of my life. I wonder if I came here in part because of my youthful impression of Baltimore as a winning franchise.


Skipping forward in time to now, I recently saw in the New York Times an article about the collapse of the Baltimore Orioles. In 2018, they lost 115 games, the worst record in franchise history and among the worst of any team in the modern era. Worse than even the Washington Senators ever were. Why did this happen? According to the NYT, the Orioles are one of the last teams to embrace analytics, which means using evidence to decide which players to recruit or drop, to put on the field or on the bench. Some teams have analytics departments of 15. The Orioles? Zero, although they have just started one.

It’s not as though the benefits of analytics are a secret. A 2003 book by Michael Lewis, Moneyball, explained how the underfunded Oakland A’s used analytics to turn themselves around. A hugely popular 2011 movie told the same story.

In case anyone missed the obvious linkage of analytics in baseball to analytics in education, Results for America (RfA), a group that promotes the use of evidence in government social programs, issued a 2015 book called, you guessed it, Moneyball for Government (Nussle & Orszag, 2015). This Moneyball focused on success stories and ideas from key thinkers and practitioners in government and education. RfA was instrumental in encouraging the U.S. Congress to include in ESSA definitions of strong, moderate, and promising evidence of effectiveness, and to specify a few areas of federal funding that require or incentivize use of proven programs.

The ESSA evidence standards are a giant leap forward in supporting the use of evidence in education. Yet, like the Baltimore Orioles, the once-admired U.S. education system has been less than swept away by the idea that using proven programs and practices could improve outcomes for children. Yes, the situation is better than it was, but things are going very slowly. I’m worried that because of this, the whole evidence movement in education will someday be dismissed: “Evidence? Yeah, we tried that. Didn’t work.”

There are still good reasons for hope. The amount of high-quality evidence continues to grow at an unprecedented pace. The ESSA evidence standards have at least encouraged federal, state, and local leaders to pay some attention to evidence, though moving to action based on this evidence is a big lift.

Perhaps I’m just impatient. It took the Baltimore Orioles a book, a movie, and 16 years to arrive at the conclusion that maybe, just maybe, it was time to use evidence, as winning teams have been doing for a long time. Education is much bigger, and its survival does not depend on its success (as a baseball team’s does). Education will require visionary leadership to embrace the use of evidence. But I am confident that when it does, we will be overwhelmed by visits from educators from Finland, Singapore, China, and other countries that currently clobber us in international comparisons. They’ll want to know how the U.S. education system became the best in the world. Perhaps we’ll have to write a book and make a movie to explain it all. I’d suggest we call it . . . “Learnball.”

References

Nussle, J., & Orszag, P. (2015). Moneyball for Government (2nd Ed.). Washington, DC: Disruption Books.

Photo credit: Keith Allison [CC BY-SA 2.0 (https://creativecommons.org/licenses/by-sa/2.0)]
