The Fabulous 20%: Programs Proven Effective in Rigorous Research

Photo courtesy of Allison Shelley/The Verbatim Agency for American Education: Images of Teachers and Students in Action

Over the past 15 years, governments in the U.S. and U.K. have put quite a lot of money (by education standards) into rigorous research on promising programs in PK-12 instruction. Rigorous research usually means studies in which schools, teachers, or students are assigned at random to experimental or control conditions and then pre- and posttested on valid measures independent of the developers. In the U.S., the Institute of Education Sciences (IES) and Investing in Innovation (i3), now called Education Innovation and Research (EIR), have led this strategy, and in the U.K., it’s the Education Endowment Foundation (EEF). Enough research has now been done to enable us to begin to see important patterns in the findings.

One finding that is causing some distress is that the number of studies showing significant positive effects is modest. Across all funding programs, the proportion of studies reporting positive, significant findings averages around 20%. It is important to note that most funded projects evaluate programs that have been newly developed and not previously evaluated. The “early phase” or “development” category of i3/EIR is a good example; it provides small grants intended to fund creation or refinement of new programs, so it is not so surprising that these studies are less likely to find positive outcomes. However, even programs that have been successfully evaluated in the past often do not replicate their positive findings in the large, rigorous evaluations required at the higher levels of i3/EIR and IES, and in all full-scale EEF studies. The problem is that positive outcomes may have been found in smaller studies in which hard-to-replicate levels of training or monitoring by program developers may have been possible, or in which measures made by developers or researchers were used, or where other study features made it easier to find positive outcomes.

The modest percentage of positive findings has caused some observers to question the value of all these rigorous studies. They wonder if this is a worthwhile investment of tax dollars.

One answer to this concern is to point out that while the percentage of all studies finding positive outcomes is modest, so many have been funded that the number of proven programs is growing rapidly. In our Evidence for ESSA website (www.evidenceforessa.org), we have found 111 programs that meet ESSA’s Strong, Moderate, or Promising standards in elementary and secondary reading or math. That’s a lot of proven programs, especially in elementary reading, where there were 62.

The situation is a bit like that in medicine. A very small percentage of rigorous studies of medicines or other treatments shows positive effects. Yet so many are done that each year, new proven treatments for all sorts of diseases enter widespread use in medical practice. This dynamic is one explanation for the steady increases in life expectancy taking place throughout the world.

Further, high quality studies that fail to find positive outcomes also contribute to the science and practice of education. Some programs do not meet standards for statistical significance, but nevertheless they show promise overall or with particular subgroups. Programs that do not find clear positive outcomes but closely resemble other programs that do are another category worth further attention. Funders can take this into account in deciding whether to fund another study of programs that “just missed.”

On the other hand, there are programs that show profoundly zero impact, in categories that never or almost never find positive outcomes. I reported recently on benchmark assessments, with an overall effect size of -0.01 across 10 studies. This might be a good candidate for giving up, unless someone has a markedly different approach unlike those that have failed so often. Another unpromising category is textbooks. Textbooks may be necessary, but the idea that replacing one textbook with another will improve achievement has failed many, many times. This set of negative results can be helpful to schools, enabling them to focus their resources on programs that do work. Moreover, giving up on categories of studies that hardly ever work would significantly reduce the 80% failure rate, and save money better spent on evaluating more promising approaches.

The findings of many studies of replicable programs can also reveal patterns that should help current or future developers create programs that meet modern standards of evidence. There are a few patterns I’ve seen across many programs and studies:

  1. I think developers (and funders) vastly underestimate the amount and quality of professional development needed to bring about significant change in teacher behaviors and student outcomes. Strong professional development requires top-quality initial training, including simulations and/or videos to show teachers how a program works, not just tell them. Effective PD almost always includes coaching visits to classrooms to give teachers feedback and new ideas. If teachers fall back into their usual routines due to insufficient training and follow-up coaching, why would anyone expect their students’ learning to improve in comparison to the outcomes they’ve always gotten? Adequate professional development can be expensive, but this cost is highly worthwhile if it improves outcomes.
  2. In successful programs, professional development focuses on classroom practices, not solely on improving teachers’ knowledge of curriculum or curriculum-specific pedagogy. Teachers standing at the front of the class using the same forms of teaching they’ve always used but doing it with more up-to-date or better-aligned content are not likely to significantly improve student learning. In contrast, professional development focused on tutoring, cooperative learning, and classroom management has a far better track record.
  3. Programs that focus on motivation and relationships between teachers and students and among students are more likely to enhance achievement than programs that focus on cognitive growth alone. Successful teaching focuses on students’ hearts and spirits, not just their minds.
  4. You can’t beat tutoring. Few approaches other than one-to-one or one-to-small-group tutoring have consistent powerful impacts. There is much to learn about how to make tutoring maximally effective and cost-effective, but let’s start with the most effective and cost-effective tutoring models we have now and build out from there.
  5. Many, perhaps most, failed program evaluations involve approaches with great potential (or great success) in commercial applications. This is one reason that so many evaluations fail; they assess textbooks or benchmark assessments or ordinary computer-assisted instruction approaches. These often involve little professional development or follow-up, and they may not make important changes in what teachers do. Real progress in evidence-based reform will begin when publishers and software developers come to believe that only proven programs will succeed in the marketplace. When that happens, vast non-governmental resources will be devoted to development, evaluation, and dissemination of well-implemented forms of proven programs. Medicine was once dominated by the equivalent of Dr. Good’s Universal Elixir (mostly good-tasting alcohol and sugar): very cheap, widely marketed, and popular, but utterly useless. However, as government began to demand evidence for medical claims, Dr. Good gave way to Dr. Proven.

Because of long-established policies and practices that have transformed medicine, agriculture, technology, and other fields, we know exactly what has to be done. IES, i3/EIR, and EEF are doing it, and showing great progress. This is not the time to get cold feet over the 80% failure rate. Instead, it is time to celebrate the fabulous 20% – programs that have succeeded in rigorous evaluations. Then we need to increase investments in evaluations of the most promising approaches.


This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.


Moneyball for Education

When I was a kid, growing up in the Maryland suburbs of Washington, DC, everyone I knew rooted for the hapless Washington Senators, one of the worst baseball teams ever. At that time, however, the Baltimore Orioles were one of the best teams in baseball, and every once in a while a classmate would snap. He (always “he”) would decide to become an Orioles fan. This would cause him to be shamed and ostracized for the rest of his life by all true Senators fans.

I’ve now lived in Baltimore for most of my life. I wonder if I came here in part because of my youthful impression of Baltimore as a winning franchise.


Skipping forward in time to now, I recently saw in the New York Times an article about the collapse of the Baltimore Orioles. In 2018, they had the worst record of any team in history. Worse than even the Washington Senators ever were. Why did this happen? According to the NYT, the Orioles are one of the last teams to embrace analytics, which means using evidence to decide which players to recruit or drop, to put on the field or on the bench. Some teams have analytics departments of 15 people. The Orioles? Zero, although they have just started one.

It’s not as though the benefits of analytics are a secret. A 2003 book by Michael Lewis, Moneyball, explained how the underfunded Oakland A’s used analytics to turn themselves around. A hugely popular 2011 movie told the same story.

In case anyone missed the obvious linkage of analytics in baseball to analytics in education, Results for America (RfA), a group that promotes the use of evidence in government social programs, issued a 2015 book called, you guessed it, Moneyball for Government (Nussle & Orszag, 2015). This Moneyball focused on success stories and ideas from key thinkers and practitioners in government and education. RfA was instrumental in encouraging the U.S. Congress to include in ESSA definitions of strong, moderate, and promising evidence of effectiveness, and to specify a few areas of federal funding that require or incentivize use of proven programs.

The ESSA evidence standards are a giant leap forward in supporting the use of evidence in education. Yet, like the Baltimore Orioles, the once-admired U.S. education system has been less than swept away by the idea that using proven programs and practices could improve outcomes for children. Yes, the situation is better than it was, but things are going very slowly. I’m worried that because of this, the whole evidence movement in education will someday be dismissed: “Evidence? Yeah, we tried that. Didn’t work.”

There are still good reasons for hope. The amount of high-quality evidence continues to grow at an unprecedented pace. The ESSA evidence standards have at least encouraged federal, state, and local leaders to pay some attention to evidence, though moving to action based on this evidence is a big lift.

Perhaps I’m just impatient. It took the Baltimore Orioles a book, a movie, and 16 years to arrive at the conclusion that maybe, just maybe, it was time to use evidence, as winning teams have been doing for a long time. Education is much bigger, and its survival does not depend on its success (as a baseball team’s does). Education will require visionary leadership to embrace the use of evidence. But I am confident that when it does, we will be overwhelmed by visits from educators from Finland, Singapore, China, and other countries that currently clobber us in international comparisons. They’ll want to know how the U.S. education system became the best in the world. Perhaps we’ll have to write a book and a movie to explain it all. I’d suggest we call it . . . “Learnball.”

References

Nussle, J., & Orszag, P. (2015). Moneyball for Government (2nd Ed.). Washington, DC: Disruption Books.

Photo credit: Keith Allison [CC BY-SA 2.0 (https://creativecommons.org/licenses/by-sa/2.0)]


Don’t Just Do Something. Do Something Effective.

I recently visited York, England, where my wife and I worked part-time for about 8 years. York is world famous for its huge cathedral, intact medieval walls, medieval churches, and other medieval sights. But on this trip we had some time for local touring, and chose to visit a more modern place, but one far ghastlier than a ton of dungeons.

The place is the York Cold War Bunker. Built in 1961 and operated until 1991, it was intended to monitor the results of a nuclear attack on Britain. Volunteers, mostly women, were trained to detect the locations, sizes, and radiation levels of nuclear bombs dropped on Britain. This was a command bunker that collected its own data, with a staff of 60, but also monitored dozens of three-man bunkers all over the North of England, all collecting similar data. The idea was that a national network of these bunkers would determine where in the country it was safe to go after a nuclear war. The bunker had air, water, and food for 30 days, after which the volunteers had to leave. And most likely die of radiation poisoning.


The very interesting docent informed us of one astounding fact. When the bunker network was planned in 1957, the largest nuclear weapons were like those used in Hiroshima and Nagasaki, less than one megaton in yield. By 1961, when the bunkers started operation, the largest bombs were 50-megaton behemoths.

The day the Soviet Union successfully tested its 50-megaton bomb, the bunkers were instantly obsolete. Not only would a single bomb create fatal levels of radiation all over Britain, but it would also likely destroy the telephone and radio systems on which the bunkers depended.

Yet for 30 years, this utterly useless system was maintained, with extensive training, monitoring, and support.

There must have been thousands of military leaders, politicians, scientists, and ordinary readers of Popular Science, who knew full well that the bunkers were useless from the day they opened. The existence of the bunkers was not a secret, and in fact it was publicized. Why were they maintained? And what does this have to do with educational research?

The Cold War Bunkers illustrate an aspect of human nature that is important in understanding all sorts of behavior. When a catastrophe is impending, people find it comforting to do something, even if that something is known (by some at least) to be useless or even counterproductive. The British government could simply not say to its citizens that in case of a nuclear war, everyone was toast. Full stop. Instead, they had to offer hope, however slim. Around the same time the (doomed) bunkers were going into operation in Britain, my entire generation of students was learning to crawl under our desks for protection in case of nuclear attack. I suppose it made some people think that, well, at least something was being done. It scared the bejabbers out of us kids, but no one asked us.

In education, we face many very difficult, often terrifying problems. Every one of them has one or more widespread solutions. But do these solutions work?

Consider DARE, for Drug Abuse Resistance Education, a well-researched example of what might be called “do-something-itis.” Research on DARE has never found positive effects on drug or alcohol abuse, and sometimes finds negative effects. In the case of DARE, there are many alternative drug and alcohol prevention programs that have been proven effective. Yet DARE continues, giving concerned educators and parents a comforting sense that something is being done to prevent drug and alcohol abuse among their teenagers.

Another good example of “do-something-itis” is benchmark assessments, where students take brief versions of their state tests 4-5 times a year, to give teachers and principals early warnings about areas in which students might be lagging or need additional, targeted assistance. This sounds like a simple, obvious strategy to improve test scores. However, in our reviews of research on studies of elementary and secondary reading and elementary mathematics, the effects of using benchmark assessments average an effect size close to 0.00. Yet I’m sure that schools will still be using benchmark assessments for many years, because with all the importance placed on state tests, educators will always feel better doing something focused on the problem. Of course, they should do something, actually quite a lot, but why not use “somethings” proven to work instead of benchmark assessments proven not to work?

In education, there are many very serious problems, and, in response, each one is given a solution that seems to address it. Often, the solutions are unresearched, or researched and found to be ineffective. A unifying attribute of these solutions is that they are simple and easy to understand, so most people are satisfied that at least something is being done. One example: to address serious gaps in high school literacy, many states threaten to retain third graders if they are not reading adequately (typically, at “proficient” levels on state tests). Yet in most states, the programs used to improve student reading in grades K-3 are not proven to be effective. Often, the solution provided is a single reading teacher offering one-to-one tutoring to students in K-3. One-to-one tutoring is very effective for the students who get it, but an average U.S. school has 280 students in grades K-3, about half of whom (on average) are unlikely to score proficient at third grade. Obviously, one tutor working one-to-one cannot do much for 140 students. Again, there are effective and cost-effective alternatives, such as proven one-to-small-group tutoring by teaching assistants, but few states or schools use proven strategies of this kind.
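The capacity gap described above is easy to make concrete. Here is a quick back-of-envelope check; the one-to-one caseload of 15 students per tutor per year is my illustrative assumption, not a figure from any study:

```python
# Back-of-envelope capacity check for one reading tutor in a typical school.
# Figures from the text: ~280 students in grades K-3, about half at risk.
students_k3 = 280
at_risk = students_k3 // 2          # ~140 students unlikely to reach proficiency
one_to_one_caseload = 15            # assumed annual caseload for a single tutor

unserved = at_risk - one_to_one_caseload
print(f"At risk: {at_risk}, served one-to-one: {one_to_one_caseload}, "
      f"left unserved: {unserved}")
```

Even if the assumed caseload were doubled, the single tutor would still reach well under a third of the students who need help, which is the point of the paragraph above.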

I could go on, but I’m sure you get the idea. School systems can be seen as a huge network of dedicated people working very hard to accomplish crucial goals. Sort of like Cold War Bunkers. Yet many of their resources, talents, and efforts are underutilized, because most school systems insist on using programs and practices that appear to be doing something to prevent or solve major problems, but that have not been proven to do so.

It is time for our field to begin to focus the efforts and abilities of its talented, hard-working teachers and principals on solutions that are not just doing something, but are doing something effective. Every year, research identifies more and more effective programs known to work from rigorous experiments. This research progressively undermines the argument that doing something is at least better than doing nothing in the face of serious problems. In most areas of education, doing nothing is not the relevant option. If we do know how to solve these problems, then the alternative to doing something (of unknown value) is not doing nothing. Instead, the cure for do-something-itis is doing something that works.

Photo credit: Nilfanion [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)]


Systems

What came first? The can or the can opener?

The answer to this age-old question is that the modern can and can opener were invented at exactly the same moment. This had to be true because a can without a can opener (yes, they existed) is of very little value, and a can opener without a can is the sound of one hand clapping (i.e., less than worthless).

The can and the can opener are together a system. Between them, they make it possible to preserve, transport, and distribute foods.


In educational innovation, we frequently talk as though individual variables are sufficient to improve student achievement. You hear things like “more time-good,” “more technology-good,” and so on. Any of these factors can be effective as part of a system of innovations, or useless or harmful without other aligned components. As one example, consider time. A recent Florida study provided an extra hour each day for reading instruction, 180 hours over the course of a year, at a cost of $800 per student, or $300,000-$400,000 per school. The effect on reading performance, compared to schools that did not receive additional time, was very small (effect size = +0.09). In contrast, time used for one-to-one or one-to-small-group tutoring by teaching assistants, for example, can have a much larger impact on reading in elementary schools (effect size = +0.29), at about half the cost. As a system, cost-effective tutoring requires a coordinated combination of time, training for teaching assistants, use of proven materials, and monitoring of progress. Separately, each of these factors is nowhere near as effective as all of them taken together in a coordinated system. Each is a can with no can opener, or a can opener with no can: The sound of one hand clapping. Together, they can be very effective.
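To see how lopsided that comparison is, we can put both examples on a common scale of effect size per $100 spent per student. The $400 tutoring cost below is my reading of “about half the cost” in the paragraph above, not a figure reported by either study:

```python
# Crude cost-effectiveness comparison of the two examples in the text.
def es_per_100_dollars(effect_size, cost_per_student):
    """Effect size purchased per $100 of per-student spending."""
    return effect_size / (cost_per_student / 100)

# Extra-hour reading block: ES +0.09 at $800 per student (from the text).
extra_hour = es_per_100_dollars(0.09, 800)
# Teaching-assistant tutoring: ES +0.29 at an assumed $400 per student.
tutoring = es_per_100_dollars(0.29, 400)

print(f"Extra hour: {extra_hour:.4f} ES/$100")
print(f"Tutoring:   {tutoring:.4f} ES/$100")
```

Under these assumptions, tutoring buys roughly six times as much measured learning per dollar, which is the "system vs. single variable" contrast the paragraph is making.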

The importance of systems explains why programs are so important. Programs invariably combine individual elements to attempt to improve student outcomes. Not all programs are effective, of course, but those that have been proven to work have hit upon a balanced combination of instructional methods, classroom organization, professional development, technology, and supportive materials that, if implemented together with care and attention, have been proven to work. The opposite of a program is a “variable,” such as “time” or “technology,” that educators try to use with few consistent, proven links to other elements.

All successful human enterprises, such as schools, involve many individual variables. Moving these enterprises forward in effectiveness can rarely be done by changing one variable. Instead, we have to design coordinated plans to improve outcomes. A can opener can’t, a can can’t, but together, a can opener and a can can.


Replication

The holy grail of science is replication. If a finding cannot be repeated, then it did not happen in the first place. There is a reason that the humor journal in the hard sciences is called the Journal of Irreproducible Results. For scientists, results that are irreproducible are inherently laughable, therefore funny. In many hard science experiments, replication is pretty much guaranteed. If you heat an iron bar, it gets longer. If you cross two parents who each carry the same recessive gene, one quarter of their progeny will express the recessive trait (think blue eyes).


In educational research, we care about replication just as much as our colleagues in the lab coats across campus. However, when we’re talking about evaluating instructional programs and practices, replication is a lot harder, because students and schools differ. Positive outcomes obtained in one experiment may or may not replicate in a second trial.

Sometimes this is true because the first experiment had features known to contribute to bias: small sample sizes, brief study durations, extraordinary amounts of resources or expert time to help the experimental schools or classes, use of measures made by the developers or researchers or otherwise overaligned with the experimental group (but not the control group), or use of matched rather than randomized assignment to conditions. Any of these can contribute to successful-appearing outcomes in a first experiment. Second or third experiments are more likely to be larger, longer, and more stringent than the first study, and therefore may not replicate. Even when the first study has none of these problems, it may not replicate because of differences in the samples of schools, teachers, or students, or for other, perhaps unknowable, reasons.

A change in the conditions of education may also cause a failure to replicate. Our Success for All whole-school reform model has been found to be effective many times, mostly by third-party evaluators. However, Success for All has always specified a full-time facilitator and at least one tutor for each school. An MDRC i3 evaluation happened to fall in the middle of the recession, and schools, which were struggling to afford classroom teachers, could not afford facilitators or tutors. The results were still positive on some measures, especially for low achievers, but the effect sizes were less than half of what others had found in many studies. Stuff happens.

Replication has taken on more importance recently because the ESSA evidence standards only require a single positive study. To meet the strong, moderate, or promising standards, programs must have at least one “well-designed and well-implemented” study using randomized (strong), matched (moderate), or correlational (promising) designs and finding significantly positive outcomes. Based on the “well-designed and well-implemented” language, our Evidence for ESSA website requires features of experiments similar to those also required by the What Works Clearinghouse (WWC). These requirements make it difficult to be approved, but they remove many of the experimental design features that typically cause first studies to greatly overstate program impacts: small sample size, brief durations, overinvolved experimenters, and developer-made measures. They put (less rigorous) matched and correlational studies in lower categories. So one study that meets ESSA or Evidence for ESSA requirements is at least likely to be a very good study. But many researchers have expressed discomfort with the idea that a single study could qualify a program for one of the top ESSA categories, especially if (as sometimes happens) there is one study with positive outcomes and many with zero or at least nonsignificant outcomes.

The pragmatic problem is that if ESSA had required even two studies showing positive outcomes, this would wipe out a very large proportion of current programs. If research continues to identify effective programs, it should only be a matter of time before ESSA (or its successors) requires more than one study with positive outcomes.

However, in the current circumstance, there is a way researchers and educators might at least estimate the replicability of given programs when they have only a single study with significant positive outcomes. This would involve looking at the findings for entire genres of programs. The logic here is that if a program has only one ESSA-qualifying study, but it closely resembles other programs that also have positive outcomes, that program should be taken a lot more seriously than a program whose positive outcome differs considerably from the outcomes of very similar programs.

As one example, there is much evidence from many studies by many researchers indicating positive effects of one-to-one and one-to-small group tutoring, in reading and mathematics. If a tutoring program has only one study, but this one study has significant positive findings, I’d say thumbs up. I’d say the same about cooperative learning approaches, classroom management strategies using behavioral principles, and many others, where a whole category of programs has had positive outcomes.

In contrast, if a program has a single positive outcome and there are few if any similar approaches that obtained positive outcomes, I’d be much more cautious. An example might be textbooks in mathematics, which rarely make any difference because control groups are also likely to be using textbooks, and textbooks considerably resemble each other. In our recent elementary mathematics review (Pellegrini, Lake, Inns, & Slavin, 2018), only one textbook program available in the U.S. had positive outcomes (out of 16 studies). As another example, there have been several large randomized evaluations of the use of interim assessments. Only one of them found positive outcomes. I’d be very cautious about putting much faith in benchmark assessments based on this single anomalous finding.

Looking for findings from similar studies is facilitated by looking at reviews we make available at www.bestevidence.org. These consist of reviews of research organized by categories of programs. Looking for findings from similar programs won’t help with the ESSA law, which often determines its ratings based on the findings of a single study, regardless of other findings on the same program or similar programs. However, for educators and researchers who really want to find out what works, I think checking similar programs is not quite as good as finding direct replication of positive findings on the same programs, but perhaps, as we like to say, close enough for social science.


Tutoring Works. But Let’s Learn How It Can Work Better and Cheaper

I was once at a meeting of the British Education Research Association, where I had been invited to participate in a debate about evidence-based reform. We were having what journalists often call “a frank exchange of views” in a room packed to the rafters.

At one point in the proceedings, a woman stood up and, in a furious tone of voice, informed all and sundry that (I’m paraphrasing here) “we don’t need to talk about all this (very bad word). Every child should just get Reading Recovery.” She then stomped out.

I don’t know how widely her view was supported in the room or anywhere else in Britain or elsewhere, but what struck me at the time, and what strikes me even more today, is the degree to which Reading Recovery has long defined, and in many ways limited, discussions about tutoring. Personally, I have nothing against Reading Recovery, and I have always admired the commitment Reading Recovery advocates have had to professional development and to research. I’ve also long known that the evidence for Reading Recovery is very impressive, but you’d be amazed if one-to-one tutoring by well-trained teachers did not produce positive outcomes. On the other hand, Reading Recovery insists on one-to-one instruction by certified teachers with a lot of cost for all that admirable professional development, so it is very expensive. A British study estimated the cost per child at $5,400 (in 2018 dollars). There are roughly one million Year 1 students in the U.K., so if the angry woman had her way, they’d have to come up with the equivalent of $5.4 billion a year. In the U.S., it would be more like $27 billion a year. I’m not one to shy away from very expensive proposals if they also provide extremely effective services and there are no equally effective alternatives. But shouldn’t we be exploring alternatives?
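The national figures above follow directly from the per-child estimate. A minimal check of the scaling arithmetic; the U.S. first-grade count is not stated in the text, so I back it out from the $27 billion figure as an assumption:

```python
# Scaling the per-child Reading Recovery cost estimate to national totals.
cost_per_child = 5_400            # 2018 dollars, from the British study cited
uk_year1_students = 1_000_000     # "roughly one million Year 1 students in the U.K."
us_first_graders = 5_000_000      # assumption implied by the $27B U.S. figure

uk_total = cost_per_child * uk_year1_students   # national U.K. cost
us_total = cost_per_child * us_first_graders    # national U.S. cost
print(f"U.K.: ${uk_total / 1e9:.1f} billion per year")
print(f"U.S.: ${us_total / 1e9:.0f} billion per year")
```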

If you’ve been following my blogs on tutoring, you’ll be aware that, at least at the level of research, the Reading Recovery monopoly on tutoring has been broken in many ways. Reading Recovery has always insisted on certified teachers, but many studies have now shown that well-trained teaching assistants can do just as well, in mathematics as well as reading. Reading Recovery has insisted that tutoring should just be for first graders, but numerous studies have now shown positive outcomes of tutoring through seventh grade, in both reading and mathematics. Reading Recovery has argued that its cost was justified by the long-lasting impacts of first-grade tutoring, but their own research has not documented long-lasting outcomes. Reading Recovery is always one-to-one, of course, but now there are numerous one-to-small group programs, including a one-to-three adaptation of Reading Recovery itself, that produce very good effects. Reading Recovery has always just been for reading, but there are now more than a dozen studies showing positive effects of tutoring in math, too.


All of this newer evidence opens up new possibilities for tutoring that were unthinkable when Reading Recovery ruled the tutoring roost alone. If tutoring can be effective using teaching assistants and small groups, then it is becoming a practicable solution to a much broader range of learning problems. It also opens up a need for further research and development specific to the affordances and problems of tutoring. For example, tutoring can be done a lot less expensively than $5,400 per child, but it is still expensive. We created and evaluated a one-to-six, computer-assisted tutoring model that produced effect sizes of around +0.40 for $500 per child. Yet I just got a study from the Education Endowment Foundation (EEF) in England evaluating one-to-three math tutoring by college students and recent graduates. It provided tutoring only one hour per week for 12 weeks, to sixth graders. The effect size was much smaller (ES=+0.19), but the cost was only about $150 per child.
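One rough way to compare these two models is cost per unit of effect size. This is my framing, not a metric from either study, and the effect-size and cost figures are simply the ones quoted in the text.

```python
# Rough cost-effectiveness comparison: dollars per +0.10 effect-size unit,
# using the figures quoted in the text. A crude yardstick, since effect
# sizes do not scale linearly with dosage or cost.
programs = {
    "1:6 computer-assisted tutoring": {"effect_size": 0.40, "cost": 500},
    "1:3 EEF math tutoring":          {"effect_size": 0.19, "cost": 150},
}

for name, p in programs.items():
    per_tenth_sd = p["cost"] / (p["effect_size"] / 0.10)
    print(f"{name}: ${per_tenth_sd:.0f} per +0.10 SD")
# 1:6 computer-assisted tutoring: $125 per +0.10 SD
# 1:3 EEF math tutoring: $79 per +0.10 SD
```

By this crude measure, the cheaper, lower-impact program actually delivers more effect per dollar, which is part of what makes the trade-off interesting.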

I am not advocating this particular solution, but isn’t it interesting? The EEF also evaluated another means of making tutoring inexpensive, using online tutors from India and Sri Lanka, and another, using cross-age peer tutors, both in math. Both failed miserably, but isn’t that interesting?

I can imagine a broad range of approaches to tutoring, designed to enhance outcomes, minimize costs, or both. Out of that research might come a diversity of approaches that might be used for different purposes. For example, students in deep trouble, headed for special education, surely need something different from what is needed by students with less serious problems. But what exactly is it that is needed in each situation?

In educational research, reliable positive effects of any intervention are rare enough that we’re usually happy to celebrate anything that works. We might say, “Great, tutoring works! But we knew that.” However, if tutoring is to become a key part of every school’s strategies to prevent or remediate learning problems, then knowing that “tutoring works” is not enough. What kind of tutoring works for what purposes? Can we use technology to make tutors more effective? How effective could tutoring be if it is given all year or for multiple years? Alternatively, how effective could we make small amounts of tutoring? What is the optimal group size for small group tutoring?

We’ll never satisfy the angry woman who stormed out of my long-ago symposium at BERA. But for those who can have an open mind about the possibilities, building on the most reliable intervention we have for struggling learners and creating and evaluating effective and cost-effective tutoring approaches seems like a worthwhile endeavor.


This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Miss Evers’ Boys (And Girls)

Most people who have ever been involved with human subjects’ rights know about the Tuskegee Syphilis Study. This was a study of untreated syphilis, in which 622 poor, African American sharecroppers, some with syphilis and some without, were evaluated over 40 years.

The study, funded and overseen by the U.S. Public Health Service, started in 1932. In 1940, researchers elsewhere discovered that penicillin cured syphilis. By 1947, penicillin was “standard of care” for syphilis, meaning that patients with syphilis received penicillin as a matter of course, anywhere in the U.S.

But not in Tuskegee. Not in 1940. Not in 1947. Not until 1972, when a whistle-blower made the press aware of what was happening. In the meantime, many of the men died of syphilis, 40 of their wives contracted the disease, and 19 of their children were born with congenital syphilis. The men had never even been told the nature of the study, they were not informed in 1940 or 1947 that there was now a cure, and they were not offered that cure. Leaders of the U.S. Public Health Service were well aware that there was a cure for syphilis, but for various reasons, they did not stop the study. Not in 1940, not in 1947, not even when whistle-blowers told them what was going on. They stopped it only when the press found out.


In 1997 a movie on the Tuskegee Syphilis Study was released. It was called Miss Evers’ Boys. Miss Evers (actually, Eunice Rivers) was the African-American public health nurse who was the main point of contact for the men over the whole 40 years. She deeply believed that she, and the study, were doing good for the men and their community, and she formed close relationships with them. She believed in the USPHS leadership, and thought they would never harm her “boys.”

The Tuskegee study was such a crime and scandal that it utterly changed procedures for medical research in the U.S. and most of the world. Today, participants in research with any level of risk, or their parents if they are children, must give informed consent for participation in research, and even if they are in a control group, they must receive at least “standard of care”: currently accepted, evidence-based practices.

If you’ve read my blogs, you’ll know where I’m going with this. Failure to use proven educational treatments, unlike medical ones, is rarely fatal, at least not in the short term. But otherwise, our profession carries out Tuskegee crimes all the time. It condemns failing students to ineffective programs and practices when effective ones are known. It fails to even inform parents or children, much less teachers and principals, that proven programs exist: Proven, practical, replicable solutions for the problems they face every day.

Like Miss Rivers, front-line educators care deeply about their charges. Most work very hard and give their absolute best to help all of their children to succeed. Teaching is too much hard work and too little money for anyone to do it for any reason but for the love of children.

But somewhere up the line, where the big decisions are made, where the people are who know or who should know which programs and practices are proven to work and which are not, this information just does not matter. There are exceptions, real heroes, but in general, educational leaders who believe that schools should use proven programs have to fight hard for this position. The problem is that the vast majority of educational expenditures—textbooks, software, professional development, and so on—lack even a shred of evidence. Not a scintilla. Some have evidence that they do not work. Yet advocates for those expenditures (such as sales reps and educators who like the programs) argue strenuously for programs with no evidence, and it’s just easier to go along. Whole states frequently adopt or require textbooks, software, and services of no known value in terms of improving student achievement. The ESSA evidence standards were intended to focus educators on evidence and incentivize use of proven programs, at least for the lowest-achieving 5% of schools in each state, but so far it’s been slow going.

Yet there are proven alternatives. Evidence for ESSA (www.evidenceforessa.org) lists more than 100 PK-12 reading and math programs that meet the top three ESSA evidence standards. The majority meet the top level, “Strong.” And most of the programs were researched with struggling students. Yet I am not perceiving a rush to find out about proven programs. I am hearing a lot of new interest in evidence, but my suspicion, growing every day, is that many educational leaders do not really care about the evidence, but are instead just trying to find a way to keep using the programs and providers they already have and already like, and are looking for evidence to justify keeping things as they are.

Every school has some number of struggling students. If these children are provided with the same approaches that have not worked with them or with millions like them, it is highly likely that most will fail, with all the consequences that flow from school failure: Retention. Assignment to special education. Frustration. Low expectations. Dropout. Limited futures. Poverty. Unemployment. There are 50 million children in grades PK to 12 in the U.S. This is the grinding reality for perhaps 10 to 20 million of them. Solutions are readily available, but not known or used by caring and skilled front-line educators.

In what way is this situation unlike Tuskegee in 1940?

Photo credit: National Archives Atlanta, GA (U.S. government), originally from the National Archives, public domain, via Wikimedia Commons.
