Moneyball for Education

When I was a kid, growing up in the Maryland suburbs of Washington, DC, everyone I knew rooted for the hapless Washington Senators, one of the worst baseball teams ever. At that time, however, the Baltimore Orioles were one of the best teams in baseball, and every once in a while a classmate would snap. He (always “he”) would decide to become an Orioles fan. This would cause him to be shamed and ostracized for the rest of his life by all true Senators fans.

I’ve now lived in Baltimore for most of my life. I wonder whether I came here in part because of my youthful impression of Baltimore as a winning franchise.


Skipping forward to the present, I recently saw a New York Times article about the collapse of the Baltimore Orioles. In 2018, they had the worst record in the major leagues, losing more games in a season than even the Washington Senators ever did. Why did this happen? According to the Times, the Orioles were one of the last teams to embrace analytics, which means using evidence to decide which players to recruit or drop, to put on the field or on the bench. Some teams have analytics departments of 15 people. The Orioles? Zero, although they have just started one.

It’s not as though the benefits of analytics are a secret. A 2003 book by Michael Lewis, Moneyball, explained how the underfunded Oakland A’s used analytics to turn themselves around. A hugely popular 2011 movie told the same story.

In case anyone missed the obvious link between analytics in baseball and analytics in education, Results for America (RfA), a group that promotes the use of evidence in government social programs, issued a 2015 book called, you guessed it, Moneyball for Government (Nussle & Orszag, 2015). This Moneyball focused on success stories and ideas from key thinkers and practitioners in government and education. RfA was instrumental in encouraging the U.S. Congress to include definitions of strong, moderate, and promising evidence of effectiveness in the Every Student Succeeds Act (ESSA), and to specify a few areas of federal funding that require or incentivize the use of proven programs.

The ESSA evidence standards are a giant leap forward in supporting the use of evidence in education. Yet, like the Baltimore Orioles, the once-admired U.S. education system has been less than swept away by the idea that using proven programs and practices could improve outcomes for children. Yes, the situation is better than it was, but things are going very slowly. I’m worried that because of this, the whole evidence movement in education will someday be dismissed: “Evidence? Yeah, we tried that. Didn’t work.”

There are still good reasons for hope. The amount of high-quality evidence continues to grow at an unprecedented pace. The ESSA evidence standards have at least encouraged federal, state, and local leaders to pay some attention to evidence, though moving to action based on this evidence is a big lift.

Perhaps I’m just impatient. It took the Baltimore Orioles a book, a movie, and 16 years to conclude that maybe, just maybe, it was time to use evidence, as winning teams have been doing for a long time. Education is much bigger, and its survival does not depend on its success (as a baseball team’s does). Education will require visionary leadership to embrace the use of evidence. But I am confident that when it does, we will be overwhelmed by visits from educators from Finland, Singapore, China, and other countries that currently clobber us in international comparisons. They’ll want to know how the U.S. education system became the best in the world. Perhaps we’ll have to write a book and a movie to explain it all. I’d suggest we call it . . . “Learnball.”

References

Nussle, J., & Orszag, P. (2015). Moneyball for Government (2nd Ed.). Washington, DC: Disruption Books.

Photo credit: Keith Allison [CC BY-SA 2.0 (https://creativecommons.org/licenses/by-sa/2.0)]

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.


Don’t Just Do Something. Do Something Effective.

I recently visited York, England, where my wife and I worked part-time for about 8 years. York is world-famous for its huge cathedral, intact medieval walls, medieval churches, and other medieval sights. But on this trip we had some time for local touring, and chose to visit a more modern place, one far ghastlier than a ton of dungeons.

The place is the York Cold War Bunker. Built in 1961 and operated until 1991, it was intended to monitor the results of a nuclear attack on Britain. Volunteers, mostly women, were trained to detect the locations, sizes, and radiation levels of nuclear bombs dropped on Britain. This was a command bunker with a staff of 60 that collected its own data, but it also monitored dozens of three-man bunkers all over the North of England, all collecting similar data. The idea was that a national network of these bunkers would determine where in the country it was safe to go after a nuclear war. The bunker had air, water, and food for 30 days, after which the volunteers would have had to leave. And most likely die of radiation poisoning.


The very interesting docent informed us of one astounding fact. When the bunker network was planned in 1957, the largest nuclear weapons were like those used in Hiroshima and Nagasaki, less than one megaton in yield. By 1961, when the bunkers started operation, the largest bombs were 50-megaton behemoths.

The day the Soviet Union successfully tested its 50-megaton bomb, the bunkers were instantly obsolete. Not only would a single bomb create fatal levels of radiation all over Britain, but it would also likely destroy the telephone and radio systems on which the bunkers depended.

Yet for 30 years, this utterly useless system was maintained, with extensive training, monitoring, and support.

There must have been thousands of military leaders, politicians, scientists, and ordinary readers of Popular Science who knew full well that the bunkers were useless from the day they opened. The existence of the bunkers was not a secret; in fact, it was publicized. Why, then, were they maintained? And what does this have to do with educational research?

The Cold War bunkers illustrate an aspect of human nature that is important in understanding all sorts of behavior. When a catastrophe is impending, people find it comforting to do something, even if that something is known (by some, at least) to be useless or even counterproductive. The British government could not simply tell its citizens that in case of a nuclear war, everyone was toast. Full stop. Instead, it had to offer hope, however slim. Around the same time the (doomed) bunkers were going into operation in Britain, my entire generation of students was learning to crawl under our desks for protection in case of nuclear attack. I suppose it made some people think that, well, at least something was being done. It scared the bejabbers out of us kids, but no one asked us.

In education, we face many very difficult, often terrifying problems. Every one of them has one or more widespread solutions. But do these solutions work?

Consider DARE (Drug Abuse Resistance Education), a well-researched example of what might be called “do-something-itis.” Research on DARE has never found positive effects on drug or alcohol abuse, and has sometimes found negative effects. Moreover, there are many alternative drug and alcohol prevention programs that have been proven effective. Yet DARE continues, giving concerned educators and parents a comforting sense that something is being done to prevent drug and alcohol abuse among their teenagers.

Another good example of “do-something-itis” is benchmark assessments, in which students take brief versions of their state tests 4-5 times a year to give teachers and principals early warning about areas in which students may be lagging or may need additional, targeted assistance. This sounds like a simple, obvious strategy for improving test scores. However, in our reviews of research on elementary and secondary reading and elementary mathematics, the effects of using benchmark assessments average an effect size close to 0.00. Yet I’m sure that schools will still be using benchmark assessments for many years, because with all the importance placed on state tests, educators will always feel better doing something focused on the problem. Of course, they should do something, actually quite a lot, but why not use “somethings” proven to work instead of benchmark assessments, which are proven not to work?

In education, there are many very serious problems, and each has been given a solution that seems to address it. Often, these solutions are unresearched, or researched and found to be ineffective. A unifying attribute of these solutions is that they are simple and easy to understand, so most people are satisfied that at least something is being done. One example: to address serious gaps in high school literacy, many states threaten to retain third graders who are not reading adequately (typically, at “proficient” levels on state tests). Yet in most states, the programs used to improve student reading in grades K-3 are not proven to be effective. Often, the solution provided is a single reading teacher who offers one-to-one tutoring to students in K-3. One-to-one tutoring is very effective for the students who get it, but an average U.S. school has about 280 students in grades K-3, roughly half of whom are unlikely to score proficient in third grade. Obviously, one tutor working one-to-one cannot do much for 140 students (see the sketch below). Again, there are effective and cost-effective alternatives, such as proven one-to-small-group tutoring by teaching assistants, but few states or schools use proven strategies of this kind.
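To make that arithmetic concrete, here is a minimal sketch. The 280 students and the one-half figure come from the paragraph above; the tutor caseloads are illustrative assumptions of mine, not figures from any study.

```python
# Rough capacity arithmetic for the example school above (a sketch).
# Caseload figures are illustrative assumptions, not research findings.
import math

students_k3 = 280
struggling = students_k3 // 2            # about half unlikely to reach proficiency: 140

one_to_one_caseload = 14                 # assumed: 1:1 students one tutor can serve per year
small_group_caseload = one_to_one_caseload * 4   # assumed: same schedule, groups of 4

print(f"Struggling students: {struggling}")
print(f"Tutors needed, 1:1 tutoring:      {math.ceil(struggling / one_to_one_caseload)}")   # 10
print(f"Assistants needed, 1:4 tutoring:  {math.ceil(struggling / small_group_caseload)}")  # 3
```

Even under generous assumptions, one tutor reaches about a tenth of the need; small groups shrink the gap considerably, which is exactly why the small-group alternatives matter.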

I could go on, but I’m sure you get the idea. School systems can be seen as a huge network of dedicated people working very hard to accomplish crucial goals. Sort of like Cold War Bunkers. Yet many of their resources, talents, and efforts are underutilized, because most school systems insist on using programs and practices that appear to be doing something to prevent or solve major problems, but that have not been proven to do so.

It is time for our field to focus the efforts and abilities of its talented, hard-working teachers and principals on solutions that are not just doing something, but doing something effective. Every year, rigorous experiments identify more programs that actually work. This research progressively undermines the argument that doing something is at least better than doing nothing in the face of serious problems. In most areas of education, doing nothing is not the relevant option. If we know how to solve these problems, then the alternative to doing something of unknown value is not doing nothing. Instead, the cure for do-something-itis is doing something that works.

Photo credit: Nilfanion [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)]

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Systems

Which came first: the can or the can opener?

The answer to this age-old question is that the modern can and can opener were invented at exactly the same moment. This had to be true because a can without a can opener (yes, they existed) is of very little value, and a can opener without a can is the sound of one hand clapping (i.e., less than worthless).

The can and the can opener are together a system. Between them, they make it possible to preserve, transport, and distribute foods.


In educational innovation, we frequently talk as though individual variables are sufficient to improve student achievement. You hear things like “more time = good,” “more technology = good,” and so on. Any of these factors can be effective as part of a system of innovations, or useless or harmful without other aligned components. As one example, consider time. A recent Florida study provided an extra hour each day for reading instruction, 180 hours over the course of a year, at a cost of $800 per student, or $300,000 to $400,000 per school. The effect on reading performance, compared to schools that did not receive additional time, was very small (effect size = +0.09). In contrast, time used for one-to-one or one-to-small-group tutoring by teaching assistants, for example, can have a much larger impact on reading in elementary schools (effect size = +0.29), at about half the cost. As a system, cost-effective tutoring requires a coordinated combination of time, training for teaching assistants, use of proven materials, and monitoring of progress. Separately, each of these factors is nowhere near as effective as all of them taken together in a coordinated system. Each is a can with no can opener, or a can opener with no can: the sound of one hand clapping. Together, they can be very effective.
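One way to see the contrast is effect size per dollar. Here is a minimal sketch; the $400 tutoring cost is my reading of “about half the cost” above, so treat it as an assumption, not a reported figure.

```python
# Effect size per $1,000 per student, using the figures cited above.
# The $400 tutoring cost is inferred from "about half the cost" (an assumption).

options = {
    "Extra hour of daily reading instruction": (0.09, 800),
    "Tutoring by teaching assistants":         (0.29, 400),
}

for name, (effect_size, cost) in options.items():
    per_thousand = effect_size / (cost / 1000)
    print(f"{name}: ES {effect_size:+.2f} at ${cost}/student "
          f"-> {per_thousand:.2f} ES per $1,000")
```

By this crude metric the tutoring option delivers roughly six times the impact per dollar, which is the “system” point: the same hours buy very different outcomes depending on how they are organized.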

The importance of systems explains why programs are so important. Programs invariably combine individual elements in an attempt to improve student outcomes. Not all programs are effective, of course, but those that work have hit upon a balanced combination of instructional methods, classroom organization, professional development, technology, and supportive materials that, implemented together with care and attention, has been proven to work. The opposite of a program is a “variable,” such as “time” or “technology,” that educators try to use without consistent, proven links to other elements.

All successful human enterprises, such as schools, involve many individual variables. Moving these enterprises forward in effectiveness can rarely be done by changing one variable. Instead, we have to design coordinated plans to improve outcomes. A can opener can’t, a can can’t, but together, a can opener and a can can.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Replication

The holy grail of science is replication. If a finding cannot be repeated, then it did not happen in the first place. There is a reason that the humor journal in the hard sciences is called the Journal of Irreproducible Results. For scientists, results that are irreproducible are inherently laughable, therefore funny. In many hard science experiments, replication is pretty much guaranteed. If you heat an iron bar, it gets longer. If you cross parents with the same recessive gene, one quarter of their progeny will express the recessive trait (think blue eyes).


In educational research, we care about replication just as much as our colleagues in the lab coats across campus. However, when we are evaluating instructional programs and practices, replication is a lot harder, because students and schools differ. Positive outcomes obtained in one experiment may or may not replicate in a second trial. Sometimes this is because the first experiment had features known to contribute to bias: small sample sizes, brief study durations, extraordinary amounts of resources or expert time for the experimental schools or classes, measures made by the developers or researchers (or otherwise overaligned with the experimental group but not the control group), or matched rather than randomized assignment to conditions. Second or third experiments are likely to be larger, longer, and more stringent than the first study, and therefore may fail to replicate the original finding. Even when the first study has none of these problems, it may not replicate because of differences in the samples of schools, teachers, or students, or for other, perhaps unknowable reasons.

A change in the conditions of education may also cause a failure to replicate. Our Success for All whole-school reform model has been found to be effective many times, mostly by third-party evaluators. However, Success for All has always specified a full-time facilitator and at least one tutor for each school. An MDRC i3 evaluation happened to fall in the middle of the recession, and schools, struggling to afford classroom teachers, could not afford facilitators or tutors. The results were still positive on some measures, especially for low achievers, but the effect sizes were less than half of what others had found in many studies. Stuff happens.

Replication has taken on more importance recently because the ESSA evidence standards require only a single positive study. To meet the strong, moderate, or promising standards, programs must have at least one “well-designed and well-implemented” study using a randomized (strong), matched (moderate), or correlational (promising) design and finding significantly positive outcomes. Based on the “well-designed and well-implemented” language, our Evidence for ESSA website requires features of experiments similar to those required by the What Works Clearinghouse (WWC). These requirements make it difficult for a study to be approved, but they remove many of the design features that typically cause first studies to greatly overstate program impacts: small samples, brief durations, overinvolved experimenters, and developer-made measures. They also place (less rigorous) matched and correlational studies in lower categories. So one study that meets ESSA or Evidence for ESSA requirements is at least likely to be a very good study. But many researchers have expressed discomfort with the idea that a single study could qualify a program for one of the top ESSA categories, especially if (as sometimes happens) there is one study with positive outcomes and many with zero or at least nonsignificant outcomes.
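The tier logic just described can be written as a simple decision rule. This is a deliberately simplified sketch of the standards as summarized above; the actual law, the WWC, and Evidence for ESSA add requirements (sample size, duration, independence of measures) that are not modeled here.

```python
def essa_tier(design: str, well_implemented: bool, significant_positive: bool) -> str:
    """Simplified ESSA evidence tier based on a single qualifying study.

    design: "randomized", "matched", or "correlational".
    Real reviews also screen for size, duration, and measures (omitted here).
    """
    if not (well_implemented and significant_positive):
        return "does not qualify"
    tiers = {"randomized": "strong", "matched": "moderate", "correlational": "promising"}
    return tiers.get(design, "does not qualify")

print(essa_tier("randomized", True, True))     # strong
print(essa_tier("matched", True, True))        # moderate
print(essa_tier("correlational", True, True))  # promising
print(essa_tier("randomized", True, False))    # does not qualify
```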

The pragmatic problem is that if ESSA had required even two studies showing positive outcomes, this would have wiped out a very large proportion of current programs. If research continues to identify effective programs, it should be only a matter of time before ESSA (or its successors) requires more than one study with positive outcomes.

However, in the current circumstance, there is a way researchers and educators might at least estimate the replicability of a given program when it has only a single study with a significant positive outcome. This involves looking at the findings for the entire genre of programs it belongs to. The logic is that if a program has only one ESSA-qualifying study but closely resembles other programs that also have positive outcomes, that program should be taken a lot more seriously than one whose positive outcome differs considerably from the outcomes of very similar programs.

As one example, there is much evidence from many studies by many researchers indicating positive effects of one-to-one and one-to-small-group tutoring, in both reading and mathematics. If a tutoring program has only one study, but that study has significant positive findings, I’d say thumbs up. I’d say the same about cooperative learning approaches, classroom management strategies using behavioral principles, and many other categories in which programs as a whole have had positive outcomes.

In contrast, if a program has a single positive outcome, and few if any similar approaches have obtained positive outcomes, I’d be much more cautious. An example might be textbooks in mathematics, which rarely make any difference, because control groups are also likely to be using textbooks, and textbooks considerably resemble one another. In our recent elementary mathematics review (Pellegrini, Lake, Inns, & Slavin, 2018), only one textbook program available in the U.S. had positive outcomes (out of 16 studies). As another example, there have been several large randomized evaluations of the use of benchmark assessments, and only one of them found positive outcomes. I’d be very cautious about putting much faith in benchmark assessments on the basis of this single anomalous finding.
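A mechanical version of this genre check might look like the following sketch. The threshold and the example numbers are invented for illustration; the real category-level evidence lives in the reviews at www.bestevidence.org.

```python
# Toy version of the "check the whole genre" plausibility heuristic.
# Threshold and data are invented for illustration only.

def genre_check(program_es: float, genre_es: list[float], threshold: float = 0.10) -> str:
    """Judge a single positive study against findings for similar programs."""
    if not genre_es:
        return "no similar programs evaluated: be cautious"
    mean_es = sum(genre_es) / len(genre_es)
    if mean_es >= threshold:
        return f"genre mean ES {mean_es:+.2f}: the single positive study is plausible"
    return f"genre mean ES {mean_es:+.2f}: treat the single finding as an anomaly"

# A tutoring program, in a genre full of positive findings:
print(genre_check(0.30, [0.25, 0.31, 0.20, 0.28]))
# A benchmark assessment program, in a genre hovering near zero:
print(genre_check(0.22, [0.01, -0.03, 0.02, 0.00]))
```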

Looking for findings from similar programs is facilitated by the reviews we make available at www.bestevidence.org, which are organized by categories of programs. This approach won’t help with the ESSA law itself, which often determines its ratings based on a single study, regardless of other findings on the same program or similar programs. However, for educators and researchers who really want to know what works, checking similar programs is not quite as good as direct replication of positive findings on the same program, but it is perhaps, as we like to say, close enough for social science.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Tutoring Works. But Let’s Learn How It Can Work Better and Cheaper

I was once at a meeting of the British Education Research Association, where I had been invited to participate in a debate about evidence-based reform. We were having what journalists often call “a frank exchange of views” in a room packed to the rafters.

At one point in the proceedings, a woman stood up and, in a furious tone of voice, informed all and sundry that (I’m paraphrasing here) “we don’t need to talk about all this (very bad word). Every child should just get Reading Recovery.” She then stomped out.

I don’t know how widely her view was shared in the room, or anywhere else in Britain or elsewhere, but what struck me at the time, and strikes me even more today, is the degree to which Reading Recovery has long defined, and in many ways limited, discussions about tutoring. Personally, I have nothing against Reading Recovery, and I have always admired the commitment Reading Recovery advocates have shown to professional development and to research. I’ve also long known that the evidence for Reading Recovery is very impressive, though you’d be amazed if one-to-one tutoring by well-trained teachers did not produce positive outcomes. On the other hand, Reading Recovery insists on one-to-one instruction by certified teachers, with substantial costs for all that admirable professional development, so it is very expensive. A British study estimated the cost per child at $5,400 (in 2018 dollars). There are roughly one million Year 1 students in the U.K., so if the angry woman had her way, the British would have to come up with the equivalent of $5.4 billion a year. In the U.S., it would be more like $27 billion a year. I’m not one to shy away from very expensive proposals if they provide extremely effective services and there are no equally effective alternatives. But shouldn’t we be exploring alternatives?
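The budget arithmetic is worth spelling out. A minimal sketch using the rough figures above; the U.S. cohort size is an assumption implied by the “more like $27 billion” line, not a census figure.

```python
# Cost of universal Reading Recovery, using the rough figures in the post.

cost_per_child = 5_400           # British per-child estimate, 2018 dollars

uk_year1 = 1_000_000             # "roughly one million Year 1 students in the U.K."
us_grade1 = 5_000_000            # assumption implied by the post's $27 billion figure

print(f"U.K.: ${cost_per_child * uk_year1 / 1e9:.1f} billion per year")   # $5.4 billion
print(f"U.S.: ${cost_per_child * us_grade1 / 1e9:.1f} billion per year")  # $27.0 billion
```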

If you’ve been following my blogs on tutoring, you’ll be aware that, at least at the level of research, the Reading Recovery monopoly on tutoring has been broken in many ways. Reading Recovery has always insisted on certified teachers, but many studies have now shown that well-trained teaching assistants can do just as well, in mathematics as well as reading. Reading Recovery has insisted that tutoring should be only for first graders, but numerous studies have now shown positive outcomes of tutoring through seventh grade, in both reading and mathematics. Reading Recovery has argued that its cost is justified by the long-lasting impacts of first-grade tutoring, but its own research has not documented long-lasting outcomes. Reading Recovery is always one-to-one, of course, but there are now numerous one-to-small-group programs, including a one-to-three adaptation of Reading Recovery itself, that produce very good effects. And Reading Recovery has always been just for reading, but there are now more than a dozen studies showing positive effects of tutoring in math, too.


All of this newer evidence opens up possibilities for tutoring that were unthinkable when Reading Recovery ruled the tutoring roost alone. If tutoring can be effective using teaching assistants and small groups, then it becomes a practicable solution to a much broader range of learning problems. It also opens up a need for further research and development specific to the affordances and problems of tutoring. For example, tutoring can be done for a lot less than $5,400 per child, but it is still expensive. We created and evaluated a one-to-six, computer-assisted tutoring model that produced effect sizes of around +0.40 for $500 per child. Yet I just read a study from the Education Endowment Foundation (EEF) in England evaluating one-to-three math tutoring by college students and recent graduates. They provided only one hour of tutoring per week for 12 weeks, to sixth graders. The effect size was much smaller (ES = +0.19), but the cost was only about $150 per child.
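Putting the two models on the same footing makes the tradeoff visible. A sketch using only the figures just cited; treating effect size as linear in cost is a crude comparison heuristic, not a claim about how tutoring actually scales.

```python
# Effect size per $100 per child for the two tutoring models described above.
# Linear ES-per-dollar scaling is a rough heuristic for comparison only.

models = {
    "1:6 computer-assisted tutoring (ours)":   (0.40, 500),
    "1:3 math tutoring, 12 hours (EEF study)": (0.19, 150),
}

for name, (es, cost) in models.items():
    print(f"{name}: ES {es:+.2f} at ${cost}/child -> {es / (cost / 100):.3f} ES per $100")
```

By this crude metric the cheaper model actually delivers more impact per dollar, though far less impact per child. That is exactly the kind of tradeoff a research program on tutoring should map out.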

I am not advocating this particular solution, but isn’t it interesting? The EEF also evaluated another means of making tutoring inexpensive, using online tutors from India and Sri Lanka, and another, using cross-age peer tutors, both in math. Both failed miserably, but isn’t that interesting?

I can imagine a broad range of approaches to tutoring, designed to enhance outcomes, minimize costs, or both. Out of that research might come a diversity of approaches that might be used for different purposes. For example, students in deep trouble, headed for special education, surely need something different from what is needed by students with less serious problems. But what exactly is it that is needed in each situation?

In educational research, reliable positive effects of any intervention are rare enough that we’re usually happy to celebrate anything that works. We might say, “Great, tutoring works! But we knew that.”  However, if tutoring is to become a key part of every school’s strategies to prevent or remediate learning problems, then knowing that “tutoring works” is not enough. What kind of tutoring works for what purposes?  Can we use technology to make tutors more effective? How effective could tutoring be if it is given all year or for multiple years? Alternatively, how effective could we make small amounts of tutoring? What is the optimal group size for small group tutoring?

We’ll never satisfy the angry woman who stormed out of my long-ago symposium at BERA. But for those who can have an open mind about the possibilities, building on the most reliable intervention we have for struggling learners and creating and evaluating effective and cost-effective tutoring approaches seems like a worthwhile endeavor.

Photo Courtesy of Allison Shelley/The Verbatim Agency for American Education: Images of Teachers and Students in Action.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Miss Evers’ Boys (And Girls)

Most people who have ever been involved with human subjects’ rights know about the Tuskegee Syphilis Study. This was a study of untreated syphilis, in which 622 poor, African American sharecroppers, some with syphilis and some without, were evaluated over 40 years.

The study, funded and overseen by the U.S. Public Health Service, started in 1932. In 1943, researchers elsewhere discovered that penicillin cured syphilis. By 1947, penicillin was “standard of care” for syphilis, meaning that patients with syphilis anywhere in the U.S. received penicillin as a matter of course.

But not in Tuskegee. Not in 1943. Not in 1947. Not until 1972, when a whistle-blower made the press aware of what was happening. In the meantime, many of the men died of syphilis, 40 of their wives contracted the disease, and 19 of their children were born with congenital syphilis. The men had never even been told the nature of the study, they were not informed in 1943 or 1947 that there was now a cure, and they were not offered that cure. Leaders of the U.S. Public Health Service knew full well that there was a cure for syphilis, but for various reasons they did not stop the study. Not in 1943, not in 1947, not even when whistle-blowers told them what was going on. They stopped it only when the press found out.


In 1997, a movie about the Tuskegee Syphilis Study was released, called Miss Evers’ Boys. Miss Evers (actually Eunice Rivers) was the African American public health nurse who was the main point of contact for the men over the whole 40 years. She deeply believed that she, and the study, were doing good for the men and their community, and she formed close relationships with them. She believed in the USPHS leadership, and thought they would never harm her “boys.”

The Tuskegee study was such a crime and scandal that it utterly changed procedures for medical research in the U.S. and most of the world. Today, participants in research with any level of risk, or their parents if they are children, must give informed consent for participation in research, and even if they are in a control group, they must receive at least “standard of care”: currently accepted, evidence-based practices.

If you’ve read my blogs, you’ll know where I’m going with this. Failure to use proven educational treatments, unlike medical ones, is rarely fatal, at least not in the short term. But otherwise, our profession carries out Tuskegee crimes all the time. It condemns failing students to ineffective programs and practices when effective ones are known. It fails to even inform parents or children, much less teachers and principals, that proven programs exist: Proven, practical, replicable solutions for the problems they face every day.

Like Miss Rivers, front-line educators care deeply about their charges. Most work very hard and give their absolute best to help all of their children to succeed. Teaching is too much hard work and too little money for anyone to do it for any reason but for the love of children.

But somewhere up the line, where the big decisions are made, where the people are who know or who should know which programs and practices are proven to work and which are not, this information just does not matter. There are exceptions, real heroes, but in general, educational leaders who believe that schools should use proven programs have to fight hard for this position. The problem is that the vast majority of educational expenditures—textbooks, software, professional development, and so on—lack even a shred of evidence. Not a scintilla. Some have evidence that they do not work. Yet advocates for those expenditures (such as sales reps and educators who like the programs) argue strenuously for programs with no evidence, and it’s just easier to go along. Whole states frequently adopt or require textbooks, software, and services of no known value in terms of improving student achievement. The ESSA evidence standards were intended to focus educators on evidence and incentivize use of proven programs, at least for the lowest-achieving 5% of schools in each state, but so far it’s been slow going.

Yet there are proven alternatives. Evidence for ESSA (www.evidenceforessa.org) lists more than 100 PK-12 reading and math programs that meet the top three ESSA evidence standards. The majority meet the top level, “Strong,” and most of the programs were researched with struggling students. Yet I see no rush to find out about proven programs. I am hearing a lot of new interest in evidence, but my suspicion, growing every day, is that many educational leaders do not really care about the evidence; they are just trying to find a way to keep using the programs and providers they already have and already like, and are looking for evidence to justify keeping things as they are.

Every school has some number of struggling students. If these children are provided with the same approaches that have not worked with them or with millions like them, it is highly likely that most will fail, with all the consequences that flow from school failure: Retention. Assignment to special education. Frustration. Low expectations. Dropout. Limited futures. Poverty. Unemployment. There are 50 million children in grades PK to 12 in the U.S. This is the grinding reality for perhaps 10 to 20 million of them. Solutions are readily available, but not known or used by caring and skilled front-line educators.

In what way is this situation unlike Tuskegee in 1943?

Photo credit: National Archives, Atlanta, GA (U.S. government) [Public domain], via Wikimedia Commons

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Response to Proven Instruction (RTPI)

Response to Intervention (RTI) is one of those great policy ideas that caring policymakers come up with: carefully crafted and enthusiastically announced, then inconsistently implemented, evaluated at great cost, and found to have minimal impacts, if any. In the case of RTI, the policy is genuinely sensible, but the 2015 MDRC evaluation (Balu et al., 2015) found that implementation was poor and outcomes were nil, at least as measured in a much-criticized regression discontinuity design (see Fuchs & Fuchs, 2017). An improvement on RTI, multi-tiered systems of support (MTSS), adds some good ideas, but I don’t think it will be enough.

The problem, I think, relates to something I wrote about when the MDRC study appeared. In fact, I gave the phenomenon a name: Bob’s Law, which states that any policy or intervention that is not well defined will not be well implemented, and therefore will not work, no matter how sensible it may be. In the case of RTI/MTSS, everyone has a pretty good idea what “tier 1, tier 2, and tier 3” are in concept, but no one knows what they are actually composed of. So each district, school, and teacher makes up its own strategies: general teaching, followed by remediation if needed, followed by intensive services if necessary. Since the actual programs provided in each tier are not specified, everyone does pretty much what they would have done if RTI had not existed. And guess what? If RTI and non-RTI teachers are drawing from the same universally accepted basket of teaching methods, there is no reason to expect outcomes to be any better than ordinary practice. This is not to say that standard methods are deficient, but why would we expect outcomes to differ if practices don’t?

Response to Proven Instruction (RTPI)

I recently wrote an article proposing a new approach to RTI/MTSS (Slavin, Inns, Pellegrini, & Lake, 2018). The idea is simple. Why not insist that struggling learners receive tier 1, tier 2, and (if necessary) tier 3 services, each of which is proven to work in rigorous research? In the article I listed numerous tier 2 and tier 3 services for reading and math that have been successfully evaluated, with significant outcomes and effect sizes in excess of +0.20. Every one of these programs involved tutoring, one-to-one or one-to-small group, by teachers or paraprofessionals. I also listed tier 1 services found to be very effective for struggling learners. All of these programs are described at www.evidenceforessa.org.

[Figure 1: The proposed RTPI tier structure]

If there are so many effective approaches for struggling learners, these should form the core of RTI/MTSS services. I would argue that tier 1 should be composed of proven whole-class or whole-school programs; tier 2, one-to-small-group tutoring by well-qualified paraprofessionals using proven approaches; and tier 3, one-to-one tutoring by paraprofessionals or teachers using proven approaches (see Figure 1).
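As a compact restatement, the proposed structure can be written as a simple configuration. This is a sketch of the argument above, not an implementation; the +0.20 floor comes from the effect size criterion mentioned earlier, and the actual program names are left to www.evidenceforessa.org.

```python
# The proposed RTPI structure as a simple configuration (a sketch, not a spec).

MIN_EFFECT_SIZE = 0.20   # criterion above: proven programs with ES in excess of +0.20

RTPI_TIERS = {
    1: {"setting": "whole class / whole school",
        "staff":   "classroom teachers",
        "uses":    "proven whole-class or whole-school programs"},
    2: {"setting": "one-to-small-group tutoring",
        "staff":   "well-qualified paraprofessionals",
        "uses":    "proven small-group tutoring approaches"},
    3: {"setting": "one-to-one tutoring",
        "staff":   "paraprofessionals or teachers",
        "uses":    "proven one-to-one tutoring approaches"},
}

for tier, spec in sorted(RTPI_TIERS.items()):
    print(f"Tier {tier}: {spec['uses']} ({spec['staff']})")
```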

The result would be substantial improvements in the achievement of struggling learners, and reductions in special education placements and retentions. These outcomes are virtually assured, as long as implementation is strong, because the programs themselves are proven to work. Over time, better and more cost-effective programs would be sure to appear, but we could do a lot better today with the programs we have now.

Millions of children live in the cruel borderlands between low reading groups and special education. These students are perfectly normal, except from 9:00 to 3:00 on school days. They start school with enthusiasm, but over the years they slide into failure and despair, and then into dropout or delinquency. If we have proven approaches and can use them in a coherent system to ensure success for all of these children, why would we not use them?

Children have a right to have every chance to succeed.  We have a moral imperative to see that they receive what they need, whatever it takes.

References

Balu, R., Zhu, P., Doolittle, F., Schiller, E., Jenkins, J., & Gersten, R. (2015). Evaluation of response to intervention practices for elementary school reading (NCEE 2016-4000). Washington, DC: U.S. Department of Education, Institute of Education Sciences.

Fuchs, D., & Fuchs, L.S. (2017). Critique of the National Evaluation of Response to Intervention: A case for simpler frameworks. Exceptional Children, 83 (3), 1-14.

Slavin, R.E., Inns, A., Pellegrini, M., & Lake, C. (2018). Response to proven instruction (RTPI): Enabling struggling learners. Manuscript submitted for publication.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.