Building Back Better

Yesterday, President Joe Biden took his oath of office. He is taking office at one of the lowest points in all of American history. Every American, whatever their political beliefs, should be wishing him well, because his success is essential for the recovery of our nation.

In education, most schools remain closed or partially open, and students are struggling with remote learning. My oldest granddaughter is in kindergarten. Every school day, she receives instruction from a teacher she has never met. She has never seen the inside of “her school.” She is lucky, of course, because she has educators as grandparents (us), but it is easy to imagine the millions of kindergartners who do not even have access to computers, or do not have help in learning to read and learning mathematics. These children will enter first grade with very little of the background they need, in language and school skills as well as in content.

Of course, the problem is not just kindergarten. All students have missed a lot of school, and they will vary widely in their experiences during that time. Think of second graders who essentially missed first grade. Students who missed the year when biology is taught. Students who missed the fundamentals of creative writing. Students who should be in Algebra 2, except that they missed Algebra 1.

Hopefully, providing vaccines as quickly as possible to school staffs will enable most schools to open this spring. But we have a long, long way to go to get back to normal, especially for disadvantaged students. We cannot just ask students on their first day back to open their math books to the page they were on in March 2020, when school closed.

Students need to be assessed when they return, and if they are far behind in reading or math, given daily tutoring, one-to-one or one-to-small group. If you follow this blog, you’ve heard me carry on at length about this.

Tutoring services, using tutoring programs proven to be effective, will be of enormous help to students who are far behind grade level (here, here, here). But the recovery from Covid-19 school closures should not be limited to repairing the losses. Instead, I hope the Covid-19 crisis can be an opportunity to reconsider how to rebuild our school system to enhance the school success of all students.

If we are honest with ourselves, we know that schooling in America was ailing long before Covid-19. It wasn’t doing so badly for middle class children, but it was failing disadvantaged students. These very same students have suffered disproportionately from Covid-19. So in the process of bringing these children back into school, let’s not stop with getting back to normal. Let’s figure out how to create schools that use the knowledge we have gained over the past 20 years, and knowledge we can develop in the coming years, to transform learning for our most vulnerable children.

Building Back Better

Obviously, the first thing we have to do this spring is reopen schools and make them as healthy, happy, welcoming, and upbeat as possible. We need to make sure that schools are fully staffed and fully equipped. We do need to “build back” before we can “build back better.” But we cannot stop there. Below, I discuss several things that would greatly transform education for disadvantaged students.

1.  Tutoring

Yes, tutoring is the first thing we have to do to build better. Every child who is significantly below grade level needs daily one-to-small group or one-to-one tutoring, until they reach a pre-established level of performance, depending on grade level, in reading and math.

However, I am not talking about just any tutoring. Not all tutoring works. But there are many programs that have been proven to work, many times. These are the tutoring programs we need to start with as soon as possible, with adequate training resources to ensure student success.

Implementing proven tutoring programs on a massive scale is an excellent “build back” strategy, the most effective and cost-effective strategy we have. However, tutoring should also be the basis for a key “build better” strategy.

2.  Establishing success as a birthright and ensuring it using proven programs of all kinds.

We need to establish adequate reading and mathematics achievement as the birthright of every child. We can debate about what that level might be, but we must hold ourselves accountable for the success of every child. And we need to accomplish this not just by using accountability assessments and hoping for the best, but by providing proven programs to all students who need them for as long as they need them.

As I’ve pointed out in many blogs (here, here, here), we now have many programs proven effective in rigorous experiments and known to improve student achievement (see www.evidenceforessa.org). Every child who is performing below grade level, and every school serving many children below grade level, should have the resources and knowledge to adopt proven programs. Teachers and tutors need to be guaranteed sufficient professional development and in-class coaching to enable them to successfully implement proven programs. Years ago, we did not have sufficient proven programs, so policy makers kept coming up with evidence-free policies, which have just not worked as intended. But now, we have many programs ready for widespread dissemination. To build better, we have to use these tools, not return to near universal use of instructional strategies, materials, and technology that have never been successfully evaluated. Instead, we need to use what works, and to facilitate adoption and effective implementation of proven programs.

3.  Invest in development and evaluation of promising programs.

How is it that in a remarkably short time, scientists were able to develop vaccines for Covid-19, vaccines that promise to save millions of lives? Simple. We invested billions in research, development, and evaluations of alternative vaccines. Effective vaccines are very difficult to make, and the great majority failed.  But at this writing, two U.S. vaccines have succeeded, and this is a mighty good start. Now, government is investing massively in rigorous dissemination of these vaccines.

Total spending on all of education research dedicated to creating and evaluating educational innovations is a tiny fraction of what has been spent and will be spent on vaccines. But with clear goals and serious resources, is it really impossible to improve reading, math, science, and other outcomes? Of course it could be done. A key element of “building better” could be to substantially scale up use of proven programs we have now, and to invest in new development and evaluation to make today’s best obsolete, replaced by better and better approaches. The research and evaluation of tutoring proves this could happen, and perhaps a successful rollout of tutoring will demonstrate what proven programs can do in education.

4.  Commit to Success

Education goes from fad to fad, mandate to mandate, without making much progress. In order to “build better,” we all need to commit to finding what works, disseminating it broadly, and then finding even better solutions, until all children are succeeding. This must be a long-term commitment, but if we are investing adequately and see that we are improving outcomes each year, then it is clear we can do it.            

With a change of administrations, we are going to hear a lot about hope. Hope is a good start, but it is not a plan. Let’s plan to build back better, and then for the first time in the history of education, make sure our solutions work, for all of our children.

This blog was developed with support from Arnold Ventures. The views expressed here do not necessarily reflect those of Arnold Ventures.

Note: If you would like to subscribe to Robert Slavin’s weekly blogs, just send your email address to thebee@bestevidence.org

Tutoring Could Change Everything

Starting in the 1990s, futurists and technology fans began to say, “The Internet changes everything.” And eventually, it did. The Internet has certainly changed education, although it is unclear whether these changes have improved educational effectiveness.

Unlike the Internet, tutoring has been around since hunters and gatherers taught their children to hunt and gather. Yet ancient as it is, making one-to-one or small group tutoring widely available in Title I schools could have profound impacts on the most nettlesome problems of education.

            If the National Tutoring Corps proposal I’ve been discussing in recent blogs (here, here, and here) is widely implemented and successful, it could have both obvious and not-so-obvious impacts on many critical aspects of educational policy and practice. In this blog, I’ll discuss these revolutionary and far-reaching impacts.

Direct and Most Likely Impacts

Struggling Students

            Most obviously, if the National Tutoring Corps is successful, it will be because it has had an important positive impact on the achievement of students who are struggling in reading and/or mathematics. At 100,000 tutors, we expect that as many as four million low-achieving students in Title I schools will benefit, about 10% of all U.S. students in grades 1-9, but, say, 50% of the students in the lowest 20% of their grades.
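To make the arithmetic behind these figures concrete, here is a minimal back-of-the-envelope sketch in Python. The tutor count and the four-million estimate come from the paragraph above; the figures for total enrollment (roughly 40 million students in grades 1-9) and for students reached per tutor (about 40 per year) are my own rough assumptions, chosen only so the numbers reconcile, not figures from the National Tutoring Corps proposal itself.

    # Back-of-the-envelope check of the scale estimates above.
    # Assumptions (mine, not the proposal's): each tutor reaches ~40 students per year,
    # and there are roughly 40 million U.S. students in grades 1-9.
    tutors = 100_000
    students_per_tutor_per_year = 40           # assumed
    us_students_grades_1_9 = 40_000_000        # rough estimate

    tutored = tutors * students_per_tutor_per_year             # 4,000,000
    share_of_all = tutored / us_students_grades_1_9            # ~10%
    lowest_fifth = 0.20 * us_students_grades_1_9               # 8,000,000
    share_of_lowest_fifth = tutored / lowest_fifth             # ~50%

    print(f"Students tutored: {tutored:,}")
    print(f"Share of all grade 1-9 students: {share_of_all:.0%}")
    print(f"Share of the lowest-achieving 20%: {share_of_lowest_fifth:.0%}")

On these assumptions, 100,000 tutors reach about four million students, which is how the 10% and 50% figures above fit together.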

Title I

            In a December 20 tweet, former Houston superintendent Terry Grier suggested: “Schools should utilize all or most of their Title I money to implement tutoring programs…to help K-2 students catch up on lost literacy skills.”

            I’d agree, except that I’d include later grades and math as well as reading if there is sufficient funding. The purpose of Title I is to accelerate the achievement of low-achieving, disadvantaged students. If schools were experienced with implementing proven tutoring programs, and knew them from their own experience to be effective and feasible, why would such programs not become the main focus of Title I funding, as Grier suggests?

Special Education

            Students with specific learning disabilities and other “high-incidence” disabilities (about half of all students in special education) are likely to benefit from structured tutoring in reading or math. If proven, reliable, replicable tutoring models were widely available, and many schools had experience with them, then schools might be able to greatly reduce the need for special education for students whose only problem is difficulty in learning reading or mathematics. For students already in special education, their special education teachers may adopt proven tutoring methods themselves, which may enable students with specific learning disabilities to succeed in reading and math, and hopefully to exit special education.

Increasing the Effectiveness of Other Tutoring and Supportive Services

            Schools already have various tutoring programs, including volunteer programs. In schools involved in the National Tutoring Corps, we recommend that tutoring by paid, well-trained tutors go to the lowest achievers in each grade. If schools also have other tutoring resources, they should be concentrated on students who are below grade level, but not struggling as much as the lowest achievers. These additional tutors might use the proven effective programs provided by the National Tutoring Corps, offering a consistent and effective approach to all students who need tutoring. The same might apply to other supportive services offered by the school.

Less Obvious But Critical Impacts

A Model for Evidence-to-Practice

            The success of evidence-based tutoring could contribute to the growth of evidence-based reform more broadly. If the National Tutoring Corps is seen to be effective because of its use of already-proven instructional approaches, this same idea could be used in every part of education in which robust evidence exists. For example, education leaders might reason that if use of evidence-based tutoring approaches had a big effect on students struggling in reading and math, perhaps similar outcomes could be achieved in algebra, or creative writing, or science, or programs for English learners.

Increasing the Amount and Quality of Development and Research on Replicable Solutions to Key Problems in Education

            If the widespread application of proven tutoring models broadly improves student outcomes, then it seems likely that government, private foundations, and perhaps creators of educational materials and software might invest far more in development and research than they do now, to discover new, more effective educational programs.

Reductions in Achievement Gaps

            If it were widely accepted that there were proven and practical means of significantly improving the achievement of low achievers, then there would be no excuse for allowing achievement gaps to continue. Any student performing below the mean could be given proven tutoring and should gain in achievement, reducing gaps between low and high achievers.

Improvements in Behavior and Attendance

            Many of the students who engage in disruptive behavior are those who struggle academically, and therefore see little value in appropriate behavior. The same is true of students who skip school. Tutoring may help prevent behavior and attendance problems, not just by increasing the achievement of struggling students, but also by giving them caring, personalized teaching with a tutor who forms positive relationships with them and encourages attendance and good behavior.

Enhancing the Learning Environment for Students Who Do Not Need Tutoring

            It is likely that a highly successful tutoring initiative for struggling students could enhance the learning environment for the schoolmates of these students who do not need tutoring. This would happen if the tutored students were better behaved and more at peace with themselves, and if teachers did not have to struggle to accommodate a great deal of diversity in achievement levels within each class.

            Of course, all of these predictions depend on Congress funding a national tutoring plan based on the use of proven programs, and on implementation at scale actually producing the positive impacts that these programs have so often shown in research. But I hope these predictions will help policy makers and educational leaders realize the potential positive impacts a tutoring initiative could have, and then do what they can to make sure that the tutoring programs are effectively implemented and produce their desired impact. Then, and only then, will tutoring truly change everything.

Clarification:

Last week’s blog, on the affordability of tutoring, stated that a study of Saga Math, in which there was a per-pupil cost of $3,600, was intended as a demonstration, and was not intended to be broadly replicable.  However, all I meant to say is that Saga was never intended to be replicated AT THAT PRICE PER STUDENT.  In fact, a much lower-cost version of Saga Math is currently being replicated.  I apologize if I caused any confusion.

Photo credit: Deeper Learning 4 All, (CC BY-NC 4.0)

This blog was developed with support from Arnold Ventures. The views expressed here do not necessarily reflect those of Arnold Ventures.

Note: If you would like to subscribe to Robert Slavin’s weekly blogs, just send your email address to thebee@bestevidence.org

Large-Scale Tutoring Could Fail. Here’s How to Ensure It Does Not.

I’m delighted to see that the idea of large-scale tutoring to combat Covid-19 losses has gotten so important in the policy world that it is attracting scoffers and doubters. Michael Goldstein and Bowen Paulle (2020) published five brief commentaries recently in The Gadfly, warning about how tutoring could fail, both questioning the underlying research on tutoring outcomes (maybe just publication bias?) and noting the difficulties of rapid scale up. They also quote without citation a comment by Andy Rotherham, who quite correctly notes past disasters when government has tried and failed to scale up promising strategies: “Ed tech, class size reduction, teacher evaluations, some reading initiatives, and charter schools.” To these, I would add many others, but perhaps most importantly Supplementary Educational Services (SES), a massive attempt to implement all sorts of after school and summer school programs in high-poverty, low-achieving schools, which had near-zero impact, on average.

So if you were feeling complacent that the next hot thing, tutoring, was sure to work, no matter how it’s done, then you have not been paying attention for the past 30 years.

But rather than argue with these observations, I’d like to explain that the plan I’ve proposed, which you will find here, is fundamentally different from any of these past efforts, and if implemented as designed, with adequate funding, is highly likely to work at scale.

1.  Unlike all of the initiatives Rotherham dismisses, unlike SES, unlike just about everything ever used at scale in educational policy, the evidence base for certain specific, well-evaluated programs is solid.  And in our plan, only the proven programs would be scaled.

A little known but crucial fact: Not all tutoring programs work. The details matter. Our recent reviews of research on programs for struggling readers (Neitzel et al., in press) and math (Pellegrini et al., in press) identify individual tutoring programs that do and do not work, as well as types of tutoring that work well and those that do not.

Our scale-up plan would begin with programs that already have solid evidence of effectiveness, but it would also provide funding and third-party, rigorous evaluations of scaled-up programs without sufficient evidence, as well as new programs, designed to add additional options for schools. New and insufficiently evaluated programs would be piloted and implemented for evaluation, but they would not be scaled up unless they have solid evidence of effectiveness in randomized evaluations.

If possible, in fact, we would hope to re-evaluate even the most successful evaluated programs, to make sure they work.

If we stick to repeatedly-proven programs, rigorously evaluated in large randomized experiments, then who cares whether other programs have failed in the past? We will know that the programs being used at scale do work. Also, all this research would add greatly to knowledge about effective and ineffective program components and applications to particular groups of students, so over time, we’d expect the individual programs, and the field as a whole, to gain in the ability to provide proven tutoring approaches at scale.

2.  Scale-up of proven programs can work if we take it seriously. It is true that scale-up has many pitfalls, but I would argue that when scale-up fails, it is for one of two reasons. First, the programs being scaled were not adequately proven in the first place. Second, the funding provided for scale-up was not sufficient to allow the program developers to scale up under the conditions they know full well are necessary. As examples of the latter, programs that provided well-trained and experienced trainers in their initial studies are often forced by insufficient funding to use trainer-of-trainers models with greatly diminished amounts of training in scale-up. As a result, programs that worked at small scale have failed in large-scale replication. This happens all the time, and this is what makes policy experts conclude that nothing works at scale.

However, the lesson they should have learned instead is just that programs proven to work at small scale can succeed if the key factors that made them work at small scale are implemented with fidelity at large scale. If anything less is done in scale-up, you’re taking big risks.

If well-trained trainers are essential, then it is critical to insist on well-trained trainers. If a certain amount or quality of training is essential, it is critical to insist on it, and make sure it happens in every school using a given program. And so on. There is no reason to skimp on the proven recipe.

But aren’t all these trainers and training days and other elements unsustainable?  This is the wrong question. The right one is, how can we make tutoring as effective as possible, to justify its cost?

Tutoring is expensive, but most of the cost is in the salaries of the tutors themselves. As an analogy, consider horse racing. Horse owners pay millions for horses with great potential. Having done so, do you think they skimp on trainers or training? Of course not. In the same way, a hundred teaching assistant tutors cost roughly $4 million per year in salaries and benefits alone. Let’s say top-quality training for this group costs $500,000 per year, while crummy training costs $50,000. If these figures are in the ballpark, would it be wise to spend $4,500,000 on a terrific tutoring program, or $4,050,000 on a crummy one?
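Here is the same comparison as a minimal sketch in Python, using only the illustrative figures from the paragraph above (the $40,000 per-tutor figure is simply $4 million divided by 100 tutors; none of these are precise cost estimates).

    # Illustrative comparison: how much does top-quality training add to total cost?
    tutors = 100
    salary_and_benefits_per_tutor = 40_000               # implied by ~$4 million for 100 tutors
    salaries = tutors * salary_and_benefits_per_tutor    # $4,000,000

    top_quality_training = 500_000    # illustrative figure from the text
    crummy_training = 50_000          # illustrative figure from the text

    total_top = salaries + top_quality_training      # $4,500,000
    total_crummy = salaries + crummy_training        # $4,050,000

    extra_share = (total_top - total_crummy) / total_crummy
    print(f"Terrific program: ${total_top:,}")
    print(f"Crummy program:   ${total_crummy:,}")
    print(f"Extra cost of doing it right: {extra_share:.1%}")   # about 11%

On these figures, doing training right adds only about 11% to the total budget, which is the point of the analogy.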

Successful scale-up takes place all the time in business. How does Starbucks make sure your experience in every single store is excellent? Simple. They have well-researched, well-specified, obsessively monitored standards and quality metrics for every part of their operation. Scale-up in education can work just the same way, and in comparison to the costs of front-line personnel, the cost of great is only trivially greater than the cost of crummy.

3.  Ongoing research will, in our proposal, formatively evaluate the entire tutoring effort over time, and development and evaluation will continually add new proven programs.  

Ordinarily, big federal education programs start with all kinds of rules and regulations and funding schemes, and these are announced with a lot of hoopla and local and national meetings to explain the new programs to local educators and leaders. Some sort of monitoring and compliance mechanism is put in place, but otherwise the program steams ahead. Several years later, some big research firm gets a huge contract to evaluate the program. On average, the result is almost always disappointing. Then there’s a political fight about just how disappointing the results are, and life goes on.

 The program we have proposed is completely different. First, as noted earlier, the individual programs that are operating at large scale will all be proven effective to begin with, and may be evaluated and proven effective again, using the same methods as those used to validate new programs. Second, new proven programs would be identified and scaled up all the time. Third, numerous studies combining observations, correlational studies, and mini-experiments would be evaluating program variations and impacts with different populations and circumstances, adding knowledge of what is happening at the chalkface and of how and why outcomes vary. This explanatory research would not be designed to decide which programs work and which do not (that would be done in the big randomized studies), but to learn from practice how to improve outcomes for each type of school and application. The idea is to get smarter over time about how to make tutoring as effective as it can be, so when the huge summative evaluation takes place, there will be no surprises. We would already know what is working, and how, and why.

Our National Tutoring Corps proposal is not a big research project, or a jobs program for researchers. The overwhelming focus is on providing struggling students the best tutoring we know how to provide. But using a small proportion of the total allocation would enable us to find out what works, rapidly enough to inform practice. If this were all to happen, we would know more and be able to do more every year, serving more and more struggling students with better and better programs.

So rather than spending a lot of taxpayer money and hoping for the best, we’d make scale-up successful by using evidence at the beginning, middle, and end of the process, to make sure that this time, we really know what we are doing. We would make sure that effective programs remain successful at scale, rather than merely hoping they will.

References

Goldstein, M., & Paulle, B. (2020, Dec. 8). Vaccine-making’s lessons for high-dosage tutoring, Part 1. The Gadfly.

Goldstein, M., & Paulle, B. (2020, Dec. 11). Vaccine-making’s lessons for high-dosage tutoring, Part IV. The Gadfly.

Neitzel, A., Lake, C., Pellegrini, M., & Slavin, R. (in press). A synthesis of quantitative research on programs for struggling readers in elementary schools. Reading Research Quarterly.

Pellegrini, M., Neitzel, A., Lake, C., & Slavin, R. (in press). Effective programs in elementary mathematics: A best-evidence synthesis. AERA Open.

Original photo by Catherine Carusso, Presidio of Monterey Public Affairs

This blog was developed with support from Arnold Ventures. The views expressed here do not necessarily reflect those of Arnold Ventures.

Note: If you would like to subscribe to Robert Slavin’s weekly blogs, just send your email address to thebee@bestevidence.org

The Details Matter. That’s Why Proven Tutoring Programs Work Better than General Guidelines.

When I was in first grade, my beloved teacher, Mrs. Adelson, introduced a new activity. She called it “phonics.”  In “phonics,” we were given tiny pieces of paper with letters on them to paste onto a piece of paper, to make words. It was a nightmare. Being a boy, I could sooner sprout wings and fly than do this activity without smearing paste and ink all over the place. The little slips of paper stuck to my thumb rather than to the paper. This activity taught me no phonics or reading whatsoever, but did engender a longtime hatred of “phonics,” as I understood it.

Much, much later I learned that phonics was essential in beginning reading, so I got over my phonics phobia. And I learned an important lesson. Even if an activity focuses on an essential skill, this does not mean that just any activity with that focus will work. The details matter.

I’ve had reason to reflect on this early lesson many times recently, as I’ve spoken to various audiences about our National Tutoring Corps plan. Often, people will ask why it is important to use specific proven programs. Why not figure out the characteristics of proven programs, and encourage tutors to use those consensus strategies?

The answer is that because the details matter, tutoring according to agreed-upon practices is not going to be as effective as specific proven programs, on average. Mrs. Adelson had a correct understanding of the importance of phonics in beginning reading, but in the classroom, where the paste hits the page, her phonics strategy was awful. In tutoring, we might come to agreement about factors such as group size, qualifications of tutors, amount of professional development, and so on, but dozens of details also have to be right. An effective tutoring program has to get crucial features right, such as the nature and quality of tutor training and coaching, student materials and software, instructional strategies, feedback and correction strategies when students make errors, frequency and nature of assessments, means of motivating and recognizing student progress, means of handling student absences, links between tutors and teachers and between tutors and parents, and much more. Getting any of these strategies wrong could greatly diminish the effectiveness of tutoring.

The fact that a proven program has shown positive outcomes in rigorous experiments supports confidence that the program’s particular constellation of strategies is effective. During any program’s development and piloting, developers have had to experiment with solutions to each of the key elements. They have had many opportunities to observe tutoring sessions, to speak with tutors, to look at formative data, and to decide on specific strategies for each of the problems that must be solved. A teacher or local professional developer has not had the opportunity to try out and evaluate specific components, so even if they have an excellent understanding of the main elements of tutoring, they could use or promote key components that are not effective or may even be counterproductive. There are now many practical, ready-to-implement, rigorously evaluated tutoring programs with positive impacts (Neitzel et al., in press). Why should we be using programs whose effects are unknown, when there are many proven alternatives?

Specificity is of particular importance in small-group tutoring, because very effective small group methods superficially resemble much less effective methods (see Borman et al., 2001; Neitzel et al., in press; Pellegrini et al., 2020). For example, one-to-four tutoring might look like traditional Title I pullouts, which are far less effective. Some “tutors” teach a class of four no differently than they would teach a class of thirty. Tutoring methods that incorporate computers may also superficially resemble computer assisted instruction, which is also far less effective. Tutoring derives its unique effectiveness from the ability of the tutor to personalize instruction for each child, to provide unique feedback to the specific problems each student faces. It also depends on close relationships between tutors and students. If the specifics are not carefully trained and implemented with understanding and spirit, small-group tutoring can descend into business-as-usual. Not that ordinary teaching and CAI are ineffective, but to successfully combat the effects of Covid-19 school closures and learning gaps in general, tutoring must be much more effective than similar-looking methods. And it can be, but only if tutors are trained and equipped to provide tutoring that has been proven to be effective.

Individual tutors can and do adapt tutoring strategies to meet the needs of particular students or subgroups, and this is fine if the tutor is starting from a well-specified and proven, comprehensive tutoring program and making modifications for well-justified reasons. But when tutors are expected to substantially invent or interpret general strategies, they may make changes that diminish program effectiveness. All too often, local educators seek to modify proven programs to make them easier to implement, less expensive, or more appealing to various stakeholders, but these modifications may leave out elements essential to program effectiveness.

The national experience of Supplementary Educational Services illustrates how good ideas without an evidence base can go wrong. SES provided mostly after-school programs of all sorts, including various forms of tutoring. But hardly any of these programs had evidence of effectiveness. A review of outcomes of almost 400 local SES grants found reading and math effect sizes near zero, on average (Chappell et al., 2011).

In tutoring, it is essential that every student receiving tutoring gets a program highly likely to measurably improve the student’s reading or mathematics skills. Tutoring is expensive, and tutoring is mostly used with students who are very much at risk. It is critical that we give every tutor and every student the highest possible probability of life-altering improvement. Proven, replicable, well-specified programs are the best way to ensure positive outcomes.

Mrs. Adelson was right about phonics, but wrong about how to teach it. Let’s not make the same mistake with tutoring.

References

Borman, G., Stringfield, S., & Slavin, R.E. (Eds.) (2001).  Title I: Compensatory education at the crossroads.  Mahwah, NJ: Erlbaum.

Chappell, S., Nunnery, J., Pribesh, S., & Hager, J. (2011). A meta-analysis of Supplemental Educational Services (SES) provider effects on student achievement. Journal of Education for Students Placed at Risk, 16(1), 1-23.

Neitzel, A., Lake, C., Pellegrini, M., & Slavin, R. (in press). A synthesis of quantitative research on programs for struggling readers in elementary schools. Reading Research Quarterly.

Pellegrini, M., Neitzel, A., Lake, C., & Slavin, R. (2020). Effective programs in elementary mathematics: A best-evidence synthesis. Available at www.bestevidence.com. Manuscript submitted for publication.

Photo by Austrian National Library on Unsplash

This blog was developed with support from Arnold Ventures. The views expressed here do not necessarily reflect those of Arnold Ventures.

Note: If you would like to subscribe to Robert Slavin’s weekly blogs, just send your email address to thebee@bestevidence.org

How to Make Evidence in Education Make a Difference

By Robert Slavin

I have a vision of how education in the U.S. and the world will begin to make solid, irreversible progress in student achievement. In this vision, school leaders will constantly be looking for the most effective programs, proven in rigorous research to accelerate student achievement. This process of informed selection will be aided by government, which will provide special incentive funds to help schools implement proven programs.

In this imagined future, the fact that schools are selecting programs based on good evidence means that publishers, software companies, professional development companies, researchers, and program developers, as well as government, will be engaged in a constant process of creating, evaluating, and disseminating new approaches to every subject and grade level. As in medicine, developers and researchers will be held to strict standards of evidence, but if they develop programs that meet these high standards, they can be confident that their programs will be widely adopted, and will truly make a difference in student learning.

Discovering and disseminating effective classroom programs is not all we have to get right in education. For example, we also need great teachers, principals, and other staff who are well prepared and effectively deployed. A focus on evidence could help at every step of that process, of course, but improving programs and improving staff are not an either-or proposition. We can and must do both. If medicine, for example, focused only on getting the best doctors, nurses, technicians, and other staff, but medical research and dissemination of proven therapies were underfunded and little heeded, then we’d have great staff prescribing ineffective or possibly harmful medicines and procedures. In agriculture, we could try to attract farmers who are outstanding in their fields, but that alone would not have created the agricultural revolution that has largely solved the problem of hunger in most parts of the world. Instead, decades of research created or identified improvements in seeds, stock, fertilizers, veterinary practices, farming methods, and so on, for all of those outstanding farmers to put into practice.

Back to education, my vision of evidence-based reform depends on many actions. Because of the central role government plays in public education, government must take the lead. Some of this will cost money, but it would be a tiny proportion of the roughly $600 billion we spend on K-12 education annually, at all levels (federal, state, and local). Other actions would cost little or nothing, focusing only on standards for how existing funds are used. Key actions to establish evidence of impact as central to educational decisions are as follows:

  1. Invest substantially in practical, replicable approaches to improving outcomes for students, especially achievement outcomes.

Rigorous, high-quality evidence of effectiveness for educational programs has been appearing since about 2006 at a faster rate than ever before, due in particular to investments by the Institute of Education Sciences (IES), Investing in Innovation/Education Innovation Research (i3/EIR), and the National Science Foundation (NSF) in the U.S., and the Education Endowment Foundation in England, but also other parts of government and private foundations. All have embraced rigorous evaluations involving random assignment to conditions, appropriate measures independent of developers or researchers, and at the higher funding levels, third-party evaluators. These are very important developments, and they have given the research field, educators, and policy makers excellent reasons for confidence that the findings of such research have direct meaning for practice.

One problem is that, as is true in every applied field that embraces rigorous research, most experiments do not find positive impacts. Only about 20% of such experiments do find positive outcomes. The solution to this is to learn from successes and failures, so that our success rate improves over time. We also need to support a much larger enterprise of development of new solutions to enduring problems of education, in all subjects and grade levels, and to continue to support rigorous evaluations of the most promising of these innovations. In other words, we should not be daunted by the fact that most evaluations do not find positive impacts, but instead we need to increase the success rate by learning from our own evidence, and to carry out many more experiments. Even 20% of a very big number is a big number.

2. Improve communications of research findings to researchers, educators, policy makers, and the general public.

Evidence will not make a substantial difference in education until key stakeholders see it as a key to improving students’ success. Improving communications certainly includes making it easy for various audiences to find out which programs and practices are truly effective. But we also need to build excitement about evidence. To do this, government might establish large-scale, widely publicized, certain-to-work demonstrations of the use and outcomes of proven approaches, so that all will see how evidence can lead to meaningful change.

I will be writing in more depth on this topic in future blogs.

3. Set specific standards of evidence, and provide incentive funding for schools to adopt and implement proven practices.

The Every Student Succeeds Act (ESSA) boldly defined “strong,” “moderate,” “promising,” and lower levels of evidence of effectiveness for educational programs, and required use of programs meeting one of these top categories for certain federal funding, especially school improvement funding for low-achieving schools. This certainly increased educators’ interest in evidence, but in practice, it is unclear how much this changed practice or outcomes. These standards need to be made more specific. In addition, the standards need to be applied to funding that is clearly discretionary, to help schools adopt new programs, not to add new evidence requirements to traditional funding sources. The ESSA evidence standards have had less impact than hoped for because they mainly apply to school improvement, a longstanding source of federal funding. As a result, many districts and states have fought hard to have the programs they already have declared “effective,” regardless of their actual evidence base. To make evidence popular, it is important to make proven programs available as something extra, a gift to schools and children rather than a hurdle to continuing existing programs.

In coming blogs I’ll write further about how government could greatly accelerate and intensify the process of development, evaluation, communication, and dissemination, so that the entire process can begin to make undeniable improvements in particular areas of critical importance, demonstrating how evidence can make a difference for students.

Photo credit: Deeper Learning 4 All/(CC BY-NC 4.0)

This blog was developed with support from Arnold Ventures. The views expressed here do not necessarily reflect those of Arnold Ventures.

Note: If you would like to subscribe to Robert Slavin’s weekly blogs, just send your email address to thebee@bestevidence.org

How Can You Tell When The Findings of a Meta-Analysis Are Likely to Be Valid?

In Baltimore, Faidley’s, founded in 1886, is a much loved seafood market inside Lexington Market. Faidley’s used to be a real old-fashioned market, with sawdust on the floor and an oyster bar in the center. People lined up behind their favorite oyster shucker. In a longstanding tradition, the oyster shuckers picked oysters out of crushed ice and tapped them with their oyster knives. If they sounded full, they opened them. But if they did not, the shuckers discarded them.

I always noticed that the line was longer behind the shucker who was discarding the most oysters. Why? Because everyone knew that the shucker who was pickier was more likely to come up with a dozen fat, delicious oysters, instead of say, nine great ones and three…not so great.

I bring this up today to tell you how to pick full, fair meta-analyses on educational programs. No, you can’t tap them with an oyster knife, but otherwise, the process is similar. You want meta-analysts who are picky about what goes into their meta-analyses. Your goal is to make sure that a meta-analysis produces results that truly represent what teachers and schools are likely to see in practice when they thoughtfully implement an innovative program. If instead you pick the meta-analysis with the biggest effect sizes, you will always be disappointed.

As a special service to my readers, I’m going to let you in on a few trade secrets about how to quickly evaluate a meta-analysis in education.

One very easy way to evaluate a meta-analysis is to look at the overall effect size, probably shown in the abstract. If the overall mean effect size is more than about +0.40, you probably don’t have to read any further. Unless the treatment is tutoring or some other treatment that you would expect to make a massive difference in student achievement, it is rare to find a single legitimate study with an effect size that large, much less an average that large. A very large effect size is almost a guarantee that a meta-analysis is full of studies with design features that greatly inflate effect sizes, not studies with outstandingly effective treatments.

Next, go to the Methods section, which will have within it a section on inclusion (or selection) criteria. It should list the types of studies that were or were not accepted into the meta-analysis. Some of the criteria will have to do with the focus of the meta-analysis, specifying, for example, “studies of science programs for students in grades 6 to 12.” But your focus is on the criteria that specify how picky the meta-analysis is. As one example of a picky set of criteria, here are the main ones we use in Evidence for ESSA and in every analysis we write:

  1. Studies must use random assignment or matching to assign students to experimental or control groups, with schools and students in each specified in advance.
  2. Students assigned to the experimental group must be compared to very similar students in a control group receiving business-as-usual instruction. The experimental and control students must be well matched, within a quarter standard deviation at pretest (ES=+0.25), and attrition (loss of subjects) must be no more than 15% higher in one group than the other at the end of the study. Why? It is essential that experimental and control groups start and remain the same in all ways other than the treatment. Controls for initial differences do not work well when the differences are large.
  3. There must be at least 30 experimental and 30 control students. Analyses of combined effect sizes must control for sample sizes. Why? Evidence finds substantial inflation of effect sizes in very small studies.
  4. The treatments must be provided for at least 12 weeks. Why? Evidence finds major inflation of effect sizes in very brief studies, and brief studies do not represent the reality of the classroom.
  5. Outcome measures must be independent of the program developers and researchers. Usually, this means using national tests of achievement, though not necessarily standardized tests. Why? Research has found that tests made by researchers can inflate effect sizes by double, or more, and researcher-made measures do not represent the reality of classroom assessment.

There may be other details, but these are the most important. Note that there is a double focus of these standards. Each is intended both to minimize bias, but also to maximize similarity to the conditions faced by schools. What principal or teacher who cares about evidence would be interested in adopting a program evaluated in comparison to a very different control group? Or in a study with few subjects, or a very brief duration? Or in a study that used measures made by the developers or researchers? This set is very similar to what the What Works Clearinghouse (WWC) requires, except #5 (the WWC requires exclusion of “overaligned” measures, but not developer-/researcher-made measures).
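For readers who like to see such screens made explicit, here is a minimal, illustrative sketch in Python of what applying these inclusion criteria to a set of candidate studies might look like. The field names and the two example studies are invented for illustration; this is a toy filter, not the actual screening procedure used by Evidence for ESSA or the WWC.

    # Toy screen applying the inclusion criteria listed above to candidate studies.
    # Field names are illustrative, not a real database schema.
    def meets_inclusion_criteria(study):
        return (
            study["design"] in ("randomized", "matched")   # 1. random assignment or matching
            and abs(study["pretest_es"]) <= 0.25           # 2. groups within 0.25 SD at pretest
            and abs(study["attrition_gap"]) <= 0.15        # 2. differential attrition <= 15%
            and study["n_experimental"] >= 30              # 3. at least 30 students per group
            and study["n_control"] >= 30
            and study["duration_weeks"] >= 12              # 4. at least 12 weeks of treatment
            and study["independent_measure"]               # 5. measures independent of developers
        )

    studies = [
        {"design": "randomized", "pretest_es": 0.05, "attrition_gap": 0.03,
         "n_experimental": 220, "n_control": 215, "duration_weeks": 28,
         "independent_measure": True},
        {"design": "pre_post", "pretest_es": 0.0, "attrition_gap": 0.0,
         "n_experimental": 6, "n_control": 0, "duration_weeks": 4,
         "independent_measure": False},
    ]

    included = [s for s in studies if meets_inclusion_criteria(s)]
    print(f"{len(included)} of {len(studies)} studies meet the criteria")

In this toy example, only the first study survives the screen; the second, a tiny, brief pre-post study with a developer-made measure, is exactly the kind of study a picky meta-analyst discards, like the oyster shucker setting aside the hollow-sounding oysters.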

If these criteria are all there in the “Inclusion Standards,” chances are you are looking at a top-quality meta-analysis. As a rule, it will have average effect sizes lower than those you’ll see in reviews without some or all of these standards, but the effect sizes you see will probably be close to what you will actually get in student achievement gains if your school implements a given program with fidelity and thoughtfulness.

What I find astonishing is how many meta-analyses do not have standards this high. Among experts, these criteria are not controversial, except for the last one, which shouldn’t be. Yet meta-analyses are often written, and accepted by journals, with much lower standards, thereby producing greatly inflated, unrealistic effect sizes.

As one example, there was a meta-analysis of Direct Instruction programs in reading, mathematics, and language, published in the Review of Educational Research (Stockard et al., 2018). I have great respect for Direct Instruction, which has been doing good work for many years. But this meta-analysis was very disturbing.

The inclusion and exclusion criteria in this meta-analysis did not require experimental-control comparisons, did not require well-matched samples, and did not require any minimum sample size or duration. It was not clear how many of the outcomes measures were made by program developers or researchers, rather than independent of the program.

With these minimal inclusion standards, and a very long time span (back to 1966), it is not surprising that the review found a great many qualifying studies. 528, to be exact. The review also reported extraordinary effect sizes: +0.51 for reading, +0.55 for math, and +0.54 for language. If these effects were all true and meaningful, it would mean that DI is much more effective than one-to-one tutoring, for example.

But don’t get your hopes up. The article included an online appendix that showed the sample sizes, study designs, and outcomes of every study.

First, the authors identified eight experimental designs (plus single-subject designs, which were treated separately). Only two of these would meet anyone’s modern standards of meta-analysis: randomized and matched. The others included pre-post gains (no control group), comparisons to test norms, and other pre-scientific designs.

Sample sizes were often extremely small. Leaving aside single-case experiments, there were dozens of single-digit sample sizes (e.g., six students), often with very large effect sizes. Further, there was no indication of study duration.

What is truly astonishing is that RER accepted this study. RER is the top-rated journal in all of education, based on its citation count. Yet this review, and the Kulik & Fletcher (2016) review I cited in a recent blog, clearly did not meet minimal standards for meta-analyses.

My colleagues and I will be working in the coming months to better understand what has gone wrong with meta-analysis in education, and to propose solutions. Of course, our first step will be to spend a lot of time at oyster bars studying how they set such high standards. Oysters and beer will definitely be involved!

Photo credit: Annette White / CC BY-SA (https://creativecommons.org/licenses/by-sa/4.0)

References

Kulik, J. A., & Fletcher, J. D. (2016). Effectiveness of intelligent tutoring systems: a meta-analytic review. Review of Educational Research, 86(1), 42-78.

Stockard, J., Wood, T. W., Coughlin, C., & Rasplica Khoury, C. (2018). The effectiveness of Direct Instruction curricula: A meta-analysis of a half century of research. Review of Educational Research, 88(4), 479–507. https://doi.org/10.3102/0034654317751919

This blog was developed with support from Arnold Ventures. The views expressed here do not necessarily reflect those of Arnold Ventures.

Note: If you would like to subscribe to Robert Slavin’s weekly blogs, just send your email address to thebee@bestevidence.org

Science of Reading: Can We Get Beyond Our 30-Year Pillar Fight?

How is it possible that the “reading wars” are back on? The reading wars primarily revolve around what are often called the five pillars of early reading: phonemic awareness, phonics, comprehension, vocabulary, and fluency. Actually, there is little debate about the importance of comprehension, vocabulary, or fluency, so the reading wars are mainly about phonemic awareness and phonics. Diehard anti-phonics advocates exist, but in all of educational research, there are few issues that have been more convincingly settled by high-quality evidence. The National Reading Panel (2000), the source of the five pillars, has been widely cited as conclusive evidence that success in the early stages of reading depends on ensuring that students are all successful in phonemic awareness, phonics, and the other pillars. I was invited to serve on that panel, but declined, because I thought it was redundant. Just a short time earlier, the National Research Council’s Committee on the Prevention of Reading Difficulties (Snow, Burns, & Griffin, 1998) had covered essentially the same ground and come to essentially the same conclusion, as had Marilyn Adams’ (1990) Beginning to Read, and many individual studies.

To my knowledge, there is little credible evidence to the contrary. Certainly, then and now there have been many students who learn to read successfully with or without a focus on phonemic awareness and phonics. However, I do not think there are many students who could succeed with non-phonetic approaches but cannot learn to read with phonics-emphasis methods. In other words, there is little if any evidence that phonemic awareness or phonics cause harm, but a great deal of evidence that for perhaps more than half of students, effective instruction emphasizing phonemic awareness and phonics is essential. Since it is impossible to know in advance which students will need phonics and which will not, it just makes sense to teach using methods likely to maximize the chances that all children (those who need phonics and those who would succeed with or without them) will succeed in reading.

However…

The importance of the five pillars of the National Reading Panel (NRP) catechism is not in doubt among people who believe in rigorous evidence, as far as I know. The reading wars ended in the 2000s and the five pillars won. However, this does not mean that knowing all about these pillars and the evidence behind them is sufficient to solve America’s reading problems. The NRP pillars describe essential elements of curriculum, but not of instruction.

Improving reading outcomes for all children requires the five pillars, but they are not enough. The five pillars could be extensively and accurately taught in every school of education, and this would surely help, but it would not solve the problem. State and district standards could emphasize the five pillars and this would help, but would not solve the problem. Reading textbooks, software, and professional development could emphasize the five pillars and this would help, but it would not solve the problem.

The reason that such necessary policies would still not be sufficient is that teaching effectiveness does not just depend on getting curriculum right. It also depends on the nature of instruction, classroom management, grouping, and other factors. Teaching reading without teaching phonics is surely harmful to large numbers of students, but teaching phonics does not guarantee success.

As one example, consider grouping. For a very long time, most reading teachers have used homogeneous reading groups. For example, the “Stars” might contain the highest-performing readers, the “Rockets” the middle readers, and the “Planets” the lowest readers. The teacher calls up groups one at a time. No problem there, but what are the students doing back at their desks? Mostly worksheets, on paper or computers. The problem is that if there are three groups, each student spends two thirds of reading class time doing, well, not much of value. Worse, the students are sitting for long periods of time, with not much to do, and the teacher is fully occupied elsewhere. Does anyone see the potential for idle hands to become the devil’s playground? The kids do.

There are alternatives to reading groups, such as the Joplin Plan (cross-grade grouping by reading level), forms of whole-class instruction, or forms of cooperative learning. These provide active teaching to all students all period. There is good evidence for these alternatives (Slavin, 1994, 2017). My main point is that a reading strategy that follows NRP guidelines 100% may still succeed or fail based on its grouping strategy. The same could be true of the use of proven classroom management strategies or motivational strategies during reading periods.

To make the point most strongly, imagine that a district’s teachers have all thoroughly mastered all five pillars of science of reading, which (we’ll assume) are strongly supported by their district and state. In an experiment, 40 teachers of grades 1 to 3 are selected, and 20 of these are chosen at random to receive sufficient tutors to work with their lowest-achieving 33% of students in groups of four, using a proven model based on science of reading principles. The other 20 teachers just use their usual materials and methods, also emphasizing science of reading curricula and methods.

The evidence from many studies of tutoring (Inns et al., 2020), as well as common sense, tell us what would happen. The teachers supported by tutors would produce far greater achievement among their lowest readers than would the other equally science-of-reading-oriented teachers in the control group.

None of these examples diminish the importance of science of reading. But they illustrate that knowing science of reading is not enough.

At www.evidenceforessa.org, you can find 65 elementary reading programs of all kinds that meet high standards of effectiveness. Almost all of these use approaches that emphasize the five pillars. Yet Evidence for ESSA also lists many programs that equally emphasize the five pillars and yet have not found positive impacts. Rather than re-starting our thirty-year-old pillar fight, don’t you think we might move on to advocating programs that not only use the right curricula, but are also proven to get excellent results for kids?

References

Adams, M.J. (1990).  Beginning to read:  Thinking and learning about print.  Cambridge, MA:  MIT Press.

Inns, A., Lake, C., Pellegrini, M., & Slavin, R. (2020). A synthesis of quantitative research on programs for struggling readers in elementary schools. Available at www.bestevidence.org. Manuscript submitted for publication.

National Reading Panel (2000).  Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction.  Rockville, MD: National Institute of Child Health and Human Development.

Slavin, R. E. (1994). School and classroom organization in beginning reading:  Class size, aides, and instructional grouping. In R. E. Slavin, N. L. Karweit, and B. A. Wasik (Eds.), Preventing early school failure. Boston:  Allyn and Bacon.

Slavin, R. E. (2017). Instruction based on cooperative learning. In R. Mayer & P. Alexander (Eds.), Handbook of research on learning and instruction. New York: Routledge.

Snow, C.E., Burns, S.M., & Griffin, P. (Eds.) (1998).  Preventing reading difficulties in young children.  Washington, DC: National Academy Press.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Note: If you would like to subscribe to Robert Slavin’s weekly blogs, just send your email address to thebee@bestevidence.org

 

Getting Schools Excited About Participating in Research

If America’s school leaders are ever going to get excited about evidence, they need to participate in it. It’s not enough to just make school leaders aware of programs and practices. Instead, they need to serve as sites for experiments evaluating programs that they are eager to implement, or at least have friends or peers nearby who are doing so.

The U.S. Department of Education has funded quite a lot of research on attractive programs. Many of the studies it has funded have not shown positive impacts, but many others have found programs to be effective. Those effective programs could provide a means of engaging many schools in rigorous research, while at the same time serving as examples of how evidence can help schools improve their results.

Here is my proposal. It quite often happens that some part of the U.S. Department of Education wants to expand the use of proven programs on a given topic. For example, imagine that they wanted to expand use of proven reading programs for struggling readers in elementary schools, or proven mathematics programs in Title I middle schools.

Rather than putting out the usual request for proposals, the Department might announce that schools could receive funding to implement a qualifying proven program, but only if they agreed to take part in an evaluation of that program. Applicants would have to identify two similar schools, from one district or from neighboring districts, that would agree to participate if the proposal were successful. One school in each pair would be assigned at random to use a given program in the first year or two, and the second school could start after the one- or two-year evaluation period was over. Schools would select from a list of proven programs and choose one that seems appropriate to their needs.

Many pairs of schools would be funded to use each proven program, so across all schools involved, this would create many large, randomized experiments. Independent evaluation groups would carry out the experiments. Students in participating schools would be pretested at the beginning of the evaluation period (one or two years), and posttested at the end, using tests independent of the developers or researchers.
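
To illustrate the mechanics of the design, here is a minimal sketch of the pair-randomized rollout described above. The school names, program names, and pairings are hypothetical placeholders; a real evaluation would also handle matching criteria, consent, pretesting, and posttesting.

    import random

    random.seed(2)

    # Each funded applicant names a pair of similar schools and the proven program it chose.
    # All names below are hypothetical placeholders.
    applicant_pairs = [
        {"program": "Proven Reading Program A", "schools": ("Elementary School 1", "Elementary School 2")},
        {"program": "Proven Reading Program A", "schools": ("Elementary School 3", "Elementary School 4")},
        {"program": "Proven Math Program B",    "schools": ("Middle School 5", "Middle School 6")},
    ]

    for pair in applicant_pairs:
        # Within each pair, one school is randomly assigned to start immediately;
        # the other starts after the one- or two-year evaluation period.
        starts_now, starts_later = random.sample(pair["schools"], 2)
        print(f"{pair['program']}: {starts_now} implements during the evaluation period; "
              f"{starts_later} serves as the comparison school and starts afterward.")
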

There are many attractions to this plan. First, large randomized evaluations on promising programs could be carried out nationwide in real schools under normal conditions. Second, since the Department was going to fund expansion of promising programs anyway, the additional cost might be minimal, just the evaluation cost. Third, the experiment would provide a side-by-side comparison of many programs focusing on high-priority topics in very diverse locations. Fourth, the school leaders would have the opportunity to select the program they want, and would be motivated, presumably, to put energy into high-quality implementation. At the end of such a study, we would know a great deal about which programs really work in ordinary circumstances with many types of students and schools. But just as importantly, the many schools that participated would have had a positive experience, implementing a program they believe in and finding out in their own schools what outcomes the program can bring them. Their friends and peers would be envious and eager to get into the next study.

A few sets of studies of this kind could build a constituency of educators that might support the very idea of evidence. And this could transform the evidence movement, providing it with a national, enthusiastic audience for research.

Wouldn’t that be great?

 This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

New Sections on Social Emotional Learning and Attendance in Evidence for ESSA!

We are proud to announce the launch of two new sections of our Evidence for ESSA website (www.evidenceforessa.org): K-12 social-emotional learning and attendance. Funded by a grant from the Bill and Melinda Gates Foundation, the new sections represent our first foray beyond academic achievement.


The social-emotional learning section represents the greatest departure from our prior work. This is due to the nature of SEL, which combines many quite diverse measures. We identified 17 distinct measures, which we grouped in four overarching categories, as follows:

Academic Competence

  • Academic performance
  • Academic engagement

Problem Behaviors

  • Aggression/misconduct
  • Bullying
  • Disruptive behavior
  • Drug/alcohol abuse
  • Sexual/racial harassment or aggression
  • Early/risky sexual behavior

Social Relationships

  • Empathy
  • Interpersonal relationships
  • Pro-social behavior
  • Social skills
  • School climate

Emotional Well-Being

  • Reduction of anxiety/depression
  • Coping skills/stress management
  • Emotional regulation
  • Self-esteem/self-efficacy

Evidence for ESSA reports overall effect sizes and ratings for each of the four categories, as well as the 17 individual measures (which are themselves composed of many measures used by various qualifying studies). So in contrast to reading and math, where programs are rated based on the average of all qualifying reading or math measures, an SEL program could be rated “strong” in one category, “promising” in another, and “no qualifying evidence” or “qualifying studies found no significant positive effects” on others.
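
As a rough illustration of how measure-level results roll up into the four categories, here is a minimal sketch. The category structure is abbreviated from the list above, the effect sizes and the example program are invented, and actual ESSA ratings also depend on study design and statistical significance, not on averaged effect sizes alone.

    from statistics import mean

    # Abbreviated version of the category structure listed above.
    CATEGORIES = {
        "Academic Competence":  ["Academic performance", "Academic engagement"],
        "Problem Behaviors":    ["Aggression/misconduct", "Bullying", "Disruptive behavior"],
        "Social Relationships": ["Social skills", "School climate"],
        "Emotional Well-Being": ["Reduction of anxiety/depression", "Coping skills/stress management"],
    }

    # Hypothetical qualifying results for one program (effect sizes invented for illustration).
    program_results = {
        "Academic performance": +0.23,
        "Academic engagement":  +0.18,
        "Disruptive behavior":  +0.08,
        "Social skills":        +0.02,
    }

    for category, measures in CATEGORIES.items():
        found = [program_results[m] for m in measures if m in program_results]
        if found:
            print(f"{category}: mean ES = {mean(found):+.2f} across {len(found)} measure(s)")
        else:
            print(f"{category}: no qualifying evidence")

In this invented example, the same program would look quite strong on Academic Competence but show little or nothing on Social Relationships, which is exactly why the new section reports each category separately.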

Social-Emotional Learning

The SEL review, led by Sooyeon Byun, Amanda Inns, Cynthia Lake, and Liz Kim at Johns Hopkins University, located 24 SEL programs that both met our inclusion standards and had at least one study that met strong, moderate, or promising standards on at least one of the four categories of outcomes.

There is much more evidence at the elementary and middle school levels than at the high school level. Recognizing that some programs had qualifying outcomes at multiple grade levels, 7 programs had positive evidence for pre-K/K, 10 for grades 1-2, 13 for grades 3-6, and 9 for middle school. In contrast, there were only 4 programs with positive effects in senior high schools. Fourteen studies took place in urban locations, 5 in suburbs, and 5 in rural districts.

The outcome variables most often showing positive impacts include social skills (12), school climate (10), academic performance (10), pro-social behavior (8), aggression/misconduct (7), disruptive behavior (7), academic engagement (7), interpersonal relationships (7), anxiety/depression (6), bullying (6), and empathy (5). Fifteen of the programs targeted whole classes or schools, and 9 targeted individual students.

Several programs stood out in terms of the size of the impacts. Take the Lead found effect sizes of +0.88 for social relationships and +0.51 for problem behaviors. Check, Connect, and Expect found effect sizes of +0.51 for emotional well-being, +0.29 for problem behaviors, and +0.28 for academic competence. I Can Problem Solve found an effect size of +0.57 on school climate. The Incredible Years Classroom and Parent Training Approach reported effect sizes of +0.57 for emotional regulation, +0.35 for pro-social behavior, and +0.21 for aggression/misconduct. The related Dinosaur School classroom management model reported an effect size of +0.31 for aggression/misbehavior. Class-Wide Function-Related Intervention Teams (CW-FIT), an intervention for elementary students with emotional and behavioral disorders, had effect sizes of +0.47 and +0.30 across two studies for academic engagement and +0.38 and +0.21 for disruptive behavior. It also reported effect sizes of +0.37 for interpersonal relationships, +0.28 for social skills, and +0.26 for empathy. Student Success Skills reported effect sizes of +0.30 for problem behaviors, +0.23 for academic competence, and +0.16 for social relationships.

In addition to the 24 highlighted programs, Evidence for ESSA lists 145 programs that were no longer available, had no qualifying studies (e.g., no control group), or had one or more qualifying studies but none that met the ESSA Strong, Moderate, or Promising criteria. These programs can be found by clicking on the “search” bar.

There are many problems inherent to interpreting research on social-emotional skills. One is that some programs may appear more effective than others because they use measures such as self-report, or behavior ratings by the teachers who taught the program. In contrast, studies that used more objective measures, such as independent observations or routinely collected data, may obtain smaller impacts. Also, SEL studies typically measure many outcomes and only a few may have positive impacts.

In the coming months, we will be doing analyses and looking for patterns in the data, and will have more to say about overall generalizations. For now, the new SEL section provides a guide to what we know now about individual programs, but there is much more to learn about this important topic.

Attendance

Our attendance review was led by Chenchen Shi, Cynthia Lake, and Amanda Inns. It located ten attendance programs that met our standards. Only three of these reported on chronic absenteeism, which refers to students missing more than 10% of days. Many more focused on average daily attendance (ADA). Among programs focused on average daily attendance, a Milwaukee elementary school program called SPARK had the largest impact (ES=+0.25). This is not an attendance program per se, but it uses AmeriCorps members to provide tutoring services across the school, as well as involving families. SPARK has been shown to have strong effects on reading, as well as its impressive effects on attendance. Positive Action is another schoolwide approach, in this case focused on SEL. It has been found in two major studies in grades K-8 to improve student reading and math achievement, as well as overall attendance, with a mean effect size of +0.20.

The one program to report data on both ADA and chronic absenteeism is called Attendance and Truancy Intervention and Universal Procedures, or ATI-UP. It reported an effect size in grades K-6 of +0.19 for ADA and +0.08 for chronic absenteeism. Talent Development High School (TDHS) is a ninth-grade intervention program that provides interdisciplinary learning communities and “double dose” English and math classes for students who need them. TDHS reported an effect size of +0.17.

An interesting approach with a modest effect size but a very low cost is now called EveryDay Labs (formerly InClass Today). This program helps schools organize and implement a system to send postcards to parents reminding them of the importance of student attendance. If students start missing school, the postcards include this information as well. The effect size across two studies was a respectable +0.16.
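
Because the mechanism here is essentially a simple information-delivery routine, a toy sketch may help show the general idea. This is purely illustrative: the data, wording, and logic are invented and do not describe EveryDay Labs' actual system.

    from dataclasses import dataclass

    @dataclass
    class Student:
        name: str
        days_absent: int

    def postcard_text(student: Student) -> str:
        """Compose a brief attendance reminder; mention absences only once they accrue."""
        lines = [
            f"Dear family of {student.name},",
            "Regular attendance is one of the strongest predictors of success in school.",
        ]
        if student.days_absent > 0:
            lines.append(f"So far this year, {student.name} has missed {student.days_absent} day(s).")
        lines.append("We look forward to seeing your child in class every day.")
        return "\n".join(lines)

    print(postcard_text(Student(name="Alex", days_absent=3)))
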

As with SEL, we will be doing further work to draw broader lessons from research on attendance in the coming months. One pattern that seems clear already is that effective attendance improvement models work on building close relationships between at-risk students and concerned adults. None of the effective programs primarily uses punishment to improve attendance, but instead they focus on providing information to parents and students and on making it clear to students that they are welcome in school and missed when they are gone.

Both SEL and attendance are topics of much discussion right now, and we hope these new sections will be useful and timely in helping schools make informed choices about how to improve social-emotional and attendance outcomes for all students.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

A Powerful Hunger for Evidence-Proven Technology

I recently saw a 1954 video of B. F. Skinner showing off a classroom full of eager students using teaching machines. In it, Skinner gave all the usual reasons that teaching machines were soon going to be far superior to ordinary teaching: They were scientifically made to enable students to experience constant success in small steps. They were adapted to students’ needs, so fast students did not need to wait for their slower classmates, and the slower classmates could have the time to solidify their understanding, rather than being whisked from one half-learned topic to the next, never getting a chance to master anything and therefore sinking into greater and greater failure.

Here it is 65 years later and “teaching machines,” now called computer-assisted instruction, are ubiquitous. But are they effective? Computers are certainly effective at teaching students to use technology, but can they teach the core curriculum of elementary or secondary schools? In a series of reviews in the Best Evidence Encyclopedia (BEE; www.bestevidence.org), my colleagues and I have reviewed research on the impacts of technology-infused methods on reading, mathematics, and science in elementary and secondary schools. Here is a quick summary of our findings:

Mean Effect Sizes for Technology-Based Programs in Recent Reviews

Review                          Topic                     No. of Studies   Mean Effect Size
Inns et al. (in preparation)    Elementary Reading              23              +0.09
Inns et al. (2019)              Struggling Readers               6              +0.06
Baye et al. (2019)              Secondary Reading               23              -0.01
Pellegrini et al. (2019)        Elementary Mathematics          14              +0.06

If you prefer “months of learning,” these are all about one month, except for secondary reading, which is zero. A study-weighted average across these reviews is an effect size of +0.05. That’s not nothing, but it’s not much. Nothing at all like what Skinner and countless other theorists and advocates have been promising for the past 65 years. I think that even the most enthusiastic fans of technology use in education are beginning to recognize that while technology may be useful in improving achievement on traditional learning outcomes, it has not yet had a revolutionary impact on learning of reading or mathematics.
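
For readers who want to check the arithmetic, here is a short sketch that recomputes the study-weighted average from the counts and mean effect sizes in the table above.

    # Recomputing the study-weighted average effect size from the table above.
    reviews = {
        "Elementary Reading":     (23, +0.09),
        "Struggling Readers":     ( 6, +0.06),
        "Secondary Reading":      (23, -0.01),
        "Elementary Mathematics": (14, +0.06),
    }

    total_studies = sum(n for n, _ in reviews.values())
    weighted_mean = sum(n * es for n, es in reviews.values()) / total_studies
    print(f"{total_studies} studies, study-weighted mean effect size = {weighted_mean:+.3f}")
    # Prints: 66 studies, study-weighted mean effect size = +0.046 (about +0.05)
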

How can we boost the impact of technology in education?

Whatever you think the effects of technology-based education might be for typical school outcomes, no one could deny that it would be a good thing if that impact were larger than it is today. How could government, the educational technology industry, researchers in and out of ed tech, and practicing educators work together to make technology applications more effective than they are now?

In order to understand how to proceed, it is important to acknowledge a serious problem in the world of ed tech today. Educational technology is usually developed by commercial companies. Like all commercial companies, they must serve their market. Unfortunately, the market for ed tech products is not terribly interested in the evidence supporting technology-based programs. Purchasers tend to pay attention to sales reps and marketing, or seek opinions from friends and colleagues, rather than looking at evidence. Technology decision makers often value attractiveness, ease of use, low cost, and current trends or fads over evidence (see Morrison, Ross, & Cheung, 2019, for documentation of these choice strategies).

Technology providers are not uncaring people, and they want their products to truly improve outcomes for children. However, they know that if they put a lot of money into developing and researching an innovative approach to education that happens to use technology, and their method requires a lot of professional development to produce substantially positive effects, their programs might be considered too expensive, and less expensive products that ask less of teachers and other educators would dominate the sector. These problems resemble those faced by textbook publishers, who similarly may have great ideas to increase the effectiveness of their textbooks or to add components that require professional development. Textbook designers are prisoners of their markets just as technology developers are.

The solution, I would propose, requires interventions by government designed to nudge education markets toward use of evidence. Government (federal, state, and local) has a real interest in improving outcomes of education. So how could government facilitate the use of technology-based approaches that are known to enhance student achievement more than those that exist today?


How government could promote use of proven technology approaches

Government could lead the revolution in educational technology that market-driven technology developers cannot achieve on their own. It could do this by emphasizing two main strategies: encouraging and incentivizing schools, districts, and states to use programs proven effective in rigorous research, and funding the development, evaluation, and dissemination of proven technology-based programs created by developers of all kinds (for-profit companies, non-profits, and universities).

Encouraging and incentivizing use of proven technology-based programs

The most important thing government must do to expand the use of proven technology-based approaches (as well as non-technology approaches) is to build a powerful hunger for them among educators, parents, and the public at large. Yes, I realize that this sounds backward; shouldn’t government sponsor development, research, and dissemination of proven programs first? Yes it should, and I’ll address this topic in a moment. Of course we need proven programs. No one will clamor for an empty box. But today, many proven programs already exist, and the bigger problem is getting them (and many others to come) enthusiastically adopted by schools. In fact, we must eventually get to the point where educational leaders value not only individual programs supported by research, but value research itself. That is, when they start looking for technology-based programs, their first step would be to find out what programs are proven to work, rather than selecting programs in the usual way and only then trying to find evidence to support the choice they have already made.

Government at any level could support such a process, but the most likely leader in this would be the federal government. It could provide incentives to schools that select and implement proven programs, and build on this with multifaceted outreach efforts to generate enthusiasm for proven approaches, and for the very idea that approaches should be proven.

A good example of what I have in mind was the Comprehensive School Reform (CSR) grants of the late 1990s. Schools that adopted whole-school reform models that met certain requirements could receive grants of up to $50,000 per year for three years. By the end of CSR, about 1,000 schools had received grants in a competitive process, but CSR programs were used in an estimated 6,000 schools nationwide. In other words, the visibility generated by the CSR grants process led many schools that never got a grant to find other resources to adopt these whole-school programs. I should note that the core idea of CSR was whole-school reform, not evidence; only a few of the adopted programs had good evidence of effectiveness. But the CSR process, with its highly visible grants and active support from government, built a powerful hunger for whole-school reform, and I think the same approach could build an equally powerful hunger for proven technology-based programs and other proven approaches.

“Wait a minute,” I can hear you saying. “Didn’t the ESSA evidence standards already do this?”

This was indeed the intention of ESSA, which established “strong,” “moderate,” and “promising” levels of evidence (as well as lower categories). ESSA has been a great first step in building interest in evidence. However, the only schools that could obtain additional funding for selecting proven programs were among the lowest-achieving schools in the country, so ordinary Title I schools, not to mention non-Title I schools, were not much affected. CSR gave extra points to high-poverty schools, but a much wider variety of schools could get into that game. There is a big difference between creating interest in evidence, which ESSA has definitely done, and creating a powerful hunger for proven programs. ESSA was passed four years ago, and it is only now beginning to build knowledge and enthusiasm among schools.

Building many more proven technology-based programs

Clearly, we need many more proven technology-based programs. In our Evidence for ESSA website (www.evidenceforessa.org), we list 113 reading and mathematics programs that meet any of the three top ESSA standards. Only 28 of these (18 reading, 10 math) have a major technology component. This is a good start, but not nearly enough. To get more, government needs to continue its productive Institute of Education Sciences (IES) and Education Innovation and Research (EIR) initiatives. For for-profit companies, Small Business Innovation Research (SBIR) plays an important role in early development of technology solutions. However, the pace of development and research focused on practical programs for schools needs to accelerate, and these initiatives need to learn from their own successes and failures to increase the success rate of their investments.

Communicating “what works”

There remains an important need to provide school leaders with easy-to-interpret information on the evidence base for all existing programs schools might select. The What Works Clearinghouse and our Evidence for ESSA website do this most comprehensively, but these and other resources need help to keep up with the rapid expansion of evidence that has appeared in the past 10 years.

Technology-based education can still produce the outcomes Skinner promised in his 1954 video, the ones we have all been eagerly awaiting ever since. However, technology developers and researchers need more help from government to build an eager market not just for technology, but for proven achievement outcomes produced by technology.

References

Baye, A., Lake, C., Inns, A., & Slavin, R. (2019). Effective reading programs for secondary students. Reading Research Quarterly, 54 (2), 133-166.

Inns, A., Lake, C., Pellegrini, M., & Slavin, R. (2019). A synthesis of quantitative research on programs for struggling readers in elementary schools. Available at www.bestevidence.org. Manuscript submitted for publication.

Inns, A., Lake, C., Pellegrini, M., & Slavin, R. (in preparation). A synthesis of quantitative research on elementary reading. Baltimore, MD: Center for Research and Reform in Education, Johns Hopkins University.

Morrison, J. R., Ross, S.M., & Cheung, A.C.K. (2019). From the market to the classroom: How ed-tech products are procured by school districts interacting with vendors. Educational Technology Research and Development, 67 (2), 389-421.

Pellegrini, M., Inns, A., Lake, C., & Slavin, R. (2019). Effective programs in elementary mathematics: A best-evidence synthesis. Available at www.bestevidence.org. Manuscript submitted for publication.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.