What Works in Elementary Math?

Euclid, the ancient Greek mathematician, is considered the inventor of geometry. His king heard about it, and wanted to learn geometry, but being a king, he was kind of busy. He called in Euclid, and asked him if there was a faster way. “I’m sorry sire,” said Euclid, “but there is no royal road to geometry.”

Skipping forward a couple thousand years, Marta Pellegrini, of the University of Florence in Italy, spent nine months with our group at Johns Hopkins University and led a review of research on effective programs for elementary mathematics (Pellegrini, Lake, Inns & Slavin, 2018), which was recently released on our Best Evidence Encyclopedia (BEE). What we found was not so different from Euclid’s conclusion, but broader: There’s no royal road to anything in mathematics. Improving mathematics achievement isn’t easy. But it is not impossible.

Our review focused on 78 very high-quality studies (65 used random assignment). The studies evaluated 61 programs, which we divided into eight categories: tutoring, technology, professional development for math content and pedagogy, instructional process programs, whole-school reform, social-emotional approaches, textbooks, and benchmark assessments.

Tutoring had the largest and most reliably positive impacts on math learning. Tutoring included one-to-one and one-to-small group services, and some tutors were certified teachers and some were paraprofessionals (teacher assistants). The successful tutoring models were all well-structured, and tutors received high-quality materials and professional development. Across 13 studies involving face-to-face tutoring, average outcomes were very positive. Surprisingly, tutors who were certified teachers (ES=+0.34) and paraprofessionals (ES=+0.32) obtained very similar student outcomes. Even more surprising, one-to-small group tutoring (ES=+0.32) was as effective as one-to-one (ES=+0.26).
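For readers less familiar with effect sizes, here is a minimal sketch of how one is commonly computed: the difference between the treatment and control group means, divided by a standard deviation. This assumes that "ES" here denotes a standardized mean difference with a pooled standard deviation; the review itself may use a somewhat different formula or adjustments. The function name and the scores are hypothetical, invented purely for illustration.

```python
from statistics import mean, stdev


def standardized_mean_difference(treatment, control):
    """Difference in group means divided by the pooled standard deviation.

    Assumption: this is one common definition of an effect size, not
    necessarily the exact formula used in the review.
    """
    n_t, n_c = len(treatment), len(control)
    pooled_var = (
        (n_t - 1) * stdev(treatment) ** 2 + (n_c - 1) * stdev(control) ** 2
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / pooled_var ** 0.5


# Hypothetical post-test scores (not data from any study).
tutored = [52, 55, 58, 61, 64]
untutored = [50, 53, 56, 59, 62]

es = standardized_mean_difference(tutored, untutored)
print(f"ES = {es:+.2f}")  # prints "ES = +0.42"
```

In this made-up example, a two-point difference in means against a standard deviation of about 4.7 points gives an effect size of roughly +0.42, somewhat larger than the tutoring averages reported above.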

Beyond tutoring, the category with the largest average impacts was instructional process programs: classroom organization and management approaches, such as cooperative learning and the Good Behavior Game. The mean effect size was +0.25.

After these two categories, there were only isolated studies with positive outcomes. 14 studies of technology approaches had an average effect size of only +0.07. 12 studies of professional development to improve teachers’ knowledge of math content and pedagogy found an average of only +0.04. One study of a social-emotional program called Positive Action found positive effects but seven other SEL studies did not, and the mean for this category was +0.03. One study of a whole-school reform model called the Center for Data-Driven Reform in Education (CDDRE), which helps schools do needs assessments, and then find, select, and implement proven programs, showed positive outcomes (ES=+0.24), but three other whole-school models found no positive effects. Among 16 studies of math curricula and software, only two, Math in Focus (ES=+0.25) and Math Expressions (ES=+0.11), found significant positive outcomes. On average, benchmark assessment approaches made no difference (ES=0.00).

Taken together, the findings of the 78 studies support a surprising conclusion. Few of the successful approaches had much to do with improving math pedagogy. Most were one-to-one or one-to-small group tutoring approaches that closely resemble tutoring models long used with great success in reading. A classroom management approach, PAX Good Behavior Game, and a social-emotional model, Positive Action, had no particular focus on math, yet both had positive effects on math (and reading). A whole-school reform approach, the Center for Data-Driven Reform in Education (CDDRE), helped schools do needs assessments and select proven programs appropriate to their needs, but CDDRE focused equally on reading and math, and had significantly positive outcomes in both subjects. In contrast, math curricula and professional development specifically designed for mathematics had only two positive examples among 28 programs.

The substantial difference in outcomes of tutoring and outcomes of technology applications is also interesting. The well-established positive impacts of one-to-one and one-to-small group tutoring, in reading as well as math, are often ascribed to the tutor’s ability to personalize instruction for each student. Computer-assisted instruction is also personalized, and has been expected, largely on this basis, to improve student achievement, especially in math (see Cheung & Slavin, 2013). Yet in math, and also reading, one-to-one and one-to-small group tutoring, by certified teachers and paraprofessionals, is far more effective than the average for technology approaches. The comparison of outcomes of personalized CAI and (personalized) tutoring makes it unlikely that personalization is a key explanation for the effectiveness of tutoring. Tutors must contribute something powerful beyond personalization.

I have argued previously that what tutors contribute, in addition to personalization, is a human connection, encouragement, and praise. A tutored child wants to please his or her tutor, not by completing a set of computerized exercises, but by seeing a tutor’s eyes light up and voice respond when the tutee makes progress.

If this is the secret of the effect of tutoring (beyond personalization), perhaps a similar explanation extends to other approaches that happen to improve mathematics performance without using especially innovative approaches to mathematics content or pedagogy. Approaches such as PAX Good Behavior Game and Positive Action, targeted on behavior and social-emotional skills, respectively, focus on children’s motivations, emotions, and behaviors. In the secondary grades, a program called Building Assets, Reducing Risk (BARR) (Corsello & Sharma, 2015) has an equal focus on social-emotional development, not math, but it also has significant positive effects on math (as well as reading). A study in Chile of a program called Conecta Ideas found substantial positive effects in fourth grade math by having students practice together in preparation for bimonthly math “tournaments” in competition with other schools. Both content and pedagogy were the same in experimental and control classes, but the excitement engendered by the tournaments led to substantial impacts (ES=+0.30 on national tests).

We need breakthroughs in mathematics teaching. Perhaps we have been looking in the wrong places, expecting that improved content and pedagogy will be the key to better learning. They will surely be involved, but perhaps it will turn out that math does not live only in students’ heads, but must also live in their hearts.

There may be no royal road to mathematics, but perhaps there is an emotional road. Wouldn’t it be astonishing if math, the most cerebral of subjects, turns out to depend as much on heart as on brain?

References

Cheung, A., & Slavin, R. E. (2013). The effectiveness of educational technology applications for enhancing mathematics achievement in K-12 classrooms: A meta-analysis. Educational Research Review, 9, 88-113.

Corsello, M., & Sharma, A. (2015). The Building Assets-Reducing Risks Program: Replication and expansion of an effective strategy to turn around low-achieving schools: i3 development grant final report. Biddeford, ME: Consello Consulting.

Inns, A., Lake, C., Pellegrini, M., & Slavin, R. (2018, March 3). Effective programs for struggling readers: A best-evidence synthesis. Paper presented at the annual meeting of the Society for Research on Educational Effectiveness, Washington, DC.

Pellegrini, M., Inns, A., & Slavin, R. (2018, March 3). Effective programs in elementary mathematics: A best-evidence synthesis. Paper presented at the annual meeting of the Society for Research on Educational Effectiveness, Washington, DC.

Photo credit: By Los Angeles Times Photographic Archive, no photographer stated. [CC BY 4.0  (https://creativecommons.org/licenses/by/4.0)], via Wikimedia Commons

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

New Findings on Tutoring: Four Shockers

One-to-one and one-to-small group tutoring have long existed as remedial approaches for students who are performing far below expectations. Everyone knows that tutoring works, and nothing in this blog contradicts this. Although different approaches have their champions, the general consensus is that tutoring is very effective, and the problem with widespread use is primarily cost (and for tutoring by teachers, availability of sufficient teachers). If resources were unlimited, one-to-one tutoring would be the first thing most educators would recommend, and they would not be wrong. But resources are never unlimited, and the numbers of students performing far below grade level are overwhelming, so cost-effectiveness is a serious concern. Further, tutoring seems so obviously effective that we may not really understand what makes it work.

In recent reviews, my colleagues and I examined what is known about tutoring. Beyond the simple conclusion that “tutoring works,” we found some big surprises, four “shockers.” Prepare to be amazed! Further, I propose an explanation to account for these unexpected findings.

We have recently released three reviews that include thorough, up-to-date reviews of research on tutoring. One is a review of research on programs for struggling readers in elementary schools by Amanda Inns and colleagues (2018). Another is a review on programs for secondary readers by Ariane Baye and her colleagues (2017). Finally, there is a review on elementary math programs by Marta Pellegrini et al. (2018). All three use essentially identical methods, from the Best Evidence Encyclopedia (www.bestevidence.org). In addition to sections on tutoring strategies, all three also include other, non-tutoring methods directed at the same populations and outcomes.

What we found challenges much of what everyone thought they knew about tutoring.

Shocker #1: In all three reviews, tutoring by paraprofessionals (teaching assistants) was at least as effective as tutoring by teachers. This was found for reading and math, and for one-to-one and one-to-small group tutoring.  For struggling elementary readers, para tutors actually had higher effect sizes than teacher tutors. Effect sizes were +0.53 for paras and +0.36 for teachers in one-to-one tutoring. For one-to-small group, effect sizes were +0.27 for paras, +0.09 for teachers.

Shocker #2: Volunteer tutoring was far less effective than tutoring by either paras or teachers. Some programs using volunteer tutors provided them with structured materials and extensive training and supervision. These found positive impacts, but far less than those for paraprofessional tutors. Volunteers tutoring one-to-one had an effect size of +0.18, while paras had an effect size of +0.53. Because of the need for recruiting, training, supervision, and management, and also because the more effective tutoring models provide stipends or other pay, volunteers were not much less expensive than paraprofessionals as tutors.

Shocker #3:  Inexpensive substitutes for tutoring have not worked. Everyone knows that one-to-one tutoring works, so there has long been a quest for approaches that simulate what makes tutoring work. Yet so far, no one, as far as I know, has found a way to turn lead into tutoring gold. Although tutoring in math was about as effective as tutoring in reading, a program that used online math tutors communicating over the Internet from India and Sri Lanka to tutor students in England, for example, had no effect. Technology has long been touted as a means of simulating tutoring, yet even when computer-assisted instruction programs have been effective, their effect sizes have been far below those of the least expensive tutoring models, one-to-small group tutoring by paraprofessionals. In fact, in the Inns et al. (2018) review, no digital reading program was found to be effective with struggling readers in elementary schools.

Shocker #4: Certain whole-class and whole-school approaches work as well as or better than tutoring for struggling readers, on average. In the Inns et al. (2018) review, the average effect size for one-to-one tutoring approaches was +0.31, and for one-to-small group approaches it was +0.14. Yet whole-class approaches, such as Ladders to Literacy (ES = +0.48), PALS (ES = +0.65), and Cooperative Integrated Reading and Composition (ES = +0.19), averaged +0.33, similar to one-to-one tutoring by teachers (ES = +0.36). The mean effect size for comprehensive tiered school approaches, such as Success for All (ES = +0.41) and Enhanced Core Reading Instruction (ES = +0.22), was +0.43, higher than any category of tutoring (note that these models include tutoring as part of an integrated response to intervention approach). Whole-class and whole-school approaches work with many more students than do tutoring models, so these impacts are obtained at a much lower cost per pupil.

Why does tutoring work?

Most researchers and others would say that well-structured tutoring models work primarily because they allow tutors to fully individualize instruction to the needs of students. Yet if this were the only explanation, then other individualized approaches, such as computer-assisted instruction, would have outcomes similar to those of tutoring. Why is this not the case? And why do paraprofessionals produce at least equal outcomes to those produced by teachers as tutors? None of this squares with the idea that the impact of tutoring is entirely due to the tutor’s ability to recognize and respond to students’ unique needs. If that were so, other forms of individualization would be a lot more effective, and teachers would presumably be a lot more effective at diagnosing and responding to students’ problems than would less highly trained paraprofessionals. Further, whole-class and whole-school reading approaches, which are not completely individualized, would have much lower effect sizes than tutoring.

My theory to account for the positive effects of tutoring in light of the four “shockers” is this:

  • Tutoring does not work due to individualization alone. It works due to individualization plus nurturing and attention.

This theory begins with the fundamental and obvious assumption that children, perhaps especially low achievers, are highly motivated by nurturing and attention, perhaps far more than by academic success. They are eager to please adults who relate to them personally.  The tutoring setting, whether one-to-one or one-to-very small group, gives students the undivided attention of a valued adult who can give them personal nurturing and attention to a degree that a teacher with 20-30 students cannot. Struggling readers may be particularly eager to please a valued adult, because they crave recognition for success in a skill that has previously eluded them.

Nurturing and attention may explain the otherwise puzzling equality of outcomes obtained by teachers and paraprofessionals as tutors. Both types of tutors, using structured materials, may be equally able to individualize instruction, and there is no reason to believe that paras will be any less nurturing or attentive. The assumption that teachers would be more effective as tutors depends on the belief that tutoring is complicated and requires the extensive education a teacher receives. This may be true for very unusual learners, but for most struggling students, a paraprofessional may be as capable as a teacher in providing individualization, nurturing, and attention. This is not to suggest that paraprofessionals are as capable as teachers in every way. Teachers have to be good at many things: preparing and delivering lessons, managing and motivating classes, and much more. However, in their roles as tutors, teachers and paraprofessionals may be more similar.

Volunteers certainly can be nurturing and attentive, and can be readily trained in structured programs to individualize instruction. The problem, however, is that studies of volunteer programs report difficulties in getting volunteers to attend every day and to avoid dropping out when they get a paying job. This may be less of a problem when volunteers receive a stipend; paid volunteers are much more effective than unpaid ones.

The failure of tutoring substitutes, such as individualized technology, is easy to predict if the importance of nurturing and attention is taken into account. Technology may be fun, and may be individualized, but it usually separates students from the personal attention of caring adults.

Whole-Class and Whole-School Approaches.

Perhaps the biggest shocker of all is the finding that for struggling readers, certain non-technology approaches to instruction for whole classes and schools can be as effective as tutoring. Whole-class and whole-school approaches can serve many more students at much lower cost, of course. These classroom approaches mostly use cooperative learning or phonics-focused teaching, or both, and the whole-school models, especially Success for All, combine these approaches with tutoring for students who need it.

The success of certain whole-class programs, of certain tutoring approaches, and of whole-school approaches that combine proven teaching strategies with tutoring for students who need more, argues for response to intervention (RTI), the policy that has been promoted by the federal government since the 1990s. So what’s new? What’s new is that the approach I’m advocating is not just RTI. It’s RTI done right, where each component of the strategy has strong evidence of effectiveness.

The good news is that we have powerful and cost-effective tools at our disposal that we could be putting to use on a much more systematic scale. Yet we rarely do this, and as a result far too many students continue to struggle with reading, even ending up in special education due to problems schools could have prevented. That is the real shocker. It’s up to our whole profession to use what works, until reading failure becomes a distant memory. There are many problems in education that we don’t know how to solve, but reading failure in elementary school isn’t one of them.

Practical Implications.

Perhaps the most important practical implication of this discussion is a realization that benefits similar to or greater than those of one-to-one tutoring by teachers can be obtained in other ways that can be cost-effectively extended to many more students: Using paraprofessional tutors, using one-to-small group tutoring, or using whole-class and whole-school tiered strategies. It is no longer possible to say with a shrug, “of course tutoring works, but we can’t afford it.” The “four shockers” tell us we can do better, without breaking the bank.

References

Baye, A., Lake, C., Inns, A., & Slavin, R. (2017). Effective reading programs for secondary students. Manuscript submitted for publication. Also see Baye, A., Lake, C., Inns, A. & Slavin, R. E. (2017, August). Effective Reading Programs for Secondary Students. Baltimore, MD: Johns Hopkins University, Center for Research and Reform in Education.

Inns, A., Lake, C., Pellegrini, M., & Slavin, R. (2018). Effective programs for struggling readers: A best-evidence synthesis. Paper presented at the annual meeting of the Society for Research on Educational Effectiveness, Washington, DC.

Pellegrini, M., Inns, A., & Slavin, R. (2018). Effective programs in elementary mathematics: A best-evidence synthesis. Paper presented at the annual meeting of the Society for Research on Educational Effectiveness, Washington, DC.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Photo by Westsara (Own work) [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons

Love, Hope, and Evidence in Secondary Reading

I am pleased to announce that our article reviewing research on effective secondary reading programs has just been posted on the Best Evidence Encyclopedia, aka the BEE. Written with my colleagues Ariane Baye, Cynthia Lake, and Amanda Inns, our review found 64 studies of 49 reading programs for students in grades 6 to 12, which had to meet very high standards of quality. For example, 55 of the studies used random assignment to conditions.

But before I get all nerdy about the technical standards of the review, I want to reflect on what we learned. I’ve already written about one thing we learned, that simply providing more instructional time made little difference in outcomes. In 22 of the studies, students got an extra period for reading beyond what control students got for at least an entire year, yet programs (other than tutoring) that provided extra time did no better than those that did not.

If time doesn’t help struggling readers, what does? I think I can summarize our findings with three words: love, hope, and evidence.

Love and hope are exactly what students who are reading below grade level are lacking. They are no longer naive. They know exactly what it means to be a poor reader in a high-poverty secondary school (almost all of the schools in our review served disadvantaged adolescents). If you can’t read well, college is out of the question. Decent jobs without a degree are scarce. If you have no hope, you cannot be motivated, or you may be motivated in antisocial directions that give you at least a chance for money and recognition. Every child needs love, but poor readers in secondary schools are too often looking for love in all the wrong places.

The successful programs in our review were ones that give adolescents a chance to earn the hope and love they crave. One category, all studies done in England, involved one-to-one and small group tutoring. How better to build close relationships between students and caring adults than to have individual or very small group time with them? And the one-to-one or small group setting allows tutors to personalize instruction, giving students a sense of hope that this time, their efforts will pay off (as the evidence says it will).

But the largest impacts in our review came from two related programs – The Reading Edge and Talent Development High School (TDHS). These were both developed in our research center at Johns Hopkins University in the 1990s, so I have to be very modest here. But beyond these individual programs, I think there is a larger message.

Both The Reading Edge (for middle schools) and TDHS (for high schools) organize students into mixed-ability cooperative teams. The team members work on activities designed to build reading comprehension and related skills. Students are frequently assessed, and on the basis of those assessments they can earn recognition for their teams. Teachers introduce lessons, and then, as students work with each other on reading activities, teachers can cruise around the class looking in on students who need encouragement or help, solving problems, and building relationships. Students are on task, eager to learn, and seeing the progress they are making, but students and teachers are laughing together, sharing easy banter, and encouraging each other. Yes, this really happens. I’ve seen it hundreds of times in secondary schools throughout the U.S. and England.

Many of the most successful programs in our review also are based on principles of love and hope. BARR, a high school program, is an excellent example. It uses block scheduling to build positive relationships among a group of students and teachers, adding regular meetings between teachers and students to review their progress in all areas, social as well as academic. The program focuses on building positive social-emotional skills and behaviors, and helping students describe their desired futures, make plans to get there, and regularly review progress on their plans with their teachers and peers. Love and hope.

California’s Expository Reading and Writing Course helps 12th graders hoping to attend California State Universities prepare to pass the test used to determine whether students have to take remedial English (a key factor in college dropout). The students work in groups, helping each other to build reading, writing, and discussion skills, and helping students to visualize a future for themselves. Love and hope.

A few technology programs showed promising outcomes, especially Achieve3000 and Read 180. These do not replace teachers and peers with technology, but instead cycle students through small group, teacher-led, and computer-assisted activities. Pure technology programs did not work so well, but models taking advantage of relationships as well as personalization did best. Love and hope.

Of course, love and hope are not sufficient. We also need evidence that students are learning more than they might have been. To produce positive achievement effects requires outstanding teaching strategies, professional development, curricular approaches, assessments, and more. Love and hope may be necessary but they are not sufficient.

Our review applied the toughest evidence standards we have ever applied. Most of the studies we reviewed did not show positive impacts on reading achievement. But those that did show positive impacts inspire that much more confidence. The very fact that we could apply these standards and still find plenty of studies that meet them shows how much our field is maturing. This in itself fills me with hope.

And love.

Apology

In a recent blog, I wrote about work we are doing to measure the impact on reading and math performance of a citywide campaign to provide assessments and eyeglasses to every child in Baltimore, from pre-k to grade 8. I forgot to mention the name of the project, Vision for Baltimore, and neglected to say that the project operates under the authority of the Baltimore City Health Department, which has been a strong supporter. I apologize for the omission.

Time Passes. Will You?

When I was in high school, one of my teachers posted a sign on her classroom wall under the clock:

Time passes. Will you?

Students spend a lot of time watching clocks, yearning for the period to be over. Yet educators and researchers often seem to believe that more time is of course beneficial to kids’ learning. Isn’t that obvious?

In a major review of secondary reading programs I am completing with my colleagues Ariane Baye, Cynthia Lake, and Amanda Inns, it turns out that the kids were right. More time, at least in remedial reading, may not be beneficial at all.

Our review identified 60 studies of extraordinary quality (mostly large-scale randomized experiments) evaluating reading programs for students in grades 6 to 12. In most of the studies, students reading 2 to 5 grade levels below expectations were randomly assigned to receive an extra class period of reading instruction every day all year, in some cases for two or three years. Students randomly assigned to the control group continued in classes such as art, music, or study hall. The strategies used in the remedial classes varied widely, including technology approaches, teaching focused on metacognitive skills (e.g., summarization, clarification, graphic organizers), teaching focused on phonics skills that should have been learned in elementary school, and other remedial approaches, all of which provided substantial additional time for reading instruction. It is also important to note that the extra-time classes were generally smaller than ordinary classes, in the range of 12 to 20 students.

In contrast, other studies provided whole class or whole school methods, many of which also focused on metacognitive skills, but none of which provided additional time.

Analyzing across all studies, setting aside five British tutoring studies, there was no effect of additional time in remedial reading. The effect size for the 22 extra-time studies was +0.08, while for 34 whole class/whole school studies, it was slightly higher (ES = +0.10). That’s an awful lot of additional teaching time for no additional learning benefit.
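To get a feel for how small these effect sizes are, one rough and commonly used reading converts an effect size into a percentile shift. The sketch below assumes test scores are normally distributed and that the effect size is a standardized mean difference; neither assumption is stated in the review, and the function name is my own, for illustration only.

```python
from statistics import NormalDist


def percentile_after(effect_size):
    """Percentile reached by a student who would otherwise score at the
    control-group median (50th percentile), assuming normal scores."""
    return 100 * NormalDist().cdf(effect_size)


# Effect sizes quoted in the paragraph above.
for label, es in [
    ("extra-time classes", 0.08),
    ("whole class/whole school", 0.10),
]:
    print(f"{label}: ES = {es:+.2f} -> percentile {percentile_after(es):.1f}")
```

Under these assumptions, both effect sizes move a typical student only from the 50th percentile to somewhere between the 53rd and 54th, which is why the daily extra period looks like so little return for so much time.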

So what did work? Not surprisingly, one-to-one and small-group tutoring (up to one to four) were very effective. These are remedial and do usually provide additional teaching time, but in a much more intensive and personalized way.

Other approaches that showed particular promise simply made better use of existing class time. A program called The Reading Edge involves students in small mixed-ability teams where they are responsible for the reading success of all team members. A technology approach called Achieve3000 showed substantial gains for low-achieving students. A whole-school model called BARR focuses on social-emotional learning, building relationships between teachers and students, and carefully monitoring students’ progress in reading and math. Another model called ERWC prepares 12th graders to succeed on the tests used to determine whether students have to take remedial English at California State Universities.

What characterized these successful approaches? None were presented as remedial. All were exciting and personalized, and not at all like traditional instruction. All gave students social supports from peers and teachers, and reasons to hope that this time, they were going to be successful.

There is no magic to these approaches, and not every study of them found positive outcomes. But there was clearly no advantage of remedial approaches providing extra time.

In fact, according to the data, students would have done just as well to stay in art or music. And if you’d asked the kids, they’d probably agree.

Time is important, but motivation, caring, and personalization are what count most in secondary reading, and surely in other subjects as well.

Time passes. Kids will pass, too, if we make such good use of our time with them that they won’t even notice the minutes going by.

Joy is a Basic Skill in Secondary Reading

I have a policy of not talking about studies I’m engaged in before they are done and available, but I have an observation to make that just won’t wait.

I’m working on a review of research on secondary reading programs with colleagues Ariane Baye (University of Liege in Belgium) and Cynthia Lake (Johns Hopkins University). We have found a large number of very high-quality studies evaluating a broad range of programs. Most are large, randomized experiments.

Mostly, our review is really depressing. The great majority of studies have found no effects on learning. In particular, programs that focus on teaching middle and high school students struggling in reading in classes of 12 to 20, emphasizing meta-cognitive strategies, phonics, fluency, and/or training for teachers in what they were already doing, show few impacts on learning. Most of the studies provided daily, extra reading classes to help struggling readers build their skills, while the control group got band or art. They should have stayed in band or art.

Yet all is not dismal. Two approaches did have markedly positive effects. One was tutoring students in groups of one to four, not every day but perhaps twice a week. The other was cooperative learning, where students worked in four-member teams to help each other learn and practice reading skills. How could these approaches be so much more effective than the others?

My answer begins with a consideration of the nature of struggling adolescent readers. They are bored out of their brains. They are likely to see school as demeaning, isolating, and unrewarding. All adolescents live for their friends. They crave mastery and respect. Remedial approaches have to be fantastic to overcome the negative aspects of having to be remediated in the first place.

Tutoring can make a big difference, because groups are small enough for students to make meaningful relationships with adults and with other kids, and instruction can be personalized to meet their unique needs, to give them a real shot at mastery.

Cooperative learning, however, had a larger average effect size than tutoring. Even though cooperative learning did not require smaller class sizes and extra daily instructional periods, it was much more effective than remedial instruction. Cooperative learning gives struggling adolescent readers opportunities to work with their peers, to teach each other, to tease each other, to laugh, to be active rather than passive. To them, it means joy. And joy is a basic skill.

Of course, joy is not enough. Kids must be learning joyfully, not just joyful. Yet in our national education system, so focused on testing and accountability, we have to keep remembering who we are teaching and what they need. More of the same, a little slower and a little louder, won’t do it. Adolescents need a reason to believe that things can be better, and that school need not cut them off from their peers. They need opportunities to teach and learn from each other. School must be joyful, or it is nothing at all, for so many adolescents.

The Wonderful Reputation of Educational Research

Back in 1993, Carl Kaestle memorably wrote about the “awful reputation of educational research.” At the time, he was right. But that was 23 years ago. In the interim, educational research has made extraordinary advances. It is now admired by researchers in many other fields and by policy makers in many areas of government. As indicated by the importance of evidence in the Every Student Succeeds Act (ESSA), evidence is starting to make more of a difference in policy and practice. There is still a long, long way to go, but the trend is hugely positive.

In a recent article for the Brookings Institution, Ruth Curran Neild, acting director of the Institute of Education Sciences (IES), argued that educational research is on the right track. The one thing it lacks, she says, is adequate funding. I totally agree. Of course there are improvements that could be made to education policies and practices, but the part of the education field working on using science to improve outcomes for children is very much going in the right direction. Many are frustrated that it is not getting there fast enough, but we need more wind in our sails, not a change of course.

I was listening recently to an NPR broadcast about a new center for research on immunological treatments for cancer. The interviewer asked how their center could possibly make much difference with a grant of only $250 million. The director sheepishly agreed this was a problem, but hoped they could nevertheless make a contribution. If only we in education had conversations like this – ever!

What has radically changed over the past 15 years is that there is now far more support than there once was for randomized evaluations of replicable programs and practices, and as a result we are collectively building a strong set of studies that use the kinds of designs common in medicine and agriculture but not, until recently, in education. My colleagues and I constantly update reviews of research on educational interventions in the main areas of practice at the Best Evidence Encyclopedia website. Where once randomized studies were rare, they are becoming the norm. We recently published a review of research on early childhood programs, in which we located 32 studies of 22 different programs. Twenty-nine of the studies used randomized designs, thanks primarily to funding and leadership from a federal investment called Preschool Curriculum Evaluation Research (PCER). We are working on a review of research on secondary reading programs. Due to the federal Striving Readers program, which invested in evaluations of a wide variety of school interventions, our review is now dominated by randomized studies. Studies of programs for struggling elementary readers are now overwhelmingly randomized. The Investing in Innovation (i3) program requires randomized evaluations in its validation and scale-up grants and encourages them in its development grants, and this is increasing the prevalence of randomized studies across all studies of programs for students from grades pre-K to 12. The National Science Foundation has begun to fund scale-up projects that require random assignment, as have a few private foundations.

Random assignment is the hallmark of rigorous science. From a methodological standpoint, random assignment is crucial because only when students, teachers, or schools are randomly assigned to treatment or control conditions can readers be sure that any differences observed at posttest are truly the result of the treatments, and not of self-selection or other bias. But more than this, use of random assignment establishes a field as serious about its science. Studies that use random assignment are called “gold standard,” because there is no better design in existence. Yes, there are better and worse randomized studies, better and worse measures, and so on. Mixed methods studies can usefully add insight to the numbers. Replication is very important in establishing effectiveness. And there are certainly circumstances in which randomization is impossible or impractical, and a well-done quasi-experiment will do. But all this being said, the use of randomization moves the science of education forward and gives educational leaders reliable information on which to base decisions.

The most telling criticism of randomized experiments is that they are expensive. Yes, they can be. Encouragement and funding from IES and the Laura and John Arnold Foundation are increasing the use of inexpensive experiments in situations in which treatments and (usually) measures are already being paid for by government or other sources, so only funding for the evaluation is needed. But these experiments are only possible in special circumstances. In others, someone has to come up with serious funding to support randomized designs.

This brings us back to Ruth Neild’s main point. We know what needs to be done in educational research. We need to develop a wide variety of promising innovations, subject them to rigorous, ultimately randomized experiments, and then disseminate those programs found to be effective. We have systems in place to do all of these things. We just need a lot more funding to do them faster and better.

I don’t know if the increases in the quality of research in education are understood by policy makers, or how much this quality matters for funding. But education now has a case to make that it deserves much greater funding. Educational research is no longer just of interest to the academics who do it. It is producing answers that matter for children, and that should justify funding in line with our field’s new, wonderful reputation.

To Pluto and Beyond

Like many others, I was thrilled to see the New Horizons spacecraft reach and photograph Pluto. Pluto was banished from the League of Planets shortly after New Horizons was launched, so I’ll bet it felt much better with all the attention.

Those who read this blog are probably expecting me to go into a rant at this point about how much we are willing to spend to send a spacecraft to take pictures and how little we are willing to spend on finding out how to help our nation’s children learn to read the newspaper or understand the math or the space science around this marvelous event. Well, consider it ranted. It does not make me feel any better that funding for NASA itself is being cut. We are a hugely wealthy country, and we can afford to go to Pluto and to educate our children to a much higher standard than we do. In fact, the way we became a hugely wealthy country, and the only way we can maintain our wealth into the future, is by investing in education, science, technology and invention.

My colleagues and I recently completed reviews of research on elementary and then secondary science education. You can find them here. The reviews find very similar outcomes at the different grade levels. Instructional methods emphasizing professional development for teachers on well-defined teaching strategies, such as cooperative learning and science-reading integration, have solid effects on science learning outcomes. Moving from one textbook to another almost never makes a difference, and use of science kits does not improve science learning. Technology-focused programs have a great deal of promise, but the studies are few and of limited quality, at least so far.

However, the most depressing finding is that there were far too few studies, across all science teaching approaches, that met even modest standards of rigor. Using our standards (which just require a control group, initial equality, fair measures, and a duration of 12 weeks), there were just 21 secondary studies in the past quarter-century. The number was the same for elementary studies. This is shameful. Science teaching is widely acknowledged to be a key to our nation’s future, yet our investment in high-quality studies and innovation is so low that we really know very little about how to do it better.

To explore the universe, to cure diseases, to engineer new solutions of all kinds, requires a population that is proficient in science, technology and mathematics. Is there anyone on the (still recognized) planet Earth who does not know this? Yet if we were serious about going boldly where no nation has gone before, would we continue to invest so little in understanding how to engage and excite our students in science, math, and technology?

Today, we rely on an extraordinary but tiny elite for the scientific progress we do make. We need to extend far beyond this, as more and more occupations come to require deep understanding of science and math. We need to enable teachers in elementary and secondary schools to democratize science knowledge and skill. There is no question that we can design better teaching methods and technologies, evaluate them, and scale them up. I wonder when we will get serious about doing so.

Congratulations to NASA, the Johns Hopkins Applied Physics Lab, and the American taxpayer for the New Horizons trip to Pluto. But consider this. The next generation of scientists and engineers who will perform the marvels of the future are in elementary and secondary classes right now. Improving science learning for these precious future scientists and engineers is essential for our nation’s future.