What Works in Teaching Writing?

“I’ve learned that people will forget what you said, people will forget what you did, but people will never forget how you made them feel. The idea is to write it so that people hear it and it slides through the brain and goes straight to the heart.”   -Maya Angelou

It’s not hard to make an argument that creative writing is the noblest of all school subjects. To test this, try replacing the word “write” in this beautiful quotation from Maya Angelou with “read” or “compute.” Students must be proficient in reading and mathematics and other subjects, of course, but in what other subject must learners study how to reach the emotions of their readers?

Good writing is the mark of an educated person. Perhaps especially in the age of electronic communications, we know most of the people we know largely through their writing. Job applications depend on the ability of the applicant to make themselves interesting to someone they’ve never seen. Every subject–science, history, reading, and many more–requires its own exacting types of writing.

Given the obvious importance of writing in people’s lives, you’d naturally expect that writing would occupy a central place in instruction. But you’d be wrong. Before secondary school, writing plays third fiddle to the other two of the 3Rs, reading and ‘rithmetic, and in secondary school, writing is just one among many components of English. College professors, employers, and ordinary people complain incessantly about the poor writing skills of today’s youth. The fact is that writing is not attended to as much as it should be, and the results are apparent to all.

Not surprisingly, the inadequate focus on writing in U.S. schools extends to an inadequate focus on research on this topic as well. My colleagues and I recently carried out a review of research on secondary reading programs. We found 69 studies that met rigorous inclusion criteria (Baye, Lake, Inns, & Slavin, in press). Recently, our group completed a review of secondary writing using similar inclusion standards, under funding from the Education Endowment Foundation in England (Slavin, Lake, Inns, Baye, Dachet, & Haslam, 2019). Yet we found only 14 qualifying studies, of which 11 were in secondary schools (we searched down to third grade).

To be fair, our inclusion standards were pretty tough. We required that studies compare experimental groups to randomized or matched control groups on measures independent of the experimental treatment. Tests could not have been made up by teachers or researchers, and they could not be scored by the teachers who taught the classes. Experimental and control groups had to be well-matched at pretest and have nearly equal attrition (loss of subjects over time). Studies had to have a duration of at least 12 weeks. Studies could include students with IEPs, but they could not be in self-contained, special education settings.

We divided the studies into three categories. One was studies of writing process models, in which students worked together to plan, draft, revise, and edit compositions in many genres. A very similar category was cooperative learning models, most of which also used a plan-draft-revise-edit cycle, but placed a strong emphasis on use of cooperative learning teams. A third category was programs that balanced writing with reading instruction.

Remarkably, the average effect sizes of each of the three categories were virtually identical, with a mean effect size of +0.18. There was significant variation within categories, however. In the writing process category, the interesting story concerned a widely used U.S. program, Self-Regulated Strategy Development (SRSD), evaluated in two qualifying studies in England. In one, the program was implemented in rural West Yorkshire and had huge impacts on struggling writers, the students for whom SRSD was designed. The effect size was +0.74. However, in a much larger study in urban Leeds and Lancashire, outcomes were not so positive (ES = +0.01), although effects were largest for struggling writers. There have been many studies of SRSD in the U.S., but none of them qualified, because they lacked control groups, were too brief, used measures made up by researchers, or took place in all-special education classrooms.

Three programs that emphasize cooperative learning had notably positive impacts. These were Writing Wings (ES = +0.13), Student Team Writing (ES = +0.38), and Expert 21 (ES = +0.58).

Among programs emphasizing reading and writing, two had a strong focus on English learners: Pathway (ES = +0.32) and ALIAS (ES = +0.18). Another two approaches had an explicit focus on preparing students for freshman English: College Ready Writers Program (ES = +0.18) and Expository Reading and Writing Course (ES = +0.13).

Looking across all categories, there were several factors common to successful programs that stood out:

  • Cooperative Learning. Cooperative learning usually aids learning in all subjects, but it makes particular sense in writing, as a writing team gives students opportunities to give and receive feedback on their compositions, facilitating their efforts to gain insight into how their peers think about writing, and giving them a sympathetic and ready audience for their writing.
  • Writing Process. Teaching students step-by-step procedures to work with others to plan, draft, revise, and edit compositions in various genres appears to be very beneficial. The first steps focus on helping students get their ideas down on paper without worrying about mechanics, while the later stages help students progressively improve the structure, organization, grammar, and punctuation of their compositions. These steps help students reluctant to write at all to take risks at the outset, confident that they will have help from peers and teachers to progressively improve their writing.
  • Motivation and Joy in Self-Expression. In the above quote, Maya Angelou talks about the importance in writing of “sliding through the brain to get to the heart.” But to the writer, this process must work the other way, too. Good writing starts in the heart, with an urge to say something of importance. The brain shapes writing to make it readable, but writing must start with a message that the writer cares about. This principle is demonstrated most obviously in writing process and cooperative learning models, where every effort is made to motivate students to find exciting and interesting topics to share with their peers. In programs balancing reading and writing, reading is used to help students have something important to write.
  • Extensive Professional Development. Learning to teach writing well is not easy. Teachers need opportunities to learn new strategies and to apply them in their own writing. All of the successful writing programs we identified in our review provided extensive, motivating, and cooperative professional development, often designed as much to help teachers catch the spirit of writing as to follow a set of procedures.

Our review of writing research found that there is considerable consensus in how to teach writing. There were more commonalities than differences across the categories. Effects were generally positive, however, because control teachers were not using these consensus strategies, or were not doing so with the skills imparted by the professional development characteristic of all of the successful approaches.

We cannot expect writing instruction to routinely produce Maya Angelous or Mark Twains. Great writers add genius to technique. However, we can create legions of good writers, and our students will surely benefit.

References

Baye, A., Lake, C., Inns, A., & Slavin, R. (in press). Effective reading programs for secondary students. Reading Research Quarterly.

Slavin, R. E., Lake, C., Inns, A., Baye, A., Dachet, D., & Haslam, J. (2019). A quantitative synthesis of research on writing approaches in Key Stage 2 and secondary schools. London: Education Endowment Foundation.

Photo credit: Kyle Tsui from Washington, DC, USA [CC BY 2.0 (https://creativecommons.org/licenses/by/2.0)]

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

A Mathematical Mystery

My colleagues and I wrote a review of research on elementary mathematics (Pellegrini, Inns, Lake, & Slavin, 2018). I’ve written about it before, but I wanted to home in on one extraordinary set of findings.

In the review, there were 12 studies that evaluated programs focused on providing elementary teachers with professional development in mathematics content and mathematics-specific pedagogy. I was sure that this category would find positive effects on student achievement, but it did not. The most remarkable (and depressing) finding involved the huge year-long Intel study in which 80 teachers received 90 hours of very high-quality in-service during the summer, followed by an additional 13 hours of group discussions of videos of the participants’ class lessons. Teachers using this program were compared to 85 control teachers. After all this, students in the Intel classes scored slightly worse than controls on standardized measures (Garet et al., 2016).

If the Intel study were the only disappointment, one might look for flaws in their approach or their evaluation design or other things specific to that study. But as I noted earlier, all 12 of the studies of this kind failed to find positive effects, and the mean effect size was only +0.04 (n.s.).

Lest anyone jump to the conclusion that nothing works in elementary mathematics, I would point out that this is not the case. The most impactful category was tutoring programs, so that’s a special case. But the second most impactful category had many features in common with professional development focused on mathematics content and pedagogy, but had an average effect size of +0.25. This category consisted of programs focused on classroom management and motivation: Cooperative learning, classroom management strategies using group contingencies, and programs focusing on social emotional learning.

So there are successful strategies in elementary mathematics, and they all provided a lot of professional development. Yet programs for mathematics content and pedagogy, all of which also provided a lot of professional development, did not show positive effects in high-quality evaluations.

I have some ideas about what may be going on here, but I advance them cautiously, as I am not certain about them.

The theory of action behind professional development focused on mathematics content and pedagogy assumes that elementary teachers have gaps in their understanding of mathematics content and mathematics-specific pedagogy. But perhaps whatever gaps they have are not so important. Here is one example. Leading mathematics educators today take a very strong view that fractions should never be taught using pizza slices, but only using number lines. The idea is that pizza slices are limited to certain fractional concepts, while number lines are more inclusive of all uses of fractions. I can understand and, in concept, support this distinction. But how much difference does it make? Students who are learning fractions can probably be divided into three pizza slices. One slice represents students who understand fractions very well, however they are presented, and another slice consists of students who have no earthly idea about fractions. The third slice consists of students who could have learned fractions if they were taught with number lines but not pizzas. The relative sizes of these slices vary, but I’d guess the third slice is the smallest. Whatever its size, the number of students whose success depends on pizza slices vs. number lines is unlikely to be large enough to shift the whole group mean very much, and that is what is reported in evaluations of mathematics approaches. For example, if the “already got it” slice is one third of all students, and the “probably won’t get it” slice is also one third, the slice consisting of students who might get the concept one way but not the other is also one third. If the effect size for this last slice were as high as an improbable +0.20, and zero for the other two, the average for all students would be less than +0.07, averaging across the whole pizza.
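
For readers who want to check the pizza arithmetic, here is a minimal sketch in Python. It assumes, as the example implies, that the effect is exactly zero in the two slices where the method makes no difference, and that the overall effect is simply the share-weighted average of the slice effects. The function name `pooled_effect_size` and the slice labels are only illustrative.

```python
def pooled_effect_size(slices):
    """Share-weighted average of per-slice effect sizes.

    `slices` is a list of (share_of_students, effect_size) pairs.
    """
    total_share = sum(share for share, _ in slices)
    return sum(share * es for share, es in slices) / total_share


# Hypothetical slices from the example: equal thirds, with a nonzero
# effect only for students whose success depends on the method.
slices = [
    (1 / 3, 0.00),  # "already got it": assumed zero effect
    (1 / 3, 0.00),  # "probably won't get it": assumed zero effect
    (1 / 3, 0.20),  # method matters: the "improbable" +0.20
]

overall = pooled_effect_size(slices)
print(f"Whole-pizza effect size: {overall:+.3f}")
```

Under these assumptions the whole-pizza average is +0.20 divided by 3, or about +0.067, which is where the "less than +0.07" figure comes from. Shrinking the third slice, as I guessed above, would only make the overall number smaller.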

A related possibility relates to teachers’ knowledge. Assume that one slice of teachers already knows a lot of the content before the training. Another slice is not going to learn or use it. The third slice, those who did not know the content before but will use it effectively after training, is the only slice likely to show a benefit, but this benefit will be swamped by the zero effects for the teachers who already knew the content and those who will not learn or use it.

If teachers are standing at the front of the class explaining mathematical concepts, such as proportions, a certain proportion of students are learning the content very well and a certain proportion are bored, terrified, or just not getting it. It’s hard to imagine that the successful students are gaining much from a change of content or pedagogy, and only a small proportion of the unsuccessful students will all of a sudden understand what they did not understand before, just because it is explained better. But imagine that instead of only changing content, the teacher adopts cooperative learning. Now the students are having a lot of fun working with peers. Struggling students have an opportunity to ask for explanations and help in a less threatening environment, and they get a chance to see and ultimately absorb how their more capable teammates approach and solve difficult problems. The already high-achieving students may become even higher achieving, because as every teacher knows, explanation helps the explainer as much as the student receiving the explanation.

The point I am making is that the findings of our mathematics review may reinforce a general lesson we take away from all of our reviews: Subtle treatments produce subtle (i.e., small) impacts. Students quickly establish themselves as high or average or low achievers, after which time it is difficult to fundamentally change their motivations and approaches to learning. Making modest changes in content or pedagogy may not be enough to make much difference for most students. Instead, dramatically changing motivation, providing peer assistance, and making mathematics more fun and rewarding, seems more likely to make a significant change in learning than making subtle changes in content or pedagogy. That is certainly what we have found in systematic reviews of elementary mathematics and elementary and secondary reading.

Whatever the student outcomes are compared to controls, there may be good reason to improve mathematics content and pedagogy. But if we are trying to improve achievement for all students, the whole pizza, we need to use methods that make a more profound impact on all students. And that is true any way you slice it.

References

Garet, M. S., Heppen, J. B., Walters, K., Parkinson, J., Smith, T. M., Song, M., & Borman, G. D. (2016). Focusing on mathematical knowledge: The impact of content-intensive teacher professional development (NCEE 2016-4010). Washington, DC: U.S. Department of Education.

Pellegrini, M., Inns, A., Lake, C., & Slavin, R. E. (2018). Effective programs in elementary mathematics: A best-evidence synthesis. Paper presented at the Society for Research on Educational Effectiveness, Washington, DC.


Succeeding Faster in Education

“If you want to increase your success rate, double your failure rate.” So said Thomas Watson, the founder of IBM. What he meant, of course, is that people and organizations thrive when they try many experiments, even though most experiments fail. Failing twice as often means trying twice as many experiments, leading to twice as many failures—but also, he was saying, many more successes.

In education research and innovation circles, many people know this quote, and use it to console colleagues who have done an experiment that did not produce significant positive outcomes. A lot of consolation is necessary, because most high-quality experiments in education do not produce significant positive outcomes. In studies funded by the Institute of Education Sciences (IES), Investing in Innovation (i3), and England’s Education Endowment Foundation (EEF), all of which require very high standards of evidence, fewer than 20% of experiments show significant positive outcomes.

The high rate of failure in educational experiments is often shocking to non-researchers, especially the government agencies, foundations, publishers, and software developers who commission the studies. I was at a conference recently in which a Peruvian researcher presented the devastating results of an experiment in which high-poverty, mostly rural schools in Peru were randomly assigned to receive computers for all of their students, or to continue with usual instruction. The Peruvian Ministry of Education was so confident that the computers would be effective that they had built a huge model of the specific computers used in the experiment and attached it to the Ministry headquarters. When the results showed no positive outcomes (except for the ability to operate computers), the Ministry quietly removed the computer statue from the top of their building.

Improving Success Rates

Much as I believe Watson’s admonition (“fail more”), there is another principle that he was implying, or so I expect: We have to learn from failure, so we can increase the rate of success. It is not realistic to expect government to continue to invest substantial funding in high-quality educational experiments if the success rate remains below 20%. We have to get smarter, so we can succeed more often. Fortunately, qualitative measures, such as observations, interviews, and questionnaires, are becoming required elements of funded research, making it easier for researchers to find out what happened and what went wrong. Was the experimental program faithfully implemented? Were there unexpected responses toward the program by teachers or students?

In the course of my work reviewing positive and disappointing outcomes of educational innovations, I’ve noticed some patterns that often predict that a given program is likely or unlikely to be effective in a well-designed evaluation. Some of these are as follows.

  1. Small changes lead to small (or zero) impacts. In every subject and grade level, researchers have evaluated new textbooks, in comparison to existing texts. These almost never show positive effects. The reason is that textbooks are just not that different from each other. Approaches that do show positive effects are usually markedly different from ordinary practices or texts.
  2. Successful programs almost always provide a lot of professional development. The programs that have significant positive effects on learning are ones that markedly improve pedagogy. Changing teachers’ daily instructional practices usually requires initial training followed by on-site coaching by well-trained and capable coaches. Lots of PD does not guarantee success, but minimal PD virtually guarantees failure. Sufficient professional development can be expensive, but education itself is expensive, and adding a modest amount to per-pupil cost for professional development and other requirements of effective implementation is often the best way to substantially enhance outcomes.
  3. Effective programs are usually well-specified, with clear procedures and materials. Programs rarely work if they are unclear about what teachers are expected to do, or if teachers are not helped to do it. In the Peruvian study of one-to-one computers, for example, students were given tablet computers at a per-pupil cost of $438. Teachers were expected to figure out how best to use them. In fact, a qualitative study found that the computers were considered so valuable that many teachers locked them up except for specific times when they were to be used. Teachers lacked specific instructional software, and they lacked the professional development needed to create such software themselves. No wonder “it” didn’t work. Other than the physical computers, there was no “it.”
  4. Technology is not magic. Technology can create opportunities for improvement, but there is little understanding of how to use technology to greatest effect. My colleagues and I have done reviews of research on effects of modern technology on learning. We found near-zero effects of a variety of elementary and secondary reading software (Inns et al., 2018; Baye et al., in press), with a mean effect size of +0.05 in elementary reading and +0.00 in secondary. In math, effects were slightly more positive (ES=+0.09), but still quite small, on average (Pellegrini et al., 2018). Some technology approaches had more promise than others, but it is time that we learned from disappointing as well as promising applications. The widespread belief that technology is the future must eventually be right, but at present we have little reason to believe that technology is transformative, and we don’t know which form of technology is most likely to be transformative.
  5. Tutoring is the most solid approach we have. Reviews of elementary reading for struggling readers (Inns et al., 2018) and secondary struggling readers (Baye et al., in press), as well as elementary math (Pellegrini et al., 2018), find outcomes for various forms of tutoring that are far beyond effects seen for any other type of treatment. Everyone knows this, but thinking about tutoring falls into two camps. One, typified by advocates of Reading Recovery, takes the view that tutoring is so effective for struggling first graders that it should be used no matter what the cost. The other, also perhaps thinking about Reading Recovery, rejects this approach because of its cost. Yet recent research on tutoring methods is finding strategies that are cost-effective and feasible. First, studies in both reading (Inns et al., 2018) and math (Pellegrini et al., 2018) find no difference in outcomes between certified teachers and paraprofessionals using structured one-to-one or one-to-small group tutoring models. Second, although one-to-one tutoring is more effective than one-to-small group, one-to-small group is far more cost-effective, as one trained tutor can work with 4 to 6 students at a time. Also, recent studies have found that tutoring can be just as effective in the upper elementary and middle grades as in first grade, so this strategy may have broader applicability than it has had in the past. The real challenge for research on tutoring is to develop and evaluate models that increase cost-effectiveness of this clearly effective family of approaches.

The extraordinary advances in the quality and quantity of research in education, led by investments from IES, i3, and the EEF, have raised expectations for research-based reform. However, the modest percentage of recent studies meeting current rigorous standards of evidence has caused disappointment in some quarters. Instead, all findings, whether immediately successful or not, should be seen as crucial information. Some studies identify programs ready for prime time right now, but the whole body of work can and must inform us about areas worthy of expanded investment, as well as areas in need of serious rethinking and redevelopment. The evidence movement, in the form it exists today, is completing its first decade. It’s still early days. There is much more we can learn and do to develop, evaluate, and disseminate effective strategies, especially for students in great need of proven approaches.

References

Baye, A., Lake, C., Inns, A., & Slavin, R. (in press). Effective reading programs for secondary students. Reading Research Quarterly.

Inns, A., Lake, C., Pellegrini, M., & Slavin, R. (2018). Effective programs for struggling readers: A best-evidence synthesis. Paper presented at the annual meeting of the Society for Research on Educational Effectiveness, Washington, DC.

Pellegrini, M., Inns, A., & Slavin, R. (2018). Effective programs in elementary mathematics: A best-evidence synthesis. Paper presented at the annual meeting of the Society for Research on Educational Effectiveness, Washington, DC.

Photo credit: IBM [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons


Rethinking Technology in Education

Antoine de Saint-Exupéry, in his 1931 classic Night Flight, had a wonderful line about early airmail service in Patagonia, South America:

“When you are crossing the Andes and your engine falls out, well, there’s nothing to do but throw in your hand.”

I had reason to think about this quote recently, as I was attending a conference in Santiago, Chile, the presumed destination of the doomed pilot. The conference focused on evidence-based reform in education.

Three of the papers described large scale, randomized evaluations of technology applications in Latin America, funded by the Inter-American Development Bank (IDB). Two of them documented disappointing outcomes of large-scale, traditional uses of technology. One described a totally different application.

One of the studies, reported by Santiago Cueto (Cristia et al., 2017), randomly assigned 318 high-poverty, mostly rural primary schools in Peru to receive sturdy, low-cost, practical computers, or to serve as a control group. Teachers were given great latitude in how to use the computers, but limited professional development in how to use them as pedagogical resources. Worse, the computers had software with limited alignment to the curriculum, and teachers were expected to overcome this limitation. Few did. Outcomes were essentially zero in reading and math.

In another study (Berlinski & Busso, 2017), the IDB funded a very well-designed study in 85 schools in Costa Rica. Schools were randomly assigned to receive one of five approaches. All used the same content on the same schedule to teach geometry to seventh graders. One group used traditional lectures and questions with no technology. The others used active learning, active learning plus interactive whiteboards, active learning plus a computer lab, or active learning plus one computer per student. “Active learning” emphasized discussions, projects, and practical exercises.

On a paper-and-pencil test covering the content studied by all classes, all four of the experimental groups scored significantly worse than the control group. The lowest performance was seen in the computer lab condition, and, worst of all, the one computer per child condition.

The third study, in Chile (Araya, Arias, Bottan, & Cristia, 2018), was funded by the IDB and the International Development Research Center of the Canadian government. It involved a much more innovative and unusual application of technology. Fourth grade classes within 24 schools were randomly assigned to experimental or control conditions. In the experimental group, classes in similar schools were assigned to serve as competitors to each other. Within the math classes, students studied with each other and individually for a bi-monthly “tournament,” in which students in each class were individually given questions to answer on the computers. Students were taught cheers and brought to fever pitch in their preparations. The participating classes were compared to the control classes, which studied the same content using ordinary methods. All classes, experimental and control, were studying the national curriculum on the same schedule, and all used computers, so all that differed was the tournaments and the cooperative studying to prepare for the tournaments.

The outcomes were frankly astonishing. The students in the experimental schools scored much higher on national tests than controls, with an effect size of +0.30.

The differences in the outcomes of these three approaches are clear. What might explain them, and what do they tell us about applications of technology in Latin America and elsewhere?

In Peru, the computers were distributed as planned and generally functioned, but teachers received little professional development. In fact, teachers were not given specific strategies for using the computers, but were expected to come up with their own uses for them.

The Costa Rica study did provide computer users with specific approaches to math and gave teachers much associated professional development. Yet the computers may have been seen as replacements for teachers, and the computers may just not have been as effective as teachers. Alternatively, despite extensive PD, all four of the experimental approaches were very new to the teachers and may have not been well implemented.

In contrast, in the Chilean study, tournaments and cooperative study were greatly facilitated by the computers, but the computers were not central to program effectiveness. The theory of action emphasized enhanced motivation to engage in cooperative study of math. The computers were only a tool to achieve this goal. The tournament strategy resembles a method from the 1970s called Teams-Games-Tournament (TGT) (DeVries & Slavin, 1978). TGT was very effective, but was complicated for teachers to use, which is why it was not widely adopted. In Chile, computers helped solve the problems of complexity.

It is important to note that in the United States, technology solutions are also not producing major gains in student achievement. Reviews of research on elementary reading (ES = +0.05; Inns et al., 2018) and secondary reading (ES = -0.01; Baye et al., in press) have reported near-zero effects of technology-assisted approaches. Outcomes in elementary math are only somewhat better, averaging an effect size of +0.09 (Pellegrini et al., 2018).

The findings of these rigorous studies of technology in the U.S. and Latin America lead to a conclusion that there is nothing magic about technology. Applications of technology can work if the underlying approach is sound. Perhaps it is best to consider which non-technology approaches are proven or likely to increase learning, and only then imagine how technology could make effective methods easier, less expensive, more motivating, or more instructionally effective. As an analogy, great audio technology can make a concert more pleasant or audible, but the whole experience still depends on great composition and great performances. Perhaps technology in education should be thought of in a similar enabling way, rather than as the core of innovation.

St. Exupéry’s Patagonian pilots crossing the Andes had no “Plan B” if their engines fell out. We do have many alternative ways to put technology to work or to use other methods, if the computer-assisted instruction strategies that have dominated technology since the 1970s keep showing such small or zero effects. The Chilean study and certain exceptions to the overall pattern of research findings in the U.S. suggest appealing “Plans B.”

The technology “engine” is not quite falling out of the education “airplane.” We need not throw in our hand. Instead, it is clear that we need to re-engineer both, to ask not what is the best way to use technology, but what is the best way to engage, excite, and instruct students, and then ask how technology can contribute.

Photo credit: Distributed by Agence France-Presse (NY Times online) [Public domain], via Wikimedia Commons

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

References

Araya, R., Arias, E., Bottan, N., & Cristia, J. (2018, August 23). Conecta Ideas: Matemáticas con motivación social. Paper presented at the conference “Educate with Evidence,” Santiago, Chile.

Baye, A., Lake, C., Inns, A., & Slavin, R. (in press). Effective reading programs for secondary students. Reading Research Quarterly.

Berlinski, S., & Busso, M. (2017). Challenges in educational reform: An experiment on active learning in mathematics. Economics Letters, 156, 172-175.

Cristia, J., Ibarraran, P., Cueto, S., Santiago, A., & Severín, E. (2017). Technology and child development: Evidence from the One Laptop per Child program. American Economic Journal: Applied Economics, 9 (3), 295-320.

DeVries, D. L., & Slavin, R. E. (1978). Teams-Games-Tournament:  Review of ten classroom experiments. Journal of Research and Development in Education, 12, 28-38.

Inns, A., Lake, C., Pellegrini, M., & Slavin, R. (2018, March 3). Effective programs for struggling readers: A best-evidence synthesis. Paper presented at the annual meeting of the Society for Research on Educational Effectiveness, Washington, DC.

Pellegrini, M., Inns, A., & Slavin, R. (2018, March 3). Effective programs in elementary mathematics: A best-evidence synthesis. Paper presented at the annual meeting of the Society for Research on Educational Effectiveness, Washington, DC.

What’s the Evidence that Evidence Works?

I recently gave a couple of speeches on evidence-based reform in education in Barcelona.  In preparing for them, one of the organizers asked me an interesting question: “What is your evidence that evidence works?”

At one level, this is a trivial question. If schools select proven programs and practices aligned with their needs and implement them with fidelity and intelligence, with levels of resources similar to those used in the original successful research, then of course they’ll work, right? And if a school district adopts proven programs, encourages and funds them, and monitors their implementation and outcomes, then of course the appropriate use of all these programs is sure to enhance achievement district-wide, right?

Although logic suggests that a policy of encouraging and funding proven programs is sure to increase achievement on a broad scale, I like to be held to a higher standard: Evidence. As it happens, I have some evidence on this very topic. It comes from a large-scale evaluation of an ambitious, national effort to increase use of proven and promising schoolwide programs in elementary and middle schools, carried out by a research center funded by the Institute of Education Sciences (IES) called the Center for Data-Driven Reform in Education, or CDDRE (see Slavin, Cheung, Holmes, Madden, & Chamberlain, 2013). The name of the program the experimental schools used was Raising the Bar.

How Raising the Bar Raised the Bar

The idea behind Raising the Bar was to help schools analyze their own needs and strengths, and then select whole-school reform models likely to help them meet their achievement goals. CDDRE consultants provided about 30 days of on-site professional development to each district over a 2-year period. The PD focused on review of data, effective use of benchmark assessments, school walk-throughs by district leaders to see the degree to which schools were already using the programs they claimed to be using, and then exposing district and school leaders to information and data on schoolwide programs available to them, from several providers. If districts selected a program to implement, their district and school received PD on ensuring effective implementation and principals and teachers received PD on the programs they chose.


Evaluating Raising the Bar

In the study of Raising the Bar we recruited a total of 397 elementary and 225 middle schools in 59 districts in 7 states (AL, AZ, IN, MS, OH, TN). All schools were Title I schools in rural and mid-sized urban districts. Overall, 30% of students were African-American, 20% were Hispanic, and 47% were White. Across three cohorts, starting in 2005, 2006, or 2007, schools were randomly assigned to either use Raising the Bar, or to continue with what they were doing. The study ended in 2009, so schools could have been in the Raising the Bar group for two, three, or four years.

Did We Raise the Bar?

State test scores were obtained from all schools and transformed to z-scores so they could be combined across states. The analyses focused on grades 5 and 8, as these were the only grades tested in some states at the time. Hierarchical linear modeling, with schools nested within districts, was used for analysis.
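To make the z-score step concrete, here is a minimal sketch of how scores from differently scaled state tests can be put on a common footing. It is an illustration only: the function name, the example scores, and the choice to standardize against each state's own mean and population standard deviation are my assumptions, not the exact procedure used in the CDDRE study.

```python
from statistics import mean, pstdev


def standardize_within_state(scores_by_state):
    """Convert each state's raw scores to z-scores.

    Assumption for illustration: each state's scores are standardized
    against that state's own mean and (population) standard deviation.
    """
    z_scores = {}
    for state, scores in scores_by_state.items():
        mu = mean(scores)
        sd = pstdev(scores)
        z_scores[state] = [(score - mu) / sd for score in scores]
    return z_scores


# Hypothetical school-mean scores on two state tests with different scales.
raw = {
    "State A": [300.0, 320.0, 340.0],
    "State B": [48.0, 50.0, 52.0],
}

z = standardize_within_state(raw)

# Once standardized, scores from both states can be pooled in one analysis.
pooled = [value for values in z.values() for value in values]
```

After this step, a school that scores one standard deviation above its own state's average looks the same in the pooled data no matter which state test it took.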

For reading in fifth grade, outcomes were very good. Individual-level effect sizes were statistically significant, at +0.10 in Year 3 and +0.19 in Year 4. In middle school reading, the effect size reached +0.10 by Year 4.

Effects were also very good in fifth grade math, with significant effects of +0.10 in Year 3 and +0.13 in Year 4. Effect sizes in middle school math were also significant in Year 4 (ES=+0.12).

Note that these effects are for all schools, whether they adopted a program or not. Non-experimental analyses found that by Year 4, elementary schools that had chosen and implemented a reading program (33% of schools by Year 3, 42% by Year 4) scored better than matched controls in reading. Schools that chose any reading program usually chose our Success for All reading program, but some chose other models. Even in experimental schools that did not adopt reading or math programs, scores were always higher on average (though not always significantly higher) than in matched control schools.

How Much Did We Raise the Bar?

The CDDRE project was exceptional because of its size and scope. The 622 schools, in 59 districts in 7 states, were collectively equivalent to a medium-sized state. So if anyone asks what evidence-based reform could do to help an entire state, this study provides one estimate. The student-level outcome in elementary reading, an effect size of +0.19, applied to NAEP scores, would be enough to move 43 states to the scores now only attained by the top 10. If applied successfully to schools serving mostly African American and Hispanic students or to students receiving free- or reduced-price lunches regardless of ethnicity, it would reduce the achievement gap between these and White or middle-class students by about 38%. All in four years, at very modest cost.
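For readers who want to see the arithmetic behind a claim like this, here is a rough sketch. The function names are mine, and the gap of 0.50 standard deviations is an assumption: it is simply the value implied by dividing +0.19 by 38%, not a figure taken from the study. The sketch also assumes the full effect goes to the lower-scoring group and that scores are roughly normally distributed.

```python
from statistics import NormalDist


def gap_reduction(effect_size, gap_in_sd):
    """Share of an achievement gap closed by a gain of `effect_size` SDs."""
    return effect_size / gap_in_sd


def percentile_after_gain(effect_size, start_percentile=50.0):
    """Percentile reached by a student who starts at `start_percentile`
    and gains `effect_size` standard deviations (normal distribution assumed)."""
    standard_normal = NormalDist()
    start_z = standard_normal.inv_cdf(start_percentile / 100.0)
    return 100.0 * standard_normal.cdf(start_z + effect_size)


# +0.19 is the Year 4 elementary reading effect size reported above.
# 0.50 SD is an assumed gap size, used only for illustration.
share_closed = gap_reduction(0.19, 0.50)
new_percentile = percentile_after_gain(0.19)

print(f"Share of gap closed: {share_closed:.0%}")
print(f"Median student moves to about percentile {new_percentile:.1f}")
```

Under these assumptions, an effect size of +0.19 closes a bit more than a third of a half-standard-deviation gap, and moves a student at the median up by several percentile points.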

Actually, implementing something like Raising the Bar could be done much more easily and effectively today than it could in 2005-2009. First, there are a lot more proven programs to choose from than there were then. Second, the U.S. Congress, in the Every Student Succeeds Act (ESSA), now has definitions of strong, moderate, and promising levels of evidence, and restricts school improvement grants to schools that choose such programs. The reason only 42% of Raising the Bar schools selected a program is that they had to pay for it, and many could not afford to do so. Today, there are resources to help with this.

The evidence is both logical and clear: Evidence works.

Reference

Slavin, R. E., Cheung, A., Holmes, G., Madden, N. A., & Chamberlain, A. (2013). Effects of a data-driven district reform model on state assessment outcomes. American Educational Research Journal, 50 (2), 371-396.

Photo by Sebastian Mary/Gio JL [CC BY-SA 2.0  (https://creativecommons.org/licenses/by-sa/2.0)], via Wikimedia Commons


Do Textbooks Matter?

Recently, some colleagues and I were speaking with some superintendents about how they use evidence to select educational programs. Although they had many useful insights, it quickly became clear that when we said programs, they thought we meant textbooks.

But a textbook is not a program.

A program is a set of coordinated strategies designed to improve student achievement. A hallmark of programs is that they almost invariably include a lot of professional development. Textbooks almost invariably do not. A half-day inservice is typical of textbooks, while programs generally provide many days of inservice, plus on-site coaching and feedback, online or in-school discussions, and so on. Programs may also include textbooks or other curriculum or software, but they are focused on changing teachers’ behaviors in the classroom, not just changing content.

Content is important, of course, but changing textbooks almost never changes outcomes on achievement tests. My colleagues and I have published reviews of research on elementary and secondary reading, math, and science. In every one of these reviews, changing textbooks is one category of interventions that has been studied, often in very large, randomized experiments. Yet textbooks never make much of a difference on average, and it is rare that they show significant differences in even a single qualifying study. These studies usually use standardized tests as the outcome measures, and a focus of many textbook innovations is on closer alignment with current standards and assessments. Yet that strategy has been tried and evaluated many times, and it almost never works.

What does work, in contrast, are programs, ones that provide a great deal of professional development on well-defined models of teaching, such as cooperative learning and teaching of metacognitive skills.

Not every study of professional development approaches shows increases in achievement, and there are other factors that underlie more and less effective innovations. But on average, the difference between professional development and textbook approaches is crystal clear, and applies to all subjects and grade levels.

So when your textbooks are worn out, or you are tired of them, go ahead and replace them with a shiny new textbook or digital textbook. It won’t make any difference in students’ learning, but no one wants students to have shabby or outdated material. But when you decide to do something to improve student learning, do not follow your textbook adoption cycle. Instead, find proven programs with outstanding and sufficient professional development. Your kids, parents, and colleagues will be glad you did.

What Makes Educational Technology Programs Work?


While everyone else is having a lot more fun, my colleagues and I sit up late at night writing a free website, the Best Evidence Encyclopedia (www.bestevidence.org), which reviews evaluations of educational programs in reading, math, and science.

The recent reports reinforce an observation I’ve made previously. When programs are found to have little or no impact on student learning, it is often the case that they provide very little professional development to teachers. Giving teachers lots of professional development does not guarantee positive effects, but failing to do so seems to virtually guarantee disappointing impacts.

This observation takes on new importance as technology comes to play an increasing role in educational innovation. Numerous high-quality studies of traditional computer-assisted instruction programs, in which students walk down the hall or to the back of the classroom to work on technology largely disconnected from teachers’ instruction, find few positive effects on learning. Many technology applications appearing in schools today have learned nothing from this sad history and are offering free or low-cost apps that students work on individually, with little professional development for teachers or even any connection to their (non-technology) lessons. In light of the prior research, it would be astonishing if these apps made any difference in student learning, no matter how appealing or well-designed they are.

Alongside the thousands of free apps going into schools, there has also developed an entirely different approach to technology, one that integrates technology with teacher lessons and provides teachers with extensive professional development and coaching. Studies of such programs do find significant positive effects. As one example, I recently saw an evaluation of a reading and math program called Time to Know. In Time to Know, teachers use computers and their own non-computer lessons to start a lesson. Students then do activities on their individual devices, personalized to their needs and learning histories. Student learning is continuously assessed and fed back to the teacher to use in informing further lessons and guiding interventions with individual students.

Time to Know provides teachers with significant professional development and coaching, so they can use it flexibly and effectively. Perhaps as a result, the program showed very good outcomes in a small but high-quality study, with an effect size of +0.32 in reading and +0.29 in math.

There are many other studies of classroom programs that improve student learning, in particular studies of forms of cooperative learning in many subjects and grade levels. As a group, the outcomes reported in these studies are always far higher than those seen in studies of traditional technology applications, in all subjects and grade levels. What is interesting about the study of Time to Know is that here is an unusually positive outcome for a technology application in a rigorous experiment. What is unique about the intervention is that it embeds technology in the classroom and provides teachers with extensive PD. Perhaps classroom-embedded technology with adequate professional development is the wave of the future, and perhaps it will finally achieve the long-awaited breakthroughs that technology has been promising for the past 40 years.