Succeeding Faster in Education

“If you want to increase your success rate, double your failure rate.” So said Thomas Watson, the founder of IBM. What he meant, of course, is that people and organizations thrive when they try many experiments, even though most experiments fail. Failing twice as often means trying twice as many experiments, leading to twice as many failures—but also, he was saying, many more successes.

[Photo: Thomas Watson]

In education research and innovation circles, many people know this quote, and use it to console colleagues who have done an experiment that did not produce significant positive outcomes. A lot of consolation is necessary, because most high-quality experiments in education do not produce significant positive outcomes. In studies funded by the Institute of Education Sciences (IES), Investing in Innovation (i3), and England's Education Endowment Foundation (EEF), all of which require very high standards of evidence, fewer than 20% of experiments show significant positive outcomes.

The high rate of failure in educational experiments is often shocking to non-researchers, especially the government agencies, foundations, publishers, and software developers who commission the studies. I was at a conference recently in which a Peruvian researcher presented the devastating results of an experiment in which high-poverty, mostly rural schools in Peru were randomly assigned to receive computers for all of their students, or to continue with usual instruction. The Peruvian Ministry of Education was so confident that the computers would be effective that they had built a huge model of the specific computers used in the experiment and attached it to the Ministry headquarters. When the results showed no positive outcomes (except for the ability to operate computers), the Ministry quietly removed the computer statue from the top of their building.

Improving Success Rates

Much as I believe Watson's admonition ("fail more"), there is another principle that he was implying, or so I suspect: We have to learn from failure, so we can increase the rate of success. It is not realistic to expect government to continue to invest substantial funding in high-quality educational experiments if the success rate remains below 20%. We have to get smarter, so we can succeed more often. Fortunately, qualitative measures, such as observations, interviews, and questionnaires, are becoming required elements of funded research, making it easier to find out what actually happened and why a program did or did not work. Was the experimental program faithfully implemented? Were there unexpected responses toward the program by teachers or students?

In the course of my work reviewing positive and disappointing outcomes of educational innovations, I’ve noticed some patterns that often predict that a given program is likely or unlikely to be effective in a well-designed evaluation. Some of these are as follows.

  1. Small changes lead to small (or zero) impacts. In every subject and grade level, researchers have evaluated new textbooks, in comparison to existing texts. These almost never show positive effects. The reason is that textbooks are just not that different from each other. Approaches that do show positive effects are usually markedly different from ordinary practices or texts.
  2. Successful programs almost always provide a lot of professional development. The programs that have significant positive effects on learning are ones that markedly improve pedagogy. Changing teachers’ daily instructional practices usually requires initial training followed by on-site coaching by well-trained and capable coaches. Lots of PD does not guarantee success, but minimal PD virtually guarantees failure. Sufficient professional development can be expensive, but education itself is expensive, and adding a modest amount to per-pupil cost for professional development and other requirements of effective implementation is often the best way to substantially enhance outcomes.
  3. Effective programs are usually well-specified, with clear procedures and materials. Rarely do programs work if they are unclear about what teachers are expected to do, and helped to do it. In the Peruvian study of one-to-one computers, for example, students were given laptops at a per-pupil cost of $438. Teachers were expected to figure out how best to use them. In fact, a qualitative study found that the computers were considered so valuable that many teachers locked them up except for specific times when they were to be used. The computers lacked specific instructional software, and teachers received no professional development to help them fill that gap. No wonder “it” didn’t work. Other than the physical computers, there was no “it.”
  4. Technology is not magic. Technology can create opportunities for improvement, but there is little understanding of how to use technology to greatest effect. My colleagues and I have done reviews of research on effects of modern technology on learning. We found near-zero effects of a variety of elementary and secondary reading software (Inns et al., 2018; Baye et al., in press), with a mean effect size of +0.05 in elementary reading and +0.00 in secondary (effect sizes are explained in the brief sketch after this list). In math, effects were slightly more positive (ES=+0.09), but still quite small, on average (Pellegrini et al., 2018). Some technology approaches had more promise than others, but it is time that we learned from disappointing as well as promising applications. The widespread belief that technology is the future must eventually be right, but at present we have little reason to believe that technology is transformative in itself, and we do not know which forms of technology are most likely to become so.
  5. Tutoring is the most solid approach we have. Reviews of elementary reading for struggling readers (Inns et al., 2018) and secondary struggling readers (Baye et al., in press), as well as elementary math (Pellegrini et al., 2018), find outcomes for various forms of tutoring that are far beyond effects seen for any other type of treatment. Everyone knows this, but thinking about tutoring falls into two camps. One, typified by advocates of Reading Recovery, takes the view that tutoring is so effective for struggling first graders that it should be used no matter what the cost. The other, also perhaps thinking about Reading Recovery, rejects this approach because of its cost. Yet recent research on tutoring methods is finding strategies that are cost-effective and feasible. First, studies in both reading (Inns et al., 2018) and math (Pellegrini et al., 2018) find no difference in outcomes between certified teachers and paraprofessionals using structured one-to-one or one-to-small group tutoring models. Second, although one-to-one tutoring is more effective than one-to-small group, one-to-small group is far more cost-effective, as one trained tutor can work with 4 to 6 students at a time. Also, recent studies have found that tutoring can be just as effective in the upper elementary and middle grades as in first grade, so this strategy may have broader applicability than it has had in the past. The real challenge for research on tutoring is to develop and evaluate models that increase the cost-effectiveness of this clearly effective family of approaches.
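
A note on the metric used throughout these posts: an "effect size" is the difference between the program and control group means, divided by the standard deviation, so +0.20 means the program group scored about a fifth of a standard deviation higher. The sketch below shows the basic calculation; the scores are invented for illustration, and actual reviews such as those cited above typically also adjust for pretests and clustering.

```python
# A minimal sketch of an effect size (standardized mean difference).
# The scores below are made up for illustration only.
from statistics import mean, stdev

def effect_size(treatment_scores, control_scores):
    """Difference in means divided by the pooled standard deviation."""
    n_t, n_c = len(treatment_scores), len(control_scores)
    var_t, var_c = stdev(treatment_scores) ** 2, stdev(control_scores) ** 2
    pooled_sd = (((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2)) ** 0.5
    return (mean(treatment_scores) - mean(control_scores)) / pooled_sd

# Prints roughly 0.19 for these hypothetical scores
print(round(effect_size([72, 75, 80, 78, 74], [71, 76, 79, 77, 73]), 2))
```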

The extraordinary advances in the quality and quantity of research in education, led by investments from IES, i3, and the EEF, have raised expectations for research-based reform. However, the modest percentage of recent studies meeting current rigorous standards of evidence has caused disappointment in some quarters. Instead, all findings, whether immediately successful or not, should be seen as crucial information. Some studies identify programs ready for prime time right now, but the whole body of work can and must inform us about areas worthy of expanded investment, as well as areas in need of serious rethinking and redevelopment. The evidence movement, in the form it exists today, is completing its first decade. It’s still early days. There is much more we can learn and do to develop, evaluate, and disseminate effective strategies, especially for students in great need of proven approaches.

References

Baye, A., Lake, C., Inns, A., & Slavin, R. (in press). Effective reading programs for secondary students. Reading Research Quarterly.

Inns, A., Lake, C., Pellegrini, M., & Slavin, R. (2018). Effective programs for struggling readers: A best-evidence synthesis. Paper presented at the annual meeting of the Society for Research on Educational Effectiveness, Washington, DC.

Pellegrini, M., Inns, A., & Slavin, R. (2018). Effective programs in elementary mathematics: A best-evidence synthesis. Paper presented at the annual meeting of the Society for Research on Educational Effectiveness, Washington, DC.

 Photo credit: IBM [CC BY-SA 3.0  (https://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

 


Rethinking Technology in Education

Antoine de Saint-Exupéry, in his 1931 classic Night Flight, had a wonderful line about early airmail service in Patagonia, South America:

“When you are crossing the Andes and your engine falls out, well, there’s nothing to do but throw in your hand.”

[Photo: Antoine de Saint-Exupéry]

I had reason to think about this quote recently, as I was attending a conference in Santiago, Chile, the presumed destination of the doomed pilot. The conference focused on evidence-based reform in education.

Three of the papers described large scale, randomized evaluations of technology applications in Latin America, funded by the Inter-American Development Bank (IDB). Two of them documented disappointing outcomes of large-scale, traditional uses of technology. One described a totally different application.

One of the studies, reported by Santiago Cueto (Cristia et al., 2017), randomly assigned 318 high-poverty, mostly rural primary schools in Peru to receive sturdy, low-cost, practical computers, or to serve as a control group. Teachers were given great latitude in how to use the computers, but limited professional development in how to use them as pedagogical resources. Worse, the computers had software with limited alignment to the curriculum, and teachers were expected to overcome this limitation. Few did. Outcomes were essentially zero in reading and math.

In another study (Berlinski & Busso, 2017), the IDB funded a very well-designed study in 85 schools in Costa Rica. Schools were randomly assigned to receive one of five approaches. All used the same content on the same schedule to teach geometry to seventh graders. One group used traditional lectures and questions with no technology. The others used active learning, active learning plus interactive whiteboards, active learning plus a computer lab, or active learning plus one computer per student. “Active learning” emphasized discussions, projects, and practical exercises.

On a paper-and-pencil test covering the content studied by all classes, all four of the experimental groups scored significantly worse than the control group. The two lowest-performing groups were the computer lab condition and, worst of all, the one-computer-per-student condition.

The third study, in Chile (Araya, Arias, Bottan, & Cristia, 2018), was funded by the IDB and the International Development Research Center of the Canadian government. It involved a much more innovative and unusual application of technology. Fourth grade classes within 24 schools were randomly assigned to experimental or control conditions. In the experimental group, classes in similar schools were assigned to serve as competitors to each other. Within the math classes, students studied with each other and individually for a bi-monthly “tournament,” in which students in each class were individually given questions to answer on the computers. Students were taught cheers and brought to fever pitch in their preparations. The participating classes were compared to the control classes, which studied the same content using ordinary methods. All classes, experimental and control, were studying the national curriculum on the same schedule, and all used computers, so all that differed was the tournaments and the cooperative studying to prepare for the tournaments.

The outcomes were frankly astonishing. The students in the experimental schools scored much higher on national tests than controls, with an effect size of +0.30.

The differences in the outcomes of these three approaches are clear. What might explain them, and what do they tell us about applications of technology in Latin America and anywhere?

In Peru, the computers were distributed as planned and generally functioned, but teachers received little professional development. In fact, teachers were not given specific strategies for using the computers, but were expected to come up with their own uses for them.

The Costa Rica study did provide computer users with specific approaches to math and gave teachers much associated professional development. Yet the computers may have been seen as replacements for teachers, and they may simply not have been as effective as teachers. Alternatively, despite extensive PD, all four of the experimental approaches were very new to the teachers and may not have been well implemented.

In contrast, in the Chilean study, tournaments and cooperative study were greatly facilitated by the computers, but the computers were not central to program effectiveness. The theory of action emphasized enhanced motivation to engage in cooperative study of math. The computers were only a tool to achieve this goal. The tournament strategy resembles a method from the 1970s called Teams-Games-Tournaments (TGT) (DeVries & Slavin, 1978). TGT was very effective, but was complicated for teachers to use, which is why it was not widely adopted. In Chile, computers helped solve the problems of complexity.
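
Part of what the computers contribute, in other words, is logistics: keeping track of which classes compete against which, and turning each student's individual answers into a class score. The toy sketch below illustrates that bookkeeping; the class names and scoring rule are hypothetical and are not taken from Conecta Ideas or TGT.

```python
# Toy sketch of tournament bookkeeping: each student answers questions
# individually, and a class's score is the mean of its students' scores.
# Class names and the scoring rule are invented for illustration.

def class_score(correct_counts):
    """Average number of correct answers per student in one class."""
    return sum(correct_counts) / len(correct_counts)

def tournament_round(name_a, scores_a, name_b, scores_b):
    """Compare two competitor classes and report the result of this round."""
    a, b = class_score(scores_a), class_score(scores_b)
    if a == b:
        return f"Tie ({a:.1f} each)"
    winner, w, l = (name_a, a, b) if a > b else (name_b, b, a)
    return f"{winner} wins ({w:.1f} vs {l:.1f})"

# Hypothetical results for two competing fourth-grade classes
print(tournament_round("School 1, Class A", [7, 8, 6, 9, 7],
                       "School 2, Class B", [6, 9, 7, 8, 6]))
```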

It is important to note that in the United States, technology solutions are also not producing major gains in student achievement. Reviews of research on elementary reading (ES=+0.05; Inns et al., 2018) and secondary reading (ES=-0.01; Baye et al., in press) have reported near-zero effects of technology-assisted approaches. Outcomes in elementary math are only somewhat better, averaging an effect size of +0.09 (Pellegrini et al., 2018).

The findings of these rigorous studies of technology in the U.S. and Latin America lead to a conclusion that there is nothing magic about technology. Applications of technology can work if the underlying approach is sound. Perhaps it is best to consider which non-technology approaches are proven or likely to increase learning, and only then imagine how technology could make effective methods easier, less expensive, more motivating, or more instructionally effective. As an analogy, great audio technology can make a concert more pleasant or audible, but the whole experience still depends on great composition and great performances. Perhaps technology in education should be thought of in a similar enabling way, rather than as the core of innovation.

Saint-Exupéry's Patagonian pilots crossing the Andes had no "Plan B" if their engines fell out. We do have many alternative ways to put technology to work, or to use other methods, if the computer-assisted instruction strategies that have dominated educational technology since the 1970s keep showing such small or zero effects. The Chilean study, and certain exceptions to the overall pattern of research findings in the U.S., suggest appealing "Plans B."

The technology “engine” is not quite falling out of the education “airplane.” We need not throw in our hand. Instead, it is clear that we need to re-engineer both, to ask not what is the best way to use technology, but what is the best way to engage, excite, and instruct students, and then ask how technology can contribute.

Photo credit: Distributed by Agence France-Presse (NY Times online) [Public domain], via Wikimedia Commons

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

References

Araya, R., Arias, E., Bottan, N., & Cristia, J. (2018, August 23). Conecta Ideas: Matemáticas con motivación social. Paper presented at the conference “Educate with Evidence,” Santiago, Chile.

Baye, A., Lake, C., Inns, A., & Slavin, R. (in press). Effective reading programs for secondary students. Reading Research Quarterly.

Berlinski, S., & Busso, M. (2017). Challenges in educational reform: An experiment on active learning in mathematics. Economics Letters, 156, 172-175.

Cristia, J., Ibarraran, P., Cueto, S., Santiago, A., & Severín, E. (2017). Technology and child development: Evidence from the One Laptop per Child program. American Economic Journal: Applied Economics, 9 (3), 295-320.

DeVries, D. L., & Slavin, R. E. (1978). Teams-Games-Tournament:  Review of ten classroom experiments. Journal of Research and Development in Education, 12, 28-38.

Inns, A., Lake, C., Pellegrini, M., & Slavin, R. (2018, March 3). Effective programs for struggling readers: A best-evidence synthesis. Paper presented at the annual meeting of the Society for Research on Educational Effectiveness, Washington, DC.

Pellegrini, M., Inns, A., & Slavin, R. (2018, March 3). Effective programs in elementary mathematics: A best-evidence synthesis. Paper presented at the annual meeting of the Society for Research on Educational Effectiveness, Washington, DC.

What’s the Evidence that Evidence Works?

I recently gave a couple of speeches on evidence-based reform in education in Barcelona. As I was preparing for them, one of the organizers asked me an interesting question: “What is your evidence that evidence works?”

At one level, this is a trivial question. If schools select proven programs and practices aligned with their needs and implement them with fidelity and intelligence, with levels of resources similar to those used in the original successful research, then of course they’ll work, right? And if a school district adopts proven programs, encourages and funds them, and monitors their implementation and outcomes, then of course the appropriate use of all these programs is sure to enhance achievement district-wide, right?

Although logic suggests that a policy of encouraging and funding proven programs is sure to increase achievement on a broad scale, I like to be held to a higher standard: evidence. And, as it happens, I have some evidence on this very topic. It comes from a large-scale evaluation of an ambitious, national effort to increase the use of proven and promising schoolwide programs in elementary and middle schools, carried out by a research center funded by the Institute of Education Sciences (IES) called the Center for Data-Driven Reform in Education, or CDDRE (see Slavin, Cheung, Holmes, Madden, & Chamberlain, 2013). The name of the program the experimental schools used was Raising the Bar.

How Raising the Bar Raised the Bar

The idea behind Raising the Bar was to help schools analyze their own needs and strengths, and then select whole-school reform models likely to help them meet their achievement goals. CDDRE consultants provided about 30 days of on-site professional development to each district over a 2-year period. The PD focused on review of data, effective use of benchmark assessments, school walk-throughs by district leaders to see the degree to which schools were already using the programs they claimed to be using, and then exposing district and school leaders to information and data on schoolwide programs available to them, from several providers. If a district selected a program to implement, district and school leaders received PD on ensuring effective implementation, and principals and teachers received PD on the programs they chose.

[Photo: pole vault]

Evaluating Raising the Bar

In the study of Raising the Bar we recruited a total of 397 elementary and 225 middle schools in 59 districts in 7 states (including AL, AZ, IN, MS, OH, and TN). All schools were Title I schools in rural and mid-sized urban districts. Overall, 30% of students were African-American, 20% were Hispanic, and 47% were White. Across three cohorts, starting in 2005, 2006, or 2007, schools were randomly assigned either to use Raising the Bar or to continue with what they were doing. The study ended in 2009, so schools could have been in the Raising the Bar group for two, three, or four years.

Did We Raise the Bar?

State test scores were obtained from all schools and transformed to z-scores so they could be combined across states. The analyses focused on grades 5 and 8, as these were the only grades tested in some states at the time. Hierarchical linear modeling, with schools nested within districts, was used for analysis.
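
Because each state uses its own test, scores must be put on a common scale before schools from different states can be pooled. The snippet below illustrates only that standardization step, with invented school means; it does not attempt to reproduce the hierarchical linear models used in the actual analysis.

```python
# Minimal sketch of standardizing test scores within each state (z-scores),
# so that schools from different states can be pooled. School means are
# invented; the real analysis used hierarchical linear models on top of this.
from statistics import mean, stdev

schools = {
    "State A": [210.0, 225.0, 218.0, 231.0],  # hypothetical school mean scores
    "State B": [48.0, 55.0, 51.0, 60.0],
}

z_scores = {}
for state, scores in schools.items():
    m, sd = mean(scores), stdev(scores)
    z_scores[state] = [round((s - m) / sd, 2) for s in scores]

print(z_scores)
```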

For reading in fifth grade, outcomes were very good, with significant individual-level effect sizes of +0.10 in Year 3 and +0.19 in Year 4. In middle school reading, the effect size reached +0.10 by Year 4.

Effects were also very good in fifth grade math, with significant effects of +0.10 in Year 3 and +0.13 in Year 4. Effect sizes in middle school math were also significant in Year 4 (ES=+0.12).

Note that these effects are for all schools, whether they adopted a program or not. Non-experimental analyses found that by Year 4, elementary schools that had chosen and implemented a reading program (33% of schools by Year 3, 42% by Year 4) scored better than matched controls in reading. Schools that chose any reading program usually chose our Success for All reading program, but some chose other models. Even Raising the Bar schools that did not adopt reading or math programs scored higher, on average (though not always significantly higher), than control schools.

How Much Did We Raise the Bar?

The CDDRE project was exceptional because of its size and scope. The 622 schools, in 59 districts in 7 states, were collectively equivalent to a medium-sized state. So if anyone asks what evidence-based reform could do to help an entire state, this study provides one estimate. The student-level outcome in elementary reading, an effect size of +0.19, applied to NAEP scores, would be enough to move 43 states to the scores now only attained by the top 10. If applied successfully to schools serving mostly African American and Hispanic students or to students receiving free- or reduced-price lunches regardless of ethnicity, it would reduce the achievement gap between these and White or middle-class students by about 38%. All in four years, at very modest cost.
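
The gap figure is simple arithmetic: an effect size is a gain in standard deviation units, so the share of a gap it closes is the effect size divided by the size of the gap. The check below assumes a gap of roughly half a standard deviation; that gap figure is my assumption for illustration, not a number taken from the study.

```python
# Back-of-the-envelope check on the "about 38%" figure: effect size divided
# by the size of the gap (both in standard deviation units). The 0.50 SD gap
# is an assumption for illustration.
effect_size = 0.19
assumed_gap_sd = 0.50
print(f"Share of gap closed: {effect_size / assumed_gap_sd:.0%}")  # -> 38%
```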

Actually, implementing something like Raising the Bar could be done much more easily and effectively today than it could in 2005-2009. First, there are a lot more proven programs to choose from than there were then. Second, the U.S. Congress, in the Every Student Succeeds Act (ESSA), now has definitions of strong, moderate, and promising levels of evidence, and restricts school improvement grants to schools that choose such programs. The reason only 42% of Raising the Bar schools selected a program is that they had to pay for it, and many could not afford to do so. Today, there are resources to help with this.

The evidence is both logical and clear: Evidence works.

Reference

Slavin, R. E., Cheung, A., Holmes, G., Madden, N. A., & Chamberlain, A. (2013). Effects of a data-driven district reform model on state assessment outcomes. American Educational Research Journal, 50 (2), 371-396.

Photo by Sebastian Mary/Gio JL [CC BY-SA 2.0  (https://creativecommons.org/licenses/by-sa/2.0)], via Wikimedia Commons

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Do Textbooks Matter?

Recently, some colleagues and I were speaking with some superintendents about how they use evidence to select educational programs. Although they had many useful insights, it quickly became clear that when we said programs, they thought we meant textbooks.

But a textbook is not a program.

A program is a set of coordinated strategies designed to improve student achievement. A hallmark of programs is that they almost invariably include a lot of professional development. Textbooks almost invariably do not. A half-day inservice is typical for a new textbook, while programs generally provide many days of inservice, plus on-site coaching and feedback, online or in-school discussions, and so on. Programs may also include textbooks or other curriculum or software, but they are focused on changing teachers’ behaviors in the classroom, not just changing content.

Content is important, of course, but changing textbooks almost never changes outcomes on achievement tests. My colleagues and I have published reviews of research on elementary and secondary reading, math, and science. In every one of these reviews, changing textbooks is one category of interventions that has been studied, often in very large, randomized experiments. Yet textbooks never make much of a difference on average, and it is rare that they show significant differences in even a single qualifying study. These studies usually use standardized tests as the outcome measures, and a focus of many textbook innovations is on closer alignment with current standards and assessments. Yet that strategy has been tried and evaluated many times, and it almost never works.

What does work, in contrast, are programs that provide a great deal of professional development on well-defined models of teaching, such as cooperative learning and the teaching of metacognitive skills.

Not every study of professional development approaches shows increases in achievement, and there are other factors that underlie more and less effective innovations. But on average, the difference between professional development and textbook approaches is crystal clear, and applies to all subjects and grade levels.

So when your textbooks are worn out, or you are tired of them, go ahead and replace them with a shiny new textbook or digital textbook. It won’t make any difference in students’ learning, but no one wants students to have shabby or outdated material. But when you decide to do something to improve student learning, do not follow your textbook adoption cycle. Instead, find proven programs with outstanding and sufficient professional development. Your kids, parents, and colleagues will be glad you did.

What Makes Educational Technology Programs Work?


While everyone else is having a lot more fun, my colleagues and I sit up late at night writing a free website, the Best Evidence Encyclopedia (www.bestevidence.org), which reviews evaluations of educational programs in reading, math, and science.

The recent reports reinforce an observation I’ve made previously. When programs are found to have little or no impact on student learning, it is often the case that they provide very little professional development to teachers. Giving teachers lots of professional development does not guarantee positive effects, but failing to do so seems to virtually guarantee disappointing impacts.

This observation takes on new importance as technology comes to play an increasing role in educational innovation. Numerous high-quality studies of traditional computer-assisted instruction programs, in which students walk down the hall or to the back of the classroom to work on technology largely disconnected from teachers’ instruction, find few positive effects on learning. Many technology applications appearing in schools today have learned nothing from this sad history and are offering free or low-cost apps that students work on individually, with little professional development for teachers or even any connection to their (non-technology) lessons. In light of the prior research, it would be astonishing if these apps made any difference in student learning, no matter how appealing or well-designed they are.

Alongside the thousands of free apps going into schools, there has also developed an entirely different approach to technology, one that integrates technology with teacher lessons and provides teachers with extensive professional development and coaching. Studies of such programs do find significant positive effects. As one example, I recently saw an evaluation of a reading and math program called Time to Know. In Time to Know, teachers use computers and their own non-computer lessons to start a lesson. Students then do activities on their individual devices, personalized to their needs and learning histories. Student learning is continuously assessed and fed back to the teacher to use in informing further lessons and guiding interventions with individual students.
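
In other words, the technology sits inside a teacher-led loop: teach, let students practice on personalized activities, assess, and feed the results back to the teacher. The toy sketch below illustrates that loop in the simplest possible terms; the names, data structures, and mastery threshold are hypothetical and are not drawn from Time to Know's actual software.

```python
# Toy sketch of a classroom-embedded technology loop: students practice on
# individualized activities, the system records results, and the teacher gets
# a summary flagging who needs help. All names and thresholds are invented.

def summarize_for_teacher(results, mastery_threshold=0.8):
    """Turn per-student practice results into a simple report for the teacher."""
    report = {"mastered": [], "needs_help": []}
    for student, fraction_correct in results.items():
        if fraction_correct >= mastery_threshold:
            report["mastered"].append(student)
        else:
            report["needs_help"].append(student)
    return report

# Hypothetical results from one day's personalized practice
todays_results = {"Ana": 0.9, "Ben": 0.55, "Carla": 0.85, "Diego": 0.7}
print(summarize_for_teacher(todays_results))
```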

Time to Know provides teachers with significant professional development and coaching, so they can use it flexibly and effectively. Perhaps as a result, the program showed very good outcomes in a small but high-quality study, with an effect size of +0.32 in reading and +0.29 in math.

There are many other studies of classroom programs that improve student learning, in particular studies of forms of cooperative learning in many subjects and grade levels. As a group, the outcomes reported in these studies are always far higher than those seen in studies of traditional technology applications, in all subjects and grade levels. What is interesting about the study of Time to Know is that here is an unusually positive outcome for a technology application in a rigorous experiment. What is unique about the intervention is that it embeds technology in the classroom and provides teachers with extensive PD. Perhaps classroom-embedded technology with adequate professional development is the wave of the future, and perhaps it will finally achieve the long-awaited breakthroughs that technology has been promising for the past 40 years.

Accountability for the Top 95 Percent


Perhaps the most controversial issue in education policy is test-based accountability. Since the 1980s, most states have had tests in reading and math (at least), and have used average school test scores for purposes ranging from praising or embarrassing school staffs to providing financial incentives or closing down low-scoring schools. Test-based accountability became national with NCLB, which required annual testing from grades 3-8, and prescribed sanctions for low-achieving schools. The Obama administration added to this an emphasis on using student test scores as part of teacher evaluations.

The entire test-based accountability movement has paid little attention to evidence. In fact, in 2011, the National Research Council reviewed research on high-stakes accountability and found few benefits.

There’s nothing wrong with testing students and identifying schools in which students appear to be making good or poor progress in comparison to other schools serving students with similar backgrounds, as long as this is just used as information to identify areas of need. What is damaging about accountability is the use of test scores for draconian consequences, such as firing principals and closing schools. The problem is that terror is just not a very good strategy for professional development. Teachers and principals afraid of punishment are more likely to use questionable strategies to raise their scores: teaching to the test, reducing time on non-tested subjects, trying to attract higher-achieving kids or get rid of lower performers, not to mention out-and-out cheating. Neither terror nor the hope of rewards does much to fundamentally improve day-to-day teaching, because the vast majority of teachers are already doing their best. There are bad apples, and they need to be rooted out. But you can’t improve the overall learning of America’s children unless you improve daily teaching practices for the top 95% of teachers, the ones who come to work every day, do their best, care about their kids, and go home dead tired.

Improving outcomes for the students of the top 95% requires top-quality, attractive, engaging professional development to help teachers use proven programs and practices. Because people are more likely to take seriously professional development they’ve chosen, teachers should have choices (as a school or department, primarily) of which proven programs they want to adopt and implement.

The toughest accountability should be reserved for the programs themselves, and the organizations that provide them. Teachers and principals should have confidence that if they do adopt a given program and implement it with fidelity and intelligence, it will work. This is best demonstrated in large experiments in which teachers in many schools use innovative programs, and outcomes are compared with similar schools without the programs. They should know that they’ll get enough training and coaching to see that the program will work.

Offering a broad range of proven programs would give local schools and districts expanded opportunities to make wise choices for their children. Just as evidence in agriculture informs but does not force choices by farmers, evidence in education should enable school leaders to advance children’s learning in a system of choice, not compulsion.

If schools had choices among many proven programs, in all different subjects (tested as well as untested), the landscape of accountability would change. Instead of threatening teachers and principals, government could provide help for schools to adopt programs they want and need. Offering proven programs provides a means of improving outcomes even in untested areas, such as science, social studies, and foreign language. As time goes on, more and better programs with convincing evaluation evidence would appear, because developers and funders would perceive the need for them.

Moving to a focus on evidence-based reform will not solve all of the contentious issues about accountability, but it could help us focus the reform conversation on how to move forward the top 95% of teachers and schools—the ones who teach 95% of our kids—and how to put accountability in proper proportion.

Preschools and Evidence: A Child Will Lead Us


These are exciting times for people who care about preschool, for people who care about evidence, and especially for people who care about both. President Obama has advocated expanding high-quality preschool opportunities; Bill de Blasio, the new Mayor of New York City, is proposing new taxes on the wealthy for this purpose; and many states are moving toward universal preschool, or at least considering it. The recently passed Omnibus Budget included $250 million for states to add to or improve their preschool programs.

What is refreshing is that after thirty years of agreement among researchers that it’s only high-quality preschools that have long-term positive effects, the phrase “high quality” has become part of the political dialogue. At a minimum, “high quality” means “not just underpaid, poorly educated preschool teachers.” But beyond this, “high quality” is easy to agree on, difficult to define.

This is where evidence comes in. We have good evidence about long-term effects of very high-quality preschool programs compared to no preschool, but identifying exceptionally effective, replicable programs (in comparison to run-of-the-mill preschools) has been harder.

The importance of identifying preschool programs that actually work is being recognized not only in academia, but in the general press as well. In the January 29 New York Times, Daniel Willingham and David Grissmer advocated local and national randomized experiments to find out what works in preschool. On January 30, Nicholas Kristof wrote about rigorous research supporting long-term effects of preschool. Two articles on randomized experiments in education would be a good week for Education Week, much less the New York Times.

With President Obama, John Boehner, and the great majority of Americans favoring expansion of high-quality preschools, this might be an extraordinarily good time for the U.S. Department of Education to sponsor development and evaluation of promising preschool models. At the current rate it will take a long time to get to universal pre-K, so in the meantime let’s learn what works.

The U.S. Department of Education did such a study several years ago, called Preschool Curriculum Evaluation Research (PCER), in which various models were compared to ordinary preschool approaches. PCER found that only a few models did better than their control groups, but there was a clear pattern to the ones that did. These were models that provided teachers with extensive professional development and materials with a definite structure designed to build vocabulary, phonemic awareness, early math concepts, and school skills. They were not simply kindergarten content introduced early; they focused on play, themes, rhymes, songs, stories, and counting games with specific purposes well understood by teachers.

In a new R & D effort, innovators might be asked to create new, practical models, perhaps based on the PCER findings, and evaluate them in rigorous studies. Within a few years, we’d have many proven approaches to preschool, ones that would justify the optimism being expressed by politicians of all stripes.

Historically, preschool is one of the few areas of educational practice or policy in which politicians and the public consider evidence to have much relevance. Perhaps if we get this one right, they will begin to wonder, if evidence is good for four year olds, why shouldn’t we consult it for the rest of education policy? If evidence is to become important for all of education, perhaps it has to begin with a small child leading us.