“Am I Even Real Anymore?” The Truth About Virtual Learning

“My 8-year-old was sobbing last night because she misses playing with her friends at recess, she misses her teacher, and she is worried that everyone has forgotten her. At one point, she asked me if she was even real anymore.”

This appeared in a letter that ran in the July 25th Baltimore Sun. It was written by Jenny Elliott, a Catonsville mother of elementary students. In it, Ms. Elliott tells how she and her husband have been unable to get their kids to do the virtual learning assignments from their school.

“(Virtual learning) just doesn’t work for them. I can’t physically force them to stare at their devices and absorb information. We’ve yelled, we’ve begged, we’ve made a game of it…we’ve tried everything we can think of. We failed.”

One of the most poignant parts of Ms. Elliott’s letter is her feeling that everyone knows that virtual learning isn’t working for most kids, but no one wants to say so.

“I am begging someone to speak honestly about virtual learning. I have only the perspective of an elementary school parent, but I have to imagine this negatively impacts children at all levels. The communication coming (from authorities) all over the country…it feels delusional.”

Ms. Elliott expresses enormous guilt. “As a parent, I’ve had to see this every day for the last five months, and every day I feel crushing guilt that I can’t make any of it better.” She expresses gratitude for the efforts of her kids’ teachers, and feels sympathy for them. (“I love you, teachers. I am so sorry this is your reality too.”)

Ms. Elliott notes that if anyone should be able to make virtual learning work, it should be her family: “…a secure living situation, two parents, food security, access to high-speed internet, access to an internet-powered device, etc.” I might add that Catonsville is in suburban Baltimore County, which has a national reputation for its substantial investments in technology over many years.

Since I have written many blogs about schools’ responses to the Covid school closures, I’ve been expecting a letter like this. Informally, my colleagues and I have been chatting with teachers and parents we know who have kids in school. Almost every one tells a story like Ms. Elliott’s. Highly educated parents, plenty of technology, tech-savvy kids, capable and hard-working teachers, students of all ages: none of it seems to matter. Teachers and parents alike describe motivated and successful students who log on and then pay no attention. The kids are communicating on a different device with their friends, playing games, reading, whatever. There are kids who are engaged with virtual learning, but very few that we’ve heard about.

Much of the reporting about virtual learning has emphasized the lack of access to the Internet, and districts are spending billions to provide devices and improve access. There is a lot of talk about how school closures are increasing learning gaps because disadvantaged students lack access to the Internet, as though school closures are only a problem for disadvantaged students. But if Ms. Elliott is representative of many parents, and I’m sure she is, the problem is far larger than that of students who lack access to technology.

Everyone involved with schools seems to know this, but they do not want to talk about it. There seems to be a giant, unspoken pall of guilt that keeps the reality of what is happening in virtual learning from being discussed openly. Parents feel guilty because they feel deficient if they are not able to get their kids to respond to virtual learning. Teachers feel guilty because they don’t want to admit that they are not able to get more of their students to pay attention. School administrators want to be perceived to be doing something, anything, to combat the educational effects of extended school closures, so while they do talk about the need to obtain more devices and offer teachers more professional development, they do not like to talk about the kids who do have devices, but don’t do much with them. They promise that things will soon be better, with more devices, more professional development, and better lessons turning the tide. Ms. Elliott is sympathetic, but doubtful. “I appreciate those efforts and wholeheartedly believe the educational system is doing the absolute best they can…but I can’t pretend that the virtual school plans will work for our kids.”

Ms. Elliott states at the beginning of her letter that she has no solution to suggest, but she just wants the truth to be known. I have no sure-fire solutions myself. But I do know one thing. Any workable solutions there may be will have to begin with a firm understanding of what is really happening in schools using virtual learning.

In most of the U.S., opening schools in August or September should be out of the question. The rates of new Covid cases remain far too high, and no amount of social distancing inside schools can be safe for students or staff when there are so many carriers of the disease outside of schools. The only true solution, until cures or vaccines are widely available, is to return to the measures that have worked throughout the world: mandating universal use of masks, shutting down businesses that put people close together, and so on. These are the measures that saved China, South Korea, Italy, Spain, and New York City, and they are the only solution now. The faster we return to what works, the sooner we can fully open schools, and then start the long process of healing the terrible damage being done to our children’s learning.

I am not suggesting giving up on virtual learning. If schools will be closed for a long time, it is all we have. But I am pessimistic about trying to fix the current approach to virtual learning. I think we could use all those computers and educators online to much greater effect by providing online tutoring to individuals and small groups, for example, rather than trying to create a classroom community out of children working from home. Perhaps there are ways other than tutoring to use online instruction effectively, but I do not know them. In any case, we need immediate investment in development and evaluation to find the most effective and cost-effective solutions possible, while we wait for a safe time to open schools. We’ll all get through this, one way or the other, but in order to minimize the negative impact on student learning, let’s start with the truth, and then build and use the evidence of what works.

 This blog was developed with support from Arnold Ventures. The views expressed here do not necessarily reflect those of Arnold Ventures.

Note: If you would like to subscribe to Robert Slavin’s weekly blogs, just send your email address to thebee@bestevidence.org

After the Pandemic: Can We Welcome Students Back to Better Schools?

I am writing in March, 2020, at what may be the scariest point in the COVID-19 pandemic in the U.S. We are just now beginning to understand the potential catastrophe, and also to begin taking actions most likely to reduce the incidence of the disease.

One of the most important preventive measures is school closure. At this writing, thirty entire states have closed their schools, as have many individual districts, including Los Angeles. It is clear that school closures will go far beyond this, both in the U.S. and elsewhere.

I am not an expert on epidemiology, but I did want to make some observations about how widespread school closure could affect education, and (ever the optimist) how this disaster could provide a basis for major improvements in the long run.

Right now, schools are closing for a few weeks, with an expectation that after spring break, all will be well again, and schools might re-open. From what I read, this is unlikely. The virus will continue to spread until it runs out of vulnerable people. The purpose of school closures is to reduce the rate of transmission. Children themselves tend not to get the disease, for some reason, but they do transmit the disease, mostly at school (and then to adults). Only when there are few new cases to transmit can schools be responsibly re-opened. No one knows for sure, but a recent article in Education Week predicted that schools will probably not re-open this school year (Will, 2020). Kansas is the first state to announce that schools will be closed for the rest of the school year, but others will surely follow.

Will students suffer from school closure? There will be lasting damage if students lose parents, grandparents, and other relatives, of course. Their achievement may take a dip, but a remarkable study reported by Ceci (1991) examined the impact of two or more years of school closures in the Netherlands in World War II, and found an initial loss in IQ scores that quickly rebounded after schools re-opened after the war. From an educational perspective, the long-term impact of closure itself may not be so bad. A colleague, Nancy Karweit (1989), studied achievement in districts with long teacher strikes, and did not find much of a lasting impact.

In fact, there is a way in which wise state and local governments might use an opportunity presented by school closures. If schools closing now stay closed through the end of the school year, that could leave large numbers of teachers and administrators with not much to do (assuming they are not furloughed, which could happen). Imagine that, where feasible, this time were used for school leaders to consider how they could welcome students back to much improved schools, and to provide teachers with (electronic) professional development to implement proven programs. This might involve local, regional, or national conversations focused on what strategies are known to be effective for each of the key objectives of schooling. For example, a national series of conversations could take place on proven strategies for beginning reading, for middle school mathematics, for high school science, and so on. By design, the conversations would be focused not just on opinions, but on rigorous evidence of what works. A focus on improving health and disease prevention would be particularly relevant to the current crisis, along with implementing proven academic solutions.

Particular districts might decide to implement proven programs, and then use school closure to provide time for high-quality professional development on instructional strategies that meet the ESSA evidence standards.

Of course, all of the discussion and professional development would have to be done using electronic communications, for obvious reasons of public health. But might it be possible to make wise use of school closure to improve the outcomes of schooling using professional development in proven strategies? With rapid rollout of existing proven programs and dedicated funding, it certainly seems possible.

States and districts are making a wide variety of decisions about what to do during the time that schools are closed. Many are moving to e-learning, but this may be of little help in areas where many students lack computers or access to the internet at home. In some places, a focus on professional development for next school year may be the best way to make the best of a difficult situation.

There have been many times in the past when disasters have led to lasting improvements in health and education. This could be one of these opportunities, if we seize the moment.

Photo credit: Liam Griesacker

References

Ceci, S. J. (1991). How much does schooling influence general intelligence and its cognitive components? A reassessment of the evidence. Developmental Psychology, 27(5), 703–722. https://doi.org/10.1037/0012-1649.27.5.703

Karweit, N. (1989). Time and learning: A review. In R. E. Slavin (Ed.), School and Classroom Organization. Hillsdale, NJ: Erlbaum.

Will, M. (2020, March 15). School closure for the coronavirus could extend to the end of the school year, some say. Education Week.

 This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.


A Powerful Hunger for Evidence-Proven Technology

I recently saw a 1954 video of B. F. Skinner showing off a classroom full of eager students using teaching machines. In it, Skinner gave all the usual reasons that teaching machines were soon going to be far superior to ordinary teaching: They were scientifically made to enable students to experience constant success in small steps. They were adapted to students’ needs, so fast students did not need to wait for their slower classmates, and the slower classmates could have the time to solidify their understanding, rather than being whisked from one half-learned topic to the next, never getting a chance to master anything and therefore sinking into greater and greater failure.

Here it is 65 years later and “teaching machines,” now called computer-assisted instruction, are ubiquitous. But are they effective? Computers are certainly effective at teaching students to use technology, but can they teach the core curriculum of elementary or secondary schools? In a series of reviews in the Best Evidence Encyclopedia (BEE; www.bestevidence.org), my colleagues and I have examined research on the impacts of technology-infused methods on reading, mathematics, and science, in elementary and secondary schools. Here is a quick summary of our findings:

Mean Effect Sizes for Technology-Based Programs in Recent Reviews

Review | Topic | No. of Studies | Mean Effect Size
Inns et al. (in preparation) | Elementary Reading | 23 | +0.09
Inns et al. (2019) | Struggling Readers | 6 | +0.06
Baye et al. (2019) | Secondary Reading | 23 | -0.01
Pellegrini et al. (2019) | Elementary Mathematics | 14 | +0.06

If you prefer “months of learning,” these are all about one month, except for secondary reading, which is zero. A study-weighted average across these reviews is an effect size of +0.05. That’s not nothing, but it’s not much. Nothing at all like what Skinner and countless other theorists and advocates have been promising for the past 65 years. I think that even the most enthusiastic fans of technology use in education are beginning to recognize that while technology may be useful in improving achievement on traditional learning outcomes, it has not yet had a revolutionary impact on learning of reading or mathematics.
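The study-weighted average quoted above can be reproduced directly from the table: weight each review’s mean effect size by its number of studies and divide by the total. A minimal sketch in Python, using only the figures from the table:

```python
# Study counts and mean effect sizes from the four reviews above.
reviews = [
    ("Elementary Reading", 23, 0.09),
    ("Struggling Readers", 6, 0.06),
    ("Secondary Reading", 23, -0.01),
    ("Elementary Mathematics", 14, 0.06),
]

total_studies = sum(n for _, n, _ in reviews)
# Study-weighted mean: each review counts in proportion to its number of studies.
weighted_mean = sum(n * es for _, n, es in reviews) / total_studies

print(round(weighted_mean, 2))  # → 0.05
```

Weighting by study count simply keeps a review with 23 studies from counting the same as one with 6; it matches the “+0.05” figure in the text.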

How can we boost the impact of technology in education?

Whatever you think the effects of technology-based education might be for typical school outcomes, no one could deny that it would be a good thing if that impact were larger than it is today. How could government, the educational technology industry, researchers in and out of ed tech, and practicing educators work together to make technology applications more effective than they are now?

In order to understand how to proceed, it is important to acknowledge a serious problem in the world of ed tech today. Educational technology is usually developed by commercial companies. Like all commercial companies, they must serve their market. Unfortunately, the market for ed tech products is not terribly interested in the evidence supporting technology-based programs. Instead, buyers tend to pay attention to sales reps or marketing, or they seek opinions from friends and colleagues, rather than looking at evidence. Technology decision makers often value attractiveness, ease of use, low cost, and current trends or fads, over evidence (see Morrison, Ross & Cheung, 2019, for documentation of these choice strategies).

Technology providers are not uncaring people, and they want their products to truly improve outcomes for children. However, they know that if they put a lot of money into developing and researching an innovative approach to education that happens to use technology, and their method requires a lot of professional development to produce substantially positive effects, their programs might be considered too expensive, and less expensive products that ask less of teachers and other educators would dominate the sector. These problems resemble those faced by textbook publishers, who similarly may have great ideas to increase the effectiveness of their textbooks or to add components that require professional development. Textbook designers are prisoners of their markets just as technology developers are.

The solution, I would propose, requires interventions by government designed to nudge education markets toward use of evidence. Government (federal, state, and local) has a real interest in improving outcomes of education. So how could government facilitate the use of technology-based approaches that are known to enhance student achievement more than those that exist today?


How government could promote use of proven technology approaches

Government could lead the revolution in educational technology that market-driven technology developers cannot achieve on their own. It could do this by emphasizing two main strategies: encouraging and incentivizing schools, districts, and states to adopt programs proven effective in rigorous research, and funding the development, evaluation, and dissemination of proven technology-based programs by developers of all kinds (e.g., for-profit, non-profit, or universities).

Encouraging and incentivizing use of proven technology-based programs

The most important thing government must do to expand the use of proven technology-based approaches (as well as non-technology approaches) is to build a powerful hunger for them among educators, parents, and the public at large. Yes, I realize that this sounds backward; shouldn’t government sponsor development, research, and dissemination of proven programs first? Yes it should, and I’ll address this topic in a moment. Of course we need proven programs. No one will clamor for an empty box. But today, many proven programs already exist, and the bigger problem is getting them (and many others to come) enthusiastically adopted by schools. In fact, we must eventually get to the point where educational leaders value not only individual programs supported by research, but value research itself. That is, when they start looking for technology-based programs, their first step would be to find out what programs are proven to work, rather than selecting programs in the usual way and only then trying to find evidence to support the choice they have already made.

Government at any level could support such a process, but the most likely leader would be the federal government. It could provide incentives to schools that select and implement proven programs, and build on this with multifaceted outreach efforts to generate enthusiasm around proven approaches, and around the idea that approaches should be proven.

A good example of what I have in mind was the Comprehensive School Reform (CSR) grants of the late 1990s. Schools that adopted whole-school reform models that met certain requirements could receive grants of up to $50,000 per year for three years. By the end of CSR, about 1000 schools got grants in a competitive process, but CSR programs were used in an estimated 6000 schools nationwide. In other words, the hype generated by the CSR grants process led many schools that never got a grant to find other resources to adopt these whole school programs. I should note that only a few of the adopted programs had evidence of effectiveness; in CSR, the core idea was whole-school reform, not evidence (though some had good evidence of effectiveness). But a process like CSR, with highly visible grants and active support from government, illustrates a process that built a powerful hunger for whole-school reform, which could work just as well, I think, if applied to building a powerful hunger for proven technology-based programs and other proven approaches.

“Wait a minute,” I can hear you saying. “Didn’t the ESSA evidence standards already do this?”

This was indeed the intention of ESSA, which established “strong,” “moderate,” and “promising” levels of evidence (as well as lower categories). ESSA has been a great first step in building interest in evidence. However, the only schools that could obtain additional funding for selecting proven programs were among the lowest-achieving schools in the country, so ordinary Title I schools, not to mention non-Title I schools, were not much affected. CSR gave extra points to high-poverty schools, but a much wider variety of schools could get into that game. There is a big difference between creating interest in evidence, which ESSA has definitely done, and creating a powerful hunger for proven programs. ESSA was passed four years ago, and it is only now beginning to build knowledge and enthusiasm among schools.

Building many more proven technology-based programs

Clearly, we need many more proven technology-based programs. In our Evidence for ESSA website (www.evidenceforessa.org), we list 113 reading and mathematics programs that meet any of the three top ESSA standards. Only 28 of these (18 reading, 10 math) have a major technology component. This is a good start, but we need a lot more proven technology-based programs. To get them, government needs to continue its productive Institute of Education Sciences (IES) and Education Innovation Research (EIR) initiatives. For for-profit companies, Small Business Innovation Research (SBIR) plays an important role in early development of technology solutions. However, the pace of development and research focused on practical programs for schools needs to accelerate, and to learn from its own successes and failures to increase the success rate of its investments.

Communicating “what works”

There remains an important need to provide school leaders with easy-to-interpret information on the evidence base for all existing programs schools might select. The What Works Clearinghouse and our Evidence for ESSA website do this most comprehensively, but these and other resources need help to keep up with the rapid expansion of evidence that has appeared in the past 10 years.

Technology-based education can still produce the outcomes Skinner promised in his 1954 video, the ones we have all been eagerly awaiting ever since. However, technology developers and researchers need more help from government to build an eager market not just for technology, but for proven achievement outcomes produced by technology.

References

Baye, A., Lake, C., Inns, A., & Slavin, R. (2019). Effective reading programs for secondary students. Reading Research Quarterly, 54 (2), 133-166.

Inns, A., Lake, C., Pellegrini, M., & Slavin, R. (2019). A synthesis of quantitative research on programs for struggling readers in elementary schools. Available at www.bestevidence.org. Manuscript submitted for publication.

Inns, A., Lake, C., Pellegrini, M., & Slavin, R. (in preparation). A synthesis of quantitative research on elementary reading. Baltimore, MD: Center for Research and Reform in Education, Johns Hopkins University.

Morrison, J. R., Ross, S.M., & Cheung, A.C.K. (2019). From the market to the classroom: How ed-tech products are procured by school districts interacting with vendors. Educational Technology Research and Development, 67 (2), 389-421.

Pellegrini, M., Inns, A., Lake, C., & Slavin, R. (2019). Effective programs in elementary mathematics: A best-evidence synthesis. Available at www.bestevidence.org. Manuscript submitted for publication.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Achieving Breakthroughs in Education By Transforming Effective But Expensive Approaches to be Affordable at Scale

It’s summer in Baltimore. The temperatures are beastly, the humidity worse. I grew up in Washington, DC, which has the same weather. We had no air conditioning, so summers could be torture. No one could sleep, so we all walked around like zombies, yearning for fall.

Today, however, summers in Baltimore are completely bearable. The reason, of course, is air conditioning. Air conditioning existed when I was a kid, but hardly anyone could afford it.  I think the technology has gradually improved, but there was no scientific or technical breakthrough, as far as I know.  Yet somehow, all but the poorest families can afford air conditioning, so summer in Baltimore can be survived. Families that cannot afford air conditioning need assistance, especially for health reasons, but this number is small.


The story of air conditioning resembles that of much other technology. What happens is that a solution is devised for a very important problem. The solution is too expensive for ordinary people to use, so initially, it is used in circumstances that justify the cost. For example, early automobiles were far too expensive for the general public, but they were used for important applications in which the benefits were particularly obvious, such as delivery trucks and cars for doctors and veterinarians. Also, wealthy individuals and race car drivers could afford the early autos. These applications provided experience with the manufacture, use, and repair of automobiles and encouraged investments in infrastructure, paving the way (so to speak) for mass production of cars (such as the Model T) that could be afforded by a much larger portion of the population and economy. Modest improvements are constantly being made, but the focus is on making the technology less expensive, so it can be more widely used. In medicine, penicillin was discovered in the 1920s, but not until the advent of World War II was it made inexpensive enough for practical use. It saved millions of lives not because it had been discovered, but because the Merck Company was commissioned to find a way to make it practicable (the solution involved growing penicillin on rotting squash).

Innovations in education can work in a similar way.  One obvious example is instructional technology, which existed before the 1970s but is only now becoming universally available, mostly because it is falling in price.  However, what education has rarely done is to create expensive but hugely effective interventions and then figure out how to do them cheaply, without reducing their impact.

Until now.

If you are a regular reader of my blog, you can guess where I am going: Tutoring. As everyone knows, one-to-one tutoring by certified teachers is extremely effective. No surprise there. As you regulars will also know, rigorous research over the past 20 years has established that tutoring by well-trained, well-supervised teaching assistants using proven methods routinely produces outcomes just as good as tutoring by certified teachers, at half the cost. Further, one-to-small-group tutoring, with groups of up to four students, can be almost as effective as one-to-one tutoring in reading, and equally effective in mathematics (see www.bestevidence.org).

One-to-four tutoring by teaching assistants requires about one-eighth of the cost of one-to-one tutoring by teachers.  The mean outcomes for both types of tutoring are about an effect size of +0.30, but several programs are able to produce effect sizes in excess of +0.50, the national mean difference on NAEP between disadvantaged and middle-class students.  (As a point of comparison, average effects of technology applications with elementary struggling readers average +0.05 in reading, and in math, they average +0.07 for all elementary students.  Urban charter schools average +0.04 in reading, +0.05 in math).
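The one-eighth figure follows from two multiplicative savings: a teaching assistant costs roughly half as much as a certified teacher (as stated above), and a one-to-four group spreads that cost across four students. A back-of-the-envelope sketch, with the 0.5 salary ratio treated as the stated approximation rather than an exact figure:

```python
# Relative per-student cost, with one-to-one tutoring by a certified teacher = 1.0.
teacher_cost = 1.0     # baseline: certified teacher, one student
ta_salary_ratio = 0.5  # teaching assistant ≈ half a teacher's cost (approximation from the text)
group_size = 4         # one-to-four small-group tutoring

per_student_cost = teacher_cost * ta_salary_ratio / group_size
print(per_student_cost)  # → 0.125, i.e., one-eighth of the baseline
```

The two factors are independent, which is why the savings multiply rather than add.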

Reducing the cost of tutoring should not be seen as a way for schools to save money.  Instead, it should be seen as a way to provide the benefits of tutoring to much larger numbers of students.  Because of its cost, tutoring has been largely restricted to the primary grades (especially first), to perhaps a semester of service, and to reading, but not math.  If tutoring is much less expensive but equally effective, then tutoring can be extended to older students and to math.  Students who need more than a semester of tutoring, or need “booster shots” to maintain their gains into later grades, should be able to receive the tutoring they need, for as long as they need it.

Tutoring has been how rich and powerful people educated their children since the beginning of time.  Ancient Romans, Greeks, and Egyptians had their children tutored if they could afford it.  The great Russian educational theorist, Lev Vygotsky, never saw the inside of a classroom as a child, because his parents could afford to have him tutored.  As a slave, Frederick Douglass received one-to-one tutoring (secretly and illegally) from his owner’s wife, right here in Baltimore.  When his master found out and forbade his wife to continue, Douglass sought further tutoring from immigrant boys on the docks where he worked, in exchange for his master’s wife’s fresh-cooked bread.  Helen Keller received tutoring from Anne Sullivan.  Tutoring has long been known to be effective.  The only question is, or should be, how do we maximize tutoring’s effectiveness while minimizing its cost, so that all students who need it can receive it?

If air conditioning had been like education, we might have celebrated its invention, but sadly concluded that it would never be affordable by ordinary people.  If penicillin had been like education, it would have remained a scientific curiosity until today, and millions would have died due to the lack of it.  If cars had been like education, only the rich would have them.

Air conditioning for all?  What a cool idea.  Cost-effective tutoring for all who need it?  Wouldn’t that be smart?

Photo credit: U.S. Navy photo by Pat Halton [Public domain]

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Can Computers Teach?

Something’s coming

I don’t know

What it is

But it is

Gonna be great!

-Something’s Coming, West Side Story

For more than 40 years, educational technology has been on the verge of transforming educational outcomes for the better. The song “Something’s Coming,” from West Side Story, captures the feeling. We don’t know how technology is going to solve our problems, but it’s gonna be great!

Technology Counts is an occasional section of Education Week. Usually, it publishes enthusiastic predictions about the wonders around the corner, in line with its many advertisements for technology products of all kinds. So it was a bit of a shock to see the most recent edition, dated April 24. An article entitled, “U.S. Teachers Not Seeing Tech Impact,” by Benjamin Herold, reported a nationally representative survey of 700 teachers. They reported huge purchases of digital devices, software, learning apps, and other technology in the past three years. That’s not news, if you’ve been in schools lately. But if you think technology is doing “a lot” to support classroom innovation, you’re out of step with most of the profession. Only 29% of teachers would agree with you, but 41% say “some,” 26% “a little,” and 4% “none.” Equally modest proportions say that technology has “changed their work as a teacher.” The Technology Counts articles describe most teachers as using technology to help them do what they have always done, rather than to innovate.

There are lots of useful things technology is used for, such as teaching students to use computers, and technology may make some tasks easier for teachers and students. But from their earliest beginnings, everyone hoped that computers would help students learn traditional subjects, such as reading and math. Do they?


The answer is, not so much. The table below shows average effect sizes for technology programs in reading and math, using data from four recent rigorous reviews of research. Three of these have been posted at www.bestevidence.org. The fourth, on reading strategies for all students, will be posted in the next few weeks.

Mean Effect Sizes for Applications of Technology in Reading and Mathematics

                                           Number of Studies   Mean Effect Size
Elementary Reading                                16                +0.09
Elementary Reading – Struggling Readers            6                +0.05
Secondary Reading                                 23                +0.08
Elementary Mathematics                            14                +0.07
Study-Weighted Mean                               59                +0.08

An effect size of +0.08, which is the average across the four reviews, is not zero. But it is not much. It is certainly not revolutionary. Also, the effects of technology are not improving over time.

As a point of comparison, tutoring by teaching assistants has the following average effect sizes:

                                           Number of Studies   Mean Effect Size
Elementary Reading – Struggling Readers            7                +0.34
Secondary Reading                                  2                +0.23
Elementary Mathematics                            10                +0.27
Study-Weighted Mean                               19                +0.29

Tutoring by teaching assistants is more than 3½ times as effective as technology. Yet the cost difference between technology and effective one-to-small-group tutoring by teaching assistants is not large.
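The study-weighted means in both tables are simply the category means weighted by their numbers of studies. A quick sketch of the arithmetic (figures copied from the tables above):

```python
# Study-weighted mean effect size: each category's mean effect size,
# weighted by its number of studies. Figures are from the two tables above.
def weighted_mean(rows):
    """rows: list of (number_of_studies, mean_effect_size) pairs."""
    total_studies = sum(n for n, _ in rows)
    return sum(n * es for n, es in rows) / total_studies

technology = [(16, 0.09), (6, 0.05), (23, 0.08), (14, 0.07)]
tutoring = [(7, 0.34), (2, 0.23), (10, 0.27)]

tech_mean = weighted_mean(technology)   # rounds to +0.08
tutor_mean = weighted_mean(tutoring)    # rounds to +0.29

print(f"Technology: {tech_mean:+.2f}, Tutoring: {tutor_mean:+.2f}, "
      f"ratio: {tutor_mean / tech_mean:.1f}")
```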

Tutoring is not the only effective alternative to technology. Our reviews have identified many types of programs that are more effective than technology.

A valid argument for continuing to use technology is that eventually, we are bound to come up with more effective technology strategies. It is certainly worthwhile to keep experimenting. But this argument has been made since the early 1970s, and technology is still not ready for prime time, at least as far as teaching reading and math are concerned. I still believe that technology’s day will come, when strategies that get the best from both teachers and technology will reliably improve learning. Until then, let’s use programs and practices already proven to be effective, as we continue working to improve the outcomes of technology.


On Progress

My grandfather (pictured below with my son Ben around 1985) was born in 1900, and grew up in Argentina. The world he lived in as a child had no cars, no airplanes, few cures for common diseases, and inefficient agriculture that bound the great majority of the world to farming. By the time he died, in 1996, think of all the astonishing progress he’d seen in technology, medicine, agriculture, and much else.

Pictured are Bob Slavin’s grandfather and son, both of whom became American citizens: one born before the invention of airplanes, the other born before the exploration of Mars.

I was born in 1950. The progress in technology, medicine, and agriculture, and many other fields, continues to be extraordinary.

In most of our society and economy, we confidently expect progress. When my father needed a heart valve, his doctor suggested that he wait as long as possible because new, much better heart valves were coming out soon. He could, and did, bet his life on progress, and it paid off.

But now consider education. My grandfather attended school in Argentina, where he was taught in rows by teachers who did most of the talking. My father went to school in New York City, where he was taught in rows by teachers who did most of the talking. I went to school in Washington, DC, where I was taught in rows by teachers who did most of the talking. My children went to school in Baltimore, where they mostly sat at tables, and did use some technology, but still, the teachers did most of the talking.


My grandchildren are now headed toward school (the oldest is four). They will use a lot of technology, and will sit at tables more than my own children did. But the basic structure of the classroom is not so different from Argentina, 1906. All who eagerly await the technology revolution are certainly seeing many devices in classroom use. But are these devices improving outcomes in, for example, reading and math? Our reviews of research on all types of approaches used in elementary and secondary schools are not finding strong benefits of technology. Across subjects and grade levels, the average effect sizes are similar, ranging from +0.07 (elementary math) to +0.09 (elementary reading). If you like “additional months of learning,” these effects equate to about one month in a year. OK, better than zero, but not the revolution we’ve been waiting for.

There are other approaches much more effective than technology, such as tutoring, forms of cooperative learning, and classroom management strategies. At www.evidenceforessa.org, you can see descriptions and outcomes of more than 100 proven programs. But these are not widely used. Your children or grandchildren, or other children you care about, may go 13 years from kindergarten to 12th grade without ever experiencing a proven program. In our field, progress is slow, and dissemination of proven programs is slower.

Education is the linchpin for our economy and society. Everything else depends on it. In all of the developed world, education is richly funded, yet very, very little of this largesse is invested in innovation, evaluations of innovative methods, or dissemination of proven programs. Other fields have shown how innovation, evaluation, and dissemination of proven strategies can become the engine of progress. There is absolutely nothing inevitable about the slow pace of progress in education. That slow pace is a choice we have made, and keep making, year after year, generation after generation. I hope we will make a different choice in time to benefit my grandchildren, and the children of every family in the world. It could happen, and there are many improvements in educational research and development to celebrate. But how long must it take before the best of educational innovation becomes standard practice?


How Computers Can Help Do Bad Research

“To err is human. But it takes a computer to really (mess) things up.” – Anonymous

Everyone knows the wonders of technology, but they also know how technology can make things worse. Today, I’m going to let my inner nerd run free (sorry!) and write a bit about how computers can be misused in educational program evaluation.

Actually, there are many problems, all arising from the serious bias that can creep in when computers are used to collect “big data” on computer-based instruction. (Note that I am not accusing computers of being biased in favor of their electronic pals! Computers do not have biases; they do what their operators ask them to do, so far. The problem is that “big data” often contains “big bias.”)

Here is one common problem.  Evaluators of computer-based instruction almost always have available massive amounts of data indicating how much students used the computers or software. Invariably, some students use the computers a lot more than others do. Some may never even log on.

Using these data, evaluators often identify a sample of students, classes, or schools that met a given criterion of use. They then locate students, classes, or schools not using the computers to serve as a control group, matching on achievement tests and perhaps other factors.

This sounds fair. Why should a study of computer-based instruction have to include in the experimental group students who rarely touched the computers?

The answer is that students who did use the computers an adequate amount of time are not at all the same as students who had the same opportunity but did not use them, even if they all had the same pretests, on average. The reason may be that students who used the computers were more motivated or skilled than other students in ways the pretests do not detect (and therefore cannot control for). Sometimes teachers use computer access as a reward for good work, or as an extension activity, in which case the bias is obvious. Sometimes whole classes or schools use computers more than others do, and this may indicate other positive features about those classes or schools that pretests do not capture.

Sometimes a high frequency of computer use indicates negative factors, in which case evaluations that only include the students who used the computers at least a certain amount of time may show (meaningless) negative effects. Such cases include situations in which computers are used to provide remediation for students who need it, or some students may be taking ‘credit recovery’ classes online to replace classes they have failed.

Evaluations in which students who used computers are compared to students who had opportunities to use computers but did not do so have the greatest potential for bias. However, comparisons of students in schools with access to computers to schools without access to computers can be just as bad, if only the computer-using students in the computer-using schools are included.  To understand this, imagine that in a computer-using school, only half of the students actually use computers as much as the developers recommend. The half that did use the computers cannot be compared to the whole non-computer (control) schools. The reason is that in the control schools, we have to assume that given a chance to use computers, half of their students would also do so and half would not. We just don’t know which particular students would and would not have used the computers.
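The selection problem described above is easy to demonstrate with a toy simulation (all numbers invented): give a program no true effect at all, let unobserved motivation drive both computer use and achievement growth, and the users-versus-nonusers comparison still shows a sizable “effect”:

```python
import random
from statistics import mean

random.seed(1)

# Toy simulation (all numbers invented) of usage-based selection bias.
# The software has NO true effect, but unobserved motivation drives both
# who uses the computers and how much students gain.
students = []
for _ in range(10_000):
    motivation = random.gauss(0, 1)      # unobserved by the evaluator
    pretest = random.gauss(0, 1)         # observed, so "matching" can use it
    used = motivation + random.gauss(0, 1) > 0
    posttest = pretest + 0.5 * motivation + random.gauss(0, 1)  # true effect = 0
    students.append((used, posttest))

users = [post for used, post in students if used]
nonusers = [post for used, post in students if not used]
naive_effect = mean(users) - mean(nonusers)

# A sizable apparent "effect" of a program that does nothing; matching on
# pretest cannot remove it, because motivation is invisible to the pretest.
print(f"Apparent effect of a do-nothing program: {naive_effect:+.2f}")
```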

Another evaluation design particularly susceptible to bias is one in which schools using a program are matched (based on pretests, demographics, and so on) with other schools that did not use the program, after outcomes are already known (or knowable). Clearly, such studies allow for the possibility that evaluators will “cherry-pick” their favorite experimental schools and “crabapple-pick” control schools known to have done poorly.


Solutions to Problems in Evaluating Computer-Based Programs

Fortunately, there are practical solutions to the problems inherent to evaluating computer-based programs.

Randomly Assigning Schools

The best solution by far is the one any sophisticated quantitative methodologist would suggest: identify some numbers of schools, or grades within schools, and randomly assign half to receive the computer-based program (the experimental group), and half to a business-as-usual control group. Measure achievement at pre- and post-test, and analyze using HLM or some other multi-level method that takes clusters (schools, in this case) into account. The problem is that this can be expensive, as you’ll usually need a sample of about 50 schools and expert assistance.  Randomized experiments produce “intent to treat” (ITT) estimates of program impacts that include all students whether or not they ever touched a computer. They can also produce non-experimental estimates of “effects of treatment on the treated” (TOT), but these are not accepted as the main outcomes.  Only ITT estimates from randomized studies meet the “strong” standards of ESSA, the What Works Clearinghouse, and Evidence for ESSA.
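The assignment step can be sketched briefly. A common refinement of simple randomization is to pair schools on pretest means before flipping a coin within each pair, so the two conditions start out balanced. This is a minimal sketch with hypothetical school names and invented pretest means; the actual analysis of posttests would then be done with a multilevel modeling package (HLM or similar), which is beyond a few lines of code:

```python
import random

def pair_and_randomize(schools, seed=None):
    """Matched-pair cluster randomization: sort schools by pretest mean,
    pair adjacent schools, then flip a coin within each pair so that one
    school gets the program and its partner serves as the control.
    `schools` is a list of (school_id, pretest_mean) tuples; an even
    number of schools is assumed."""
    rng = random.Random(seed)
    ordered = sorted(schools, key=lambda s: s[1])
    assignment = {}
    for i in range(0, len(ordered), 2):
        first, second = ordered[i][0], ordered[i + 1][0]
        if rng.random() < 0.5:
            first, second = second, first
        assignment[first] = "treatment"
        assignment[second] = "control"
    return assignment

# Hypothetical example: six schools with invented pretest means.
schools = [("A", 48.2), ("B", 52.1), ("C", 47.9),
           ("D", 55.0), ("E", 51.4), ("F", 54.3)]
print(pair_and_randomize(schools, seed=1))
```

Because each pair contributes exactly one school to each condition, the design guarantees balance on the pretest variable used for pairing, whatever the coin flips turn out to be.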

High-Quality Matched Studies

It is possible to simulate random assignment by matching schools in advance based on pretests and demographic factors. In order to reach the second level (“moderate”) of ESSA or Evidence for ESSA, a matched study must do everything a randomized study does, including emphasizing ITT estimates, with the exception of randomizing at the start.

In this “moderate” or quasi-experimental category there is one research design that may allow evaluators to do relatively inexpensive, relatively fast evaluations. Imagine that a program developer has sold their program to some number of schools, all about to start the following fall. Assume the evaluators have access to state test data for those and other schools. Before the fall, the evaluators could identify schools not using the program as a matched control group. These schools would have to have similar prior test scores, demographics, and other features.

In order for this design to be free from bias, the developer or evaluator must specify the entire list of experimental and control schools before the program starts. They must agree that this list is the list they will use at posttest to determine outcomes, no matter what happens. The list, and the study design, should be submitted to the Registry of Efficacy and Effectiveness Studies (REES), recently launched by the Society for Research on Educational Effectiveness (SREE). This way there is no chance of cherry-picking or crabapple-picking, as the schools in the analysis are the ones specified in advance.

All students in the selected experimental and control schools in the grades receiving the treatment would be included in the study, producing an ITT estimate. This design makes little use of “big data” on how much individual students used the program, but such data could produce a “treatment-on-the-treated” (TOT) estimate that should at least provide an upper bound of program impact (i.e., if you don’t find a positive effect even on your TOT estimate, you’re really in trouble).
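The ITT/TOT distinction can be made concrete with invented numbers: ITT keeps every student assigned to the treatment condition, whether or not they ever used the program, while TOT keeps only the users, and so tends to be inflated by the same self-selection bias discussed earlier:

```python
from statistics import mean

# Toy illustration (invented numbers) of ITT vs. TOT estimates.
# Each record: (assigned_to_treatment, actually_used_program, gain_score)
records = [
    (True,  True,  6.0), (True,  True,  5.0),
    (True,  False, 1.0), (True,  False, 2.0),
    (False, False, 2.0), (False, False, 3.0),
    (False, False, 1.0), (False, False, 2.0),
]

treatment_group = [g for assigned, _, g in records if assigned]
control_group   = [g for assigned, _, g in records if not assigned]
treated_users   = [g for assigned, used, g in records if assigned and used]

# ITT: everyone assigned to treatment counts, used the program or not.
itt = mean(treatment_group) - mean(control_group)   # 3.5 - 2.0 = +1.5
# TOT: only students who actually used the program; inflated by
# self-selection, so at best an upper bound on the true impact.
tot = mean(treated_users) - mean(control_group)     # 5.5 - 2.0 = +3.5
print(f"ITT: {itt:+.1f}   TOT: {tot:+.1f}")
```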

This design is inexpensive both because existing data are used and because the experimental schools, not the evaluators, pay for the program implementation.

That’s All?

Yup. That’s all. These designs do not make use of the “big data” cheaply assembled by designers and evaluators of computer-based programs. Again, the problem is that “big data” leads to “big bias.” Perhaps someone will come up with practical designs that require far fewer schools, faster turn-around times, and creative use of computerized usage data, but I do not see this coming. The problem is that in any kind of experiment, things that take place after random or matched assignment (such as participation or non-participation in the experimental treatment) are considered bias, of interest in after-the-fact TOT analyses but not as headline ITT outcomes.

If evidence-based reform is to take hold we cannot compromise our standards. We must be especially on the alert for bias. The exciting “cost-effective” research designs being talked about these days for evaluations of computer-based programs do not meet this standard.

Succeeding Faster in Education

“If you want to increase your success rate, double your failure rate.” So said Thomas Watson, the founder of IBM. What he meant, of course, is that people and organizations thrive when they try many experiments, even though most experiments fail. Failing twice as often means trying twice as many experiments, leading to twice as many failures—but also, he was saying, many more successes.

Thomas Watson

In education research and innovation circles, many people know this quote, and use it to console colleagues who have done an experiment that did not produce significant positive outcomes. A lot of consolation is necessary, because most high-quality experiments in education do not produce significant positive outcomes. In studies funded by the Institute of Education Sciences (IES), Investing in Innovation (i3), and England’s Education Endowment Foundation (EEF), all of which require very high standards of evidence, fewer than 20% of experiments show significant positive outcomes.

The high rate of failure in educational experiments is often shocking to non-researchers, especially the government agencies, foundations, publishers, and software developers who commission the studies. I was at a conference recently in which a Peruvian researcher presented the devastating results of an experiment in which high-poverty, mostly rural schools in Peru were randomly assigned to receive computers for all of their students, or to continue with usual instruction. The Peruvian Ministry of Education was so confident that the computers would be effective that they had built a huge model of the specific computers used in the experiment and attached it to the Ministry headquarters. When the results showed no positive outcomes (except for the ability to operate computers), the Ministry quietly removed the computer statue from the top of their building.

Improving Success Rates

Much as I believe Watson’s admonition (“fail more”), there is another principle he was implying, or so I suspect: We have to learn from failure, so we can increase the rate of success. It is not realistic to expect government to continue to invest substantial funding in high-quality educational experiments if the success rate remains below 20%. We have to get smarter, so we can succeed more often. Fortunately, qualitative measures, such as observations, interviews, and questionnaires, are becoming required elements of funded research, making it easier to find out what actually happened in a study and why. Was the experimental program faithfully implemented? Were there unexpected responses to the program by teachers or students?

In the course of my work reviewing positive and disappointing outcomes of educational innovations, I’ve noticed some patterns that often predict that a given program is likely or unlikely to be effective in a well-designed evaluation. Some of these are as follows.

  1. Small changes lead to small (or zero) impacts. In every subject and grade level, researchers have evaluated new textbooks, in comparison to existing texts. These almost never show positive effects. The reason is that textbooks are just not that different from each other. Approaches that do show positive effects are usually markedly different from ordinary practices or texts.
  2. Successful programs almost always provide a lot of professional development. The programs that have significant positive effects on learning are ones that markedly improve pedagogy. Changing teachers’ daily instructional practices usually requires initial training followed by on-site coaching by well-trained and capable coaches. Lots of PD does not guarantee success, but minimal PD virtually guarantees failure. Sufficient professional development can be expensive, but education itself is expensive, and adding a modest amount to per-pupil cost for professional development and other requirements of effective implementation is often the best way to substantially enhance outcomes.
  3. Effective programs are usually well-specified, with clear procedures and materials. Programs rarely work if they are unclear about what teachers are expected to do, and do not help them do it. In the Peruvian study of one-to-one computers, for example, students were given laptops at a per-pupil cost of $438. Teachers were expected to figure out how best to use them. In fact, a qualitative study found that the computers were considered so valuable that many teachers locked them up except for specific times when they were to be used. There was no specific instructional software, and no professional development that would have enabled teachers to create the needed software themselves. No wonder “it” didn’t work. Other than the physical computers, there was no “it.”
  4. Technology is not magic. Technology can create opportunities for improvement, but there is little understanding of how to use technology to greatest effect. My colleagues and I have done reviews of research on effects of modern technology on learning. We found near-zero effects of a variety of elementary and secondary reading software (Inns et al., 2018; Baye et al., in press), with a mean effect size of +0.05 in elementary reading and +0.00 in secondary. In math, effects were slightly more positive (ES=+0.09), but still quite small, on average (Pellegrini et al., 2018). Some technology approaches had more promise than others, but it is time that we learned from disappointing as well as promising applications. The widespread belief that technology is the future must eventually be right, but at present we have little reason to believe that technology is transformative, and we don’t know which form of technology is most likely to be transformative.
  5. Tutoring is the most solid approach we have. Reviews of elementary reading for struggling readers (Inns et al., 2018) and secondary struggling readers (Baye et al., in press), as well as elementary math (Pellegrini et al., 2018), find outcomes for various forms of tutoring that are far beyond effects seen for any other type of treatment. Everyone knows this, but thinking about tutoring falls into two camps. One, typified by advocates of Reading Recovery, takes the view that tutoring is so effective for struggling first graders that it should be used no matter what the cost. The other, also perhaps thinking about Reading Recovery, rejects this approach because of its cost. Yet recent research on tutoring methods is finding strategies that are cost-effective and feasible. First, studies in both reading (Inns et al., 2018) and math (Pellegrini et al., 2018) find no difference in outcomes between certified teachers and paraprofessionals using structured one-to-one or one-to-small group tutoring models. Second, although one-to-one tutoring is more effective than one-to-small group, one-to-small group is far more cost-effective, as one trained tutor can work with 4 to 6 students at a time. Also, recent studies have found that tutoring can be just as effective in the upper elementary and middle grades as in first grade, so this strategy may have broader applicability than it has in the past. The real challenge for research on tutoring is to develop and evaluate models that increase cost-effectiveness of this clearly effective family of approaches.

The extraordinary advances in the quality and quantity of research in education, led by investments from IES, i3, and the EEF, have raised expectations for research-based reform. However, the modest percentage of recent studies meeting current rigorous standards of evidence has caused disappointment in some quarters. Instead, all findings, whether immediately successful or not, should be seen as crucial information. Some studies identify programs ready for prime time right now, but the whole body of work can and must inform us about areas worthy of expanded investment, as well as areas in need of serious rethinking and redevelopment. The evidence movement, in the form it exists today, is completing its first decade. It’s still early days. There is much more we can learn and do to develop, evaluate, and disseminate effective strategies, especially for students in great need of proven approaches.

References

Baye, A., Lake, C., Inns, A., & Slavin, R. (in press). Effective reading programs for secondary students. Reading Research Quarterly.

Inns, A., Lake, C., Pellegrini, M., & Slavin, R. (2018). Effective programs for struggling readers: A best-evidence synthesis. Paper presented at the annual meeting of the Society for Research on Educational Effectiveness, Washington, DC.

Pellegrini, M., Inns, A., & Slavin, R. (2018). Effective programs in elementary mathematics: A best-evidence synthesis. Paper presented at the annual meeting of the Society for Research on Educational Effectiveness, Washington, DC.

 Photo credit: IBM [CC BY-SA 3.0  (https://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons


Rethinking Technology in Education

Antoine de Saint-Exupéry, in his 1931 classic Night Flight, had a wonderful line about the early airmail service in Patagonia, South America:

“When you are crossing the Andes and your engine falls out, well, there’s nothing to do but throw in your hand.”


I had reason to think about this quote recently, as I was attending a conference in Santiago, Chile, the presumed destination of the doomed pilot. The conference focused on evidence-based reform in education.

Three of the papers described large scale, randomized evaluations of technology applications in Latin America, funded by the Inter-American Development Bank (IDB). Two of them documented disappointing outcomes of large-scale, traditional uses of technology. One described a totally different application.

One of the studies, reported by Santiago Cueto (Cristia et al., 2017), randomly assigned 318 high-poverty, mostly rural primary schools in Peru to receive sturdy, low-cost, practical computers, or to serve as a control group. Teachers were given great latitude in how to use the computers, but limited professional development in how to use them as pedagogical resources. Worse, the computers had software with limited alignment to the curriculum, and teachers were expected to overcome this limitation. Few did. Outcomes were essentially zero in reading and math.

In another study (Berlinski & Busso, 2017), the IDB funded a very well-designed study in 85 schools in Costa Rica. Schools were randomly assigned to receive one of five approaches. All used the same content on the same schedule to teach geometry to seventh graders. One group used traditional lectures and questions with no technology. The others used active learning, active learning plus interactive whiteboards, active learning plus a computer lab, or active learning plus one computer per student. “Active learning” emphasized discussions, projects, and practical exercises.

On a paper-and-pencil test covering the content studied by all classes, all four of the experimental groups scored significantly worse than the control group. The computer lab condition performed especially poorly, and the one-computer-per-student condition worst of all.

The third study, in Chile (Araya, Arias, Bottan, & Cristia, 2018), was funded by the IDB and the International Development Research Center of the Canadian government. It involved a much more innovative and unusual application of technology. Fourth grade classes within 24 schools were randomly assigned to experimental or control conditions. In the experimental group, classes in similar schools were assigned to serve as competitors to each other. Within the math classes, students studied with each other and individually for a bi-monthly “tournament,” in which students in each class were individually given questions to answer on the computers. Students were taught cheers and brought to fever pitch in their preparations. The participating classes were compared to the control classes, which studied the same content using ordinary methods. All classes, experimental and control, were studying the national curriculum on the same schedule, and all used computers, so all that differed was the tournaments and the cooperative studying to prepare for the tournaments.

The outcomes were frankly astonishing. The students in the experimental schools scored much higher on national tests than controls, with an effect size of +0.30.

The differences in the outcomes of these three approaches are clear. What might explain them, and what do they tell us about applications of technology in Latin America and anywhere?

In Peru, the computers were distributed as planned and generally functioned, but teachers received little professional development. In fact, teachers were not given specific strategies for using the computers; they were expected to come up with their own uses for them.

The Costa Rica study did provide computer users with specific approaches to math and gave teachers a great deal of associated professional development. Yet the computers may have been seen as replacements for teachers, and they may simply not have been as effective as teachers. Alternatively, despite extensive PD, all four of the experimental approaches were very new to the teachers and may not have been well implemented.

In contrast, in the Chilean study, tournaments and cooperative study were greatly facilitated by the computers, but the computers were not central to program effectiveness. The theory of action emphasized enhanced motivation to engage in cooperative study of math. The computers were only a tool to achieve this goal. The tournament strategy resembles a method from the 1970s called Teams-Games-Tournaments (TGT) (DeVries & Slavin, 1978). TGT was very effective, but was complicated for teachers to use, which is why it was not widely adopted. In Chile, computers helped solve the problems of complexity.

It is important to note that in the United States, technology solutions are also not producing major gains in student achievement. Reviews of research on elementary reading (ES = +0.05; Inns et al., 2018) and secondary reading (ES = -0.01; Baye et al., in press) have reported near-zero effects of technology-assisted approaches. Outcomes in elementary math are only somewhat better, averaging an effect size of +0.09 (Pellegrini et al., 2018).

The findings of these rigorous studies of technology in the U.S. and Latin America lead to a conclusion that there is nothing magic about technology. Applications of technology can work if the underlying approach is sound. Perhaps it is best to consider which non-technology approaches are proven or likely to increase learning, and only then imagine how technology could make effective methods easier, less expensive, more motivating, or more instructionally effective. As an analogy, great audio technology can make a concert more pleasant or audible, but the whole experience still depends on great composition and great performances. Perhaps technology in education should be thought of in a similar enabling way, rather than as the core of innovation.

Saint-Exupéry’s Patagonian pilots crossing the Andes had no “Plan B” if their engines fell out. We do have many alternative ways to put technology to work or to use other methods, if the computer-assisted instruction strategies that have dominated technology since the 1970s keep showing such small or zero effects. The Chilean study and certain exceptions to the overall pattern of research findings in the U.S. suggest appealing “Plans B.”

The technology “engine” is not quite falling out of the education “airplane.” We need not throw in our hand. Instead, it is clear that we need to re-engineer both, to ask not what is the best way to use technology, but what is the best way to engage, excite, and instruct students, and then ask how technology can contribute.

Photo credit: Distributed by Agence France-Presse (NY Times online) [Public domain], via Wikimedia Commons


References

Araya, R., Arias, E., Bottan, N., & Cristia, J. (2018, August 23). Conecta Ideas: Matemáticas con motivación social. Paper presented at the conference “Educate with Evidence,” Santiago, Chile.

Baye, A., Lake, C., Inns, A., & Slavin, R. (in press). Effective reading programs for secondary students. Reading Research Quarterly.

Berlinski, S., & Busso, M. (2017). Challenges in educational reform: An experiment on active learning in mathematics. Economics Letters, 156, 172-175.

Cristia, J., Ibarraran, P., Cueto, S., Santiago, A., & Severín, E. (2017). Technology and child development: Evidence from the One Laptop per Child program. American Economic Journal: Applied Economics, 9 (3), 295-320.

DeVries, D. L., & Slavin, R. E. (1978). Teams-Games-Tournament:  Review of ten classroom experiments. Journal of Research and Development in Education, 12, 28-38.

Inns, A., Lake, C., Pellegrini, M., & Slavin, R. (2018, March 3). Effective programs for struggling readers: A best-evidence synthesis. Paper presented at the annual meeting of the Society for Research on Educational Effectiveness, Washington, DC.

Pellegrini, M., Inns, A., & Slavin, R. (2018, March 3). Effective programs in elementary mathematics: A best-evidence synthesis. Paper presented at the annual meeting of the Society for Research on Educational Effectiveness, Washington, DC.

Pilot Studies: On the Path to Solid Evidence

This week, the Education Technology Industry Network (ETIN), a division of the Software & Information Industry Association (SIIA), released an updated guide to research methods, authored by a team at Empirical Education Inc. The guide is primarily intended to help software companies understand what is required for studies to meet current standards of evidence.

In government and among methodologists and well-funded researchers, there is general agreement about the kind of evidence needed to establish the effectiveness of an education program intended for broad dissemination. To meet its top rating (“meets standards without reservations”) the What Works Clearinghouse (WWC) requires an experiment in which schools, classes, or students are assigned at random to experimental or control groups, and it has a second category (“meets standards with reservations”) for matched studies.

These WWC categories more or less correspond to the Every Student Succeeds Act (ESSA) evidence standards (“strong” and “moderate” evidence of effectiveness, respectively), and ESSA adds a third category, “promising,” for correlational studies.

Our own Evidence for ESSA website follows the ESSA guidelines, of course. The SIIA guidelines explain all of this.

Despite the overall consensus about the top levels of evidence, the problem is that doing studies that meet these requirements is expensive and time-consuming. Software developers, especially small ones with limited capital, often do not have the resources or the patience to do such studies. Any organization that has developed something new may not want to invest substantial resources in large-scale evaluations until it has some indication that the program is likely to show well in a larger, longer, and better-designed evaluation. There is a path to high-quality evaluations, and it starts with pilot studies.

The SIIA Guide usefully discusses this problem, but I want to add some further thoughts on what to do when you can’t afford a large randomized study.

1. Design useful pilot studies. Evaluators need to make a clear distinction between full-scale evaluations, intended to meet WWC or ESSA standards, and pilot studies (the SIIA Guidelines call these “formative studies”), which are meant for internal use only, both to assess the strengths and weaknesses of the program and to give an early indication of whether or not a program is ready for full-scale evaluation. The pilot study should be a miniature version of the large study. But whatever its findings, it should not be used in publicity. Results of pilot studies are important, but by definition a pilot study is not ready for prime time.

An early pilot study may be just a qualitative study, in which developers and others might observe classes, interview teachers, and examine computer-generated data on a limited scale. The problem in pilot studies is at the next level, when developers want an early indication of effects on achievement, but are not ready for a study likely to meet WWC or ESSA standards.

2. Worry about bias, not power. Small, inexpensive studies pose two types of problems. One is the possibility of bias, discussed in the next section. The other is lack of power, mostly meaning having a large enough sample to determine that a potentially meaningful program impact is statistically significant, or unlikely to have happened by chance. To understand this, imagine that your favorite baseball team adopts a new strategy. After the first ten games, the team is doing better than it did last year, in comparison to other teams, but this could have happened by chance. After 100 games? Now the results are getting interesting. If 10 teams all adopt the strategy next year and they all see improvements on average? Now you’re headed toward proof.
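The baseball intuition above can be made concrete with a back-of-the-envelope power calculation. The sketch below is illustrative only and uses hypothetical numbers (an assumed effect size of d = 0.20 standard deviations, a conventional two-tailed test at p < .05); it shows why a small pilot almost never has enough units to detect a realistic program effect, which is exactly why pilots should focus on avoiding bias rather than chasing statistical significance.

```python
# Approximate statistical power for a two-group comparison, using a
# standard two-sample z-approximation. All numbers here are hypothetical
# illustrations, not figures from any actual study.
import math

def normal_cdf(z):
    # Cumulative distribution function of the standard normal.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def power(effect_size, n_per_group, z_crit=1.96):
    # Standard error of a standardized mean difference with equal groups.
    se = math.sqrt(2.0 / n_per_group)
    # Probability the observed effect clears the two-tailed .05 cutoff.
    return 1 - normal_cdf(z_crit - effect_size / se)

# A realistic program effect of d = 0.20 with ever-larger samples:
print(round(power(0.20, 10), 2))   # ~0.07 — 10 per group: almost no chance
print(round(power(0.20, 100), 2))  # ~0.29 — 100 per group: still weak
print(round(power(0.20, 400), 2))  # ~0.81 — 400 per group: adequately powered
```

In other words, with ten units per group a true, meaningful effect would reach significance only about 7% of the time, so a non-significant pilot result tells you almost nothing. An unbiased pilot estimate, however, will at least be “in the ballpark” whatever the sample size.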

During the pilot process, evaluators might compare multiple classes or multiple schools, perhaps assigned at random to experimental and control groups. There may not be enough classes or schools for statistical significance yet, but if the mini-study avoids bias, the results will at least be in the ballpark (so to speak).

3. Avoid bias. A small experiment can be fine as a pilot study, but every effort should be made to avoid bias. Otherwise, the pilot study will give a result far more positive than the full-scale study will, defeating the purpose of doing a pilot.

Examples of common sources of bias in smaller studies are as follows.

a. Use of measures made by developers or researchers. These measures typically produce greatly inflated impacts.

b. Implementation of gold-plated versions of the program. In small pilot studies, evaluators often implement versions of the program that could never be replicated. Examples include providing additional staff time that could not be repeated at scale.

c. Inclusion of highly motivated teachers or students in the experimental group (which receives the program) but not in the control group. For example, matched studies of technology often exclude teachers who did not implement “enough” of the program. The problem is that the full-scale experiment (and real life) will include all kinds of teachers, so excluding teachers who could not or did not want to engage with technology overstates the likely impact at scale in ordinary schools. Even worse, excluding students who did not use the technology enough may bias the study toward more capable students.

4. Learn from pilots. Evaluators, developers, and disseminators should learn as much as possible from pilots. Observations, interviews, focus groups, and other informal means should be used to understand what is working and what is not, so that when the program is evaluated at scale, it is at its best.


***

As evidence becomes more and more important, publishers and software developers will increasingly be called upon to prove that their products are effective. However, no program should have its first evaluation be a 50-school randomized experiment. Such studies are indeed the “gold standard,” but jumping from a two-class pilot to a 50-school experiment is a way to guarantee failure. Software developers and publishers should follow a path that leads to a top-tier evaluation, and learn along the way how to ensure that their programs and evaluations will produce positive outcomes for students at the end of the process.


This blog is sponsored by the Laura and John Arnold Foundation