Pilot Studies: On the Path to Solid Evidence

This week, the Education Technology Industry Network (ETIN), a division of the Software & Information Industry Association (SIIA), released an updated guide to research methods, authored by a team at Empirical Education Inc. The guide is primarily intended to help software companies understand what is required for studies to meet current standards of evidence.

In government and among methodologists and well-funded researchers, there is general agreement about the kind of evidence needed to establish the effectiveness of an education program intended for broad dissemination. To meet its top rating (“meets standards without reservations”) the What Works Clearinghouse (WWC) requires an experiment in which schools, classes, or students are assigned at random to experimental or control groups, and it has a second category (“meets standards with reservations”) for matched studies.

These WWC categories more or less correspond to the Every Student Succeeds Act (ESSA) evidence standards (“strong” and “moderate” evidence of effectiveness, respectively), and ESSA adds a third category, “promising,” for correlational studies.

Our own Evidence for ESSA website follows the ESSA guidelines, of course. The SIIA guidelines explain all of this.

Despite the overall consensus about the top levels of evidence, the problem is that doing studies that meet these requirements is expensive and time-consuming. Software developers, especially small ones with limited capital, often do not have the resources or the patience to do such studies. Any organization that has developed something new may not want to invest substantial resources into large-scale evaluations until they have some indication that the program is likely to show well in a larger, longer, and better-designed evaluation. There is a path to high-quality evaluations, starting with pilot studies.

The SIIA Guide usefully discusses this problem, but I want to add some further thoughts on what to do when you can’t afford a large randomized study.

1. Design useful pilot studies. Evaluators need to make a clear distinction between full-scale evaluations, intended to meet WWC or ESSA standards, and pilot studies (the SIIA Guidelines call these “formative studies”), which are just meant for internal use, both to assess the strengths or weaknesses of the program and to give an early indicator of whether or not a program is ready for full-scale evaluation. The pilot study should be a miniature version of the large study. But whatever its findings, it should not be used in publicity. Results of pilot studies are important, but by definition a pilot study is not ready for prime time.

An early pilot study may be just a qualitative study, in which developers and others might observe classes, interview teachers, and examine computer-generated data on a limited scale. The problem in pilot studies is at the next level, when developers want an early indication of effects on achievement, but are not ready for a study likely to meet WWC or ESSA standards.

2. Worry about bias, not power. Small, inexpensive studies pose two types of problems. One is the possibility of bias, discussed in the next section. The other is lack of power, mostly meaning having a large enough sample to determine that a potentially meaningful program impact is statistically significant, or unlikely to have happened by chance. To understand this, imagine that your favorite baseball team adopts a new strategy. After the first ten games, the team is doing better than it did last year, in comparison to other teams, but this could have happened by chance. After 100 games? Now the results are getting interesting. If 10 teams all adopt the strategy next year and they all see improvements on average? Now you’re headed toward proof.
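The baseball intuition can be made concrete with a short simulation. The sketch below is a hypothetical illustration (not drawn from any cited study): it computes, using only Python's standard library, the probability of seeing a given number of wins purely by chance if the new strategy actually made no difference.

```python
from math import comb

def binom_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing k or more
    wins in n games if the strategy changed nothing (win rate stays 0.5)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# A team that truly improved to a 60% win rate would average
# about 6 wins in 10 games, or 60 wins in 100 games.
print(round(binom_tail(10, 6), 3))    # ~0.38: easily explained by luck
print(round(binom_tail(100, 60), 3))  # < 0.05: unlikely to be chance
```

With 10 games, even 6 wins is entirely consistent with luck; with 100 games, 60 wins would be improbable if nothing had changed. That is why a small pilot cannot establish statistical significance; at best it gives a rough direction.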

During the pilot process, evaluators might compare multiple classes or multiple schools, perhaps assigned at random to experimental and control groups. There may not be enough classes or schools for statistical significance yet, but if the mini-study avoids bias, the results will at least be in the ballpark (so to speak).

3. Avoid bias. A small experiment can be fine as a pilot study, but every effort should be made to avoid bias. Otherwise, the pilot study will give a result far more positive than the full-scale study will, defeating the purpose of doing a pilot.

Examples of common sources of bias in smaller studies are as follows.

a. Use of measures made by developers or researchers. These measures typically produce greatly inflated impacts.

b. Implementation of gold-plated versions of the program. In small pilot studies, evaluators often implement versions of the program that could never be replicated. Examples include providing additional staff time that could not be repeated at scale.

c. Inclusion of highly motivated teachers or students in the experimental group, which gets the program, but not the control group. For example, matched studies of technology often exclude teachers who did not implement “enough” of the program. The problem is that the full-scale experiment (and real life) include all kinds of teachers, so excluding teachers who could not or did not want to engage with technology overstates the likely impact at scale in ordinary schools. Even worse, excluding students who did not use the technology enough may bias the study toward more capable students.

4. Learn from pilots. Evaluators, developers, and disseminators should learn as much as possible from pilots. Observations, interviews, focus groups, and other informal means should be used to understand what is working and what is not, so when the program is evaluated at scale, it is at its best.

 

***

As evidence becomes more and more important, publishers and software developers will increasingly be called upon to prove that their products are effective. However, no program should have its first evaluation be a 50-school randomized experiment. Such studies are indeed the “gold standard,” but jumping from a two-class pilot to a 50-school experiment is a way to guarantee failure. Software developers and publishers should follow a path that leads to a top-tier evaluation, and learn along the way how to ensure that their programs and evaluations will produce positive outcomes for students at the end of the process.

 

This blog is sponsored by the Laura and John Arnold Foundation


Publishers and Evidence

High above the Avenue of the Americas on prime real estate in midtown Manhattan towers the 51-story McGraw-Hill building. When in New York, I always find time to go look at that building and reflect on the quixotic quest my colleagues and I are on to get educational decision makers to choose rigorously evaluated, proven programs. These programs are often made by small non-profit organizations and universities, like the ones I work in. Looking up at that mighty building, I always wonder: are we fooling ourselves? Who are we to take on some of the most powerful companies in the world?

Education publishing is dominated by three giant, multi-billion dollar publishers, plus another three or four even bigger technology companies. These behemoths are not worried about us, not one bit. Instead, they are worried about each other.

From my experience, there are very good people who work in publishing and technology companies, people who genuinely hope that their products will improve learning for students. They would love to create innovative programs, evaluate them rigorously, and disseminate those found to be effective. However, big as they are, the major publishers face severe constraints in offering proven programs. Because they are in ferocious competition with each other, publishers cannot easily invest in expensive development and evaluation, or insist on extensive professional development, a crucial element of virtually all programs that have been shown to improve student achievement. Doing so would raise their costs, making them vulnerable to lower-cost competitors.

In recent years, many big publishers and technology companies have begun to commission third-party evaluations of their major textbooks, software, and other products. If the evaluations show positive outcomes, they can use this information in their marketing, and having rigorous evidence showing positive impacts helps protect them from the possibility that government might begin to favor programs, software, or other products with proven outcomes in rigorous research. This is exactly what did happen with the enactment of the ESSA evidence standards, though the impact of these standards has not yet been strongly felt.

However, publishers and technology companies cannot get too far out ahead of their market. If superintendents, central office leaders, and others who select textbooks and technology get on board the evidence train, then publishers will greatly expand their efforts in research and development. If the market continues to place little value on evidence, so will the big publishers.

In contrast to commercial publishers and technology companies, non-profit organizations play a disproportionate role in the evidence movement. They are often funded by government or philanthropies to create and evaluate innovations, as big commercial companies almost never are. Non-profits have the freedom to experiment, and to disseminate what works. However, non-profits, universities, and tiny for-profit start-ups are small, under-capitalized, and have little capacity or experience in marketing. Their main, and perhaps only, competitive advantage is that they have evidence of effectiveness. If no one cares about evidence, our programs will not last long.

One problem publishers face is that evaluations of traditional textbooks usually do not show any achievement benefits compared to control groups. The reason is that one publisher’s textbook is just not that different from another’s, which is what the control group is using. Publishers rarely provide much professional development, which makes it difficult for them to introduce anything truly innovative. The half-day August in-service that comes with most textbooks is barely enough to get teachers familiar with the features in the most traditional book. The same is true of technology approaches, which also rarely make much difference in student outcomes, perhaps because they typically provide little professional development beyond what is necessary to run the software.

The strategy emphasized by government and philanthropy for many years has been to fund innovators to create and evaluate programs. Those that succeed are then encouraged or funded to “scale up” their proven programs. Some are able to grow to impressive scale, but never so much as to worry big companies. An occasional David can surprise an occasional Goliath, but in the long run, the big guys win, and they’ll keep winning until someone changes the rules. To oversimplify a bit, what we have are massive publishers and technology companies with few proven innovations, and small non-profits with proven programs but little money or marketing expertise. This is not a recipe for progress.

The solution lies with government. National, state, and/or local governments have to adopt policies that favor the use of programs and software that have been proven in rigorous experiments to be effective in improving student achievement. At the federal level, the ESSA evidence standards are showing the way, and if they truly catch hold, this may be enough. But imagine if a few large states or even big districts started announcing that they were henceforth going to require evidence of effectiveness when they adopt programs and software. The effect could be electric.

For non-profits, such policies could greatly expand access to schools, and perhaps to funding. But most non-profits are so small that it would take them years to scale up substantially while maintaining quality and effectiveness.

For publishers and technology companies, the effect could be even more dramatic. If effectiveness begins to matter, even if just in a few key places, then it becomes worthwhile for them to create, partner with, or acquire effective innovations that provide sufficient professional development. In states and districts with pro-evidence policies, publishers would not have to worry about being undercut by competitors, because all vendors would have to meet evidence standards.

Publishers have tried to acquire proven programs in the past, but this has usually ended badly, because they tend to strip out the professional development and other elements that made the program work in the first place. However, in a pro-evidence environment, publishers would be motivated to maintain the quality, effectiveness, and “brand” of any programs they acquire.

In medicine, most research on practical medications is funded by drug companies and carefully monitored and certified by government. Could such a thing happen in education?

Publishers and technology companies have the capital and expertise to take effective programs to scale. Partnering with creators of proven programs, or creating and evaluating their own, big companies can make a real difference, as long as government ensures that the programs they are disseminating are in fact of the same quality and effectiveness as the versions that were found to be effective.

Publishers and technology companies are a key part of the education landscape. They need to be welcomed into evidence-based reform, and incentivized to engage in innovation and evaluation. Otherwise, educational innovation will remain a marginal activity, benefitting thousands of students when millions are in need.


Farewell to the Walking Encyclopedia

Like just about everyone these days, I carry a digital device in my pocket at all times. At the office, I have a powerful desktop, and in the evening, I curl up with my iPad. Each of these contains the knowledge and wisdom of the ages. Kids and parents have as much access as I do.

The ubiquity of knowledge due to digital devices has led many educational theorists and practitioners to wonder whether teachers are even necessary anymore. Can’t everyone just look things up, do calculations, and generally provide themselves with just-in-time wisdom-on-the-spot?

Unfortunately, the truth is that digital devices are not yet transforming education. But what they are doing is putting the last nail in the coffin of the teacher as walking encyclopedia.

In the old days, a teacher could contribute a lot just by knowing more than the students. Teaching was composed of content knowledge (what the teacher knows and can transmit) and pedagogy (how the teacher manages classrooms, motivates students, makes complex ideas clear, and teaches learning-to-learn skills). Content knowledge is still crucial, but a “walking encyclopedia” is of declining value when everyone can find out everything all the time.

Does the decline of the walking encyclopedia diminish the role of the teacher? Just the opposite. When kids are immersed in too much information, what they need is a guide to help them learn how to comprehend complex texts and understand and organize information. They need to know how to write, how to solve complex problems, how to set up and carry out experiments, how to work well with others, how to contextualize their own thoughts to reason productively, how to manage their own behavior, how to maintain positive motivation, and how to be productive even in the face of difficulties. Each of these objectives, and many more, is at the heart of effective pedagogy. All are aided by content knowledge, of course, but a teacher who knows a lot about his or her discipline but not much about managing and motivating students is not going to succeed in today’s world.

It is my experience that the teaching innovations most likely to enhance student learning are hardly ever those that provide new, improved textbooks or digital content. Instead, they almost invariably provide extensive professional development to teachers, followed up by in-school coaching. In each case, the professional development and coaching focuses on pedagogy, not content. We’ve found the same pattern in all subjects and grade levels.

The task ahead of us in evidence-based education, I believe, is to use evidence of what works in pedagogy to help teachers grow as motivating, engaging, self-aware learning guides, capable of using general and subject-specific pedagogies effectively to help students become eager and capable learners. My encyclopedia walks with me in my pocket wherever I go. That’s true of students, too. They don’t need another at the front of their class. What they do need is someone who can make them care about, comprehend, organize, synthesize, and communicate the megabytes of information they carry.

Why Leave Learning to Chance?

Every year about four million kindergartners enter America’s schools. They’re all excited, eager, and confident, because that’s the nature of kindergartners, but unfortunately, we adults know better. We know that among those wonderful five-year-olds, 65% will reach fourth grade reading below the “proficient” level on the National Assessment of Educational Progress (NAEP), and 31% will not even reach the “basic” level. We know which students in which neighborhoods are most likely to have these problems. Since 1980, the story has hardly changed.

Today, I’m writing this blog from an airplane flying from Baltimore to San Francisco. Flying was a risky business long ago, but today the chances are infinitesimal that my airplane will crash.

So here’s a question. Why is it ok to leave the reading success of children to chance? Why don’t we treat reading success the way we treat air safety, as something to ensure no matter what?

If you think we don’t yet know how to ensure the reading success of all children, you might be right, but I can tell you that we absolutely do know how to ensure a much higher level of success than we have now, with today’s teachers and today’s schools. I was recently reviewing research evaluating reading programs, and I found more than 60 different programs with moderate to strong evidence of effectiveness: one-to-one and one-to-small group tutoring, classroom methods, school-wide reforms, and technology. Over time, it’s certain that these approaches, and combinations of them, could become more and more effective, and we could approach 100% success.

Getting to 100% will require more than just better instruction. We are doing a study in high-poverty schools in Baltimore, and have found that while at least 21% of second and third graders need glasses, only 6% have them. I’m sure there are similar stories relating to hearing, dental health, and mental health. Absenteeism is another barrier, and there are more. If we want to get to 100%, we have to deal with all of these.

Well sure, you might say, but how could we afford all of this? Fortunately, the most widespread reading problems can be solved inexpensively. The average annual per-pupil cost in the U.S. is about $11,000. The annual cost of our proven Success for All reading program is around $100 additional, or less than 1% of what we are already spending. Two pairs of eyeglasses (one to take home and one to leave at school), including the eye exam and glasses replacement, cost less than $50. Proven tutoring models provided by paraprofessionals can cost as little as $400 per student, and even at $2,000 for one-to-one tutoring, that’s 18% of average per-pupil cost, and for only a minority of the class.
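For the record, the percentages above follow directly from the dollar figures quoted in this post. A quick back-of-the-envelope check (the numbers are illustrative costs from the text, not a budget):

```python
# Dollar figures quoted above, expressed as shares of average
# per-pupil spending (~$11,000 per year).
PER_PUPIL = 11_000

costs = {
    "Success for All (additional, per student)": 100,
    "Two pairs of eyeglasses, exam included": 50,
    "Paraprofessional tutoring (low end)": 400,
    "One-to-one tutoring": 2_000,
}

for item, dollars in costs.items():
    print(f"{item}: ${dollars:,} = {dollars / PER_PUPIL:.1%} of per-pupil spending")
```

Even the most expensive item, one-to-one tutoring, comes to about 18% of annual per-pupil spending, and only for the students who need it.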

These modest expenditures on proven programs quickly pay back their costs in terms of reducing special education and retention, much less long-term benefits to children and society. Yet none of the 60 proven and promising programs I found is in truly widespread use.

On my airplane, of course, the situation is quite different. Pilots are carefully and extensively trained in proven methods. Technology is constantly developing to provide information and automated assistance to ensure safety and effectiveness. Back-up systems ensure that if things go wrong despite the best of preparation, disaster will not result. All of these systems are constantly evolving in response to development, evaluation, and implementation of innovations.

The reading success of a child is a very serious matter. It simply makes no sense to treat it any less seriously than we treat air safety. Just as on airplanes, we need systems to monitor children’s success, not to punish teachers but to know when and how to intervene if trouble arises.

Perhaps someday, we’ll put Boeing or Lockheed Martin in charge of our schools, and charge them with getting us as close as possible to 100% success in reading. I can see it now.

Proven approaches to:

Phonemic awareness? Check
Phonics? Check
Vocabulary? Check
Fluency? Check
Comprehension? Check
Vision? Check
Hearing? Check
Tutoring backup? Check

Ready for takeoff!

Of course we can solve this problem. All we have to do is to decide it must be solved and then do it. It is neither efficient nor ethical to keep accepting the number of reading disasters we experience in our schools.

Is Now the Time to Reauthorize ESEA?


The Elementary and Secondary Education Act (ESEA), currently also known as No Child Left Behind (NCLB), the giant centerpiece of educational policy, is up for reauthorization. Again. What that means is that it’s time to revisit the act in order to make changes and improvements to the law. Of course, it was supposed to be reauthorized in 2007, but what with partisan politics, outside influences and the lack of any general consensus around the various efforts, Congress has yet to successfully reauthorize the legislation. As a result, national educational policy has been a patchwork of waivers, dodges, and weaves unworthy of a great nation. ESEA is the Eeyore of legislation: “I’ll probably never be reauthorized.” Or the Rodney Dangerfield: “I get no respect.” Or the Godot, for which we’ll be forever waiting.

This year, Congress is taking up ESEA reauthorization again, but the road ahead remains long and fraught with obstacles. The House version, introduced by Reps. John Kline (R-Minnesota) and Todd Rokita (R-Indiana), made it through the Education and Workforce Committee along strict party lines, yet in February it was pulled right before a vote by the full House, with many surmising that it just wasn’t conservative enough to garner the votes it would need to pass. This week, Sens. Lamar Alexander (R-Tennessee) and Patty Murray (D-Washington) released a bipartisan compromise bill that they hope will make it through the Senate. But the draft is still open to amendments by the members of the HELP Committee and then the full Senate, and whether a single bill can satisfy the demands and desires of the broad political spectrum entrenched in Washington right now is unclear. Even if ESEA does not get reauthorized this Congress, the process is a necessary step toward eventually creating a better bill. Each Congress, when ESEA is debated, progress is made, and sometimes that progress leads to positive changes even without a comprehensive agreement. But it would be nice to have a well-considered, widely supported law at the center of education policy.

On the other hand, there are several reasons that it may not be so awful to delay reauthorization until after the next presidential election. Beyond the hope that things might be less partisan by then, there are several positive developments underway that are not yet far enough along to be central to ESEA but could be given two more years.

The first, of course, is the evidence movement. Recent investments, such as Investing in Innovation and IES, have produced a broad set of proven and promising programs for schools. Schools are just starting to be encouraged to use proven programs with their federal funds, as in the evidence-proven, whole-school approach to school-improvement grants. Title II (professional development) has begun requiring grantees to have at least moderate evidence of effectiveness and gives a lot of competitive preference points for programs with strong evidence. President Obama’s budget proposal contained a provision called “Leveraging What Works,” providing schools with incentive funds if they use their formula funding to adopt proven programs. These changes are just happening now, too recently to affect ESEA. If they continue for two more years, they may have profound impacts on ESEA.

Another development is Common Core. These standards, and the computerized testing sometimes associated with them, are too new to be fully understood. In two years their potential role in ESEA will be better known.

Finally, technology is headed into our schools at an astonishing pace, yet we still are not clear about how to use it or what it will do. I’d be reluctant to build technology policies into ESEA before we really know more about what universal access to digital devices could accomplish.

Given how long No Child Left Behind has overstayed its welcome, it may be especially important to get the next reauthorization right. It could be with us for a very long time!

What Makes Educational Technology Programs Work?


While everyone else is having a lot more fun, my colleagues and I sit up late at night writing a free website, the Best Evidence Encyclopedia (www.bestevidence.org), which reviews evaluations of educational programs in reading, math, and science.

The recent reports reinforce an observation I’ve made previously. When programs are found to have little or no impact on student learning, it is often the case that they provide very little professional development to teachers. Giving teachers lots of professional development does not guarantee positive effects, but failing to do so seems to virtually guarantee disappointing impacts.

This observation takes on new importance as technology comes to play an increasing role in educational innovation. Numerous high-quality studies of traditional computer-assisted instruction programs, in which students walk down the hall or to the back of the classroom to work on technology largely disconnected from teachers’ instruction, find few positive effects on learning. Many of the technology applications appearing in schools today ignore this sad history, offering free or low-cost apps that students work on individually, with little professional development for teachers or any connection to teachers’ (non-technology) lessons. In light of the prior research, it would be astonishing if these apps made any difference in student learning, no matter how appealing or well-designed they are.

Alongside the thousands of free apps going into schools, an entirely different approach to technology has also developed, one that integrates technology with teachers’ lessons and provides teachers with extensive professional development and coaching. Studies of such programs do find significant positive effects. As one example, I recently saw an evaluation of a reading and math program called Time to Know. In Time to Know, teachers use computers and their own non-computer lessons to start a lesson. Students then do activities on their individual devices, personalized to their needs and learning histories. Student learning is continuously assessed and fed back to the teacher to use in informing further lessons and guiding interventions with individual students.

Time to Know provides teachers with significant professional development and coaching, so they can use it flexibly and effectively. Perhaps as a result, the program showed very good outcomes in a small but high-quality study, with an effect size of +0.32 in reading and +0.29 in math.
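Effect sizes like +0.32 and +0.29 express the difference between treatment and control group means in standard deviation units. As a rough sketch of how such a figure is computed (the function is standard Cohen’s d; the test scores below are hypothetical, not the Time to Know data):

```python
from statistics import mean, stdev

def cohens_d(treatment, control):
    """Difference in group means divided by the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    pooled_var = ((n1 - 1) * stdev(treatment) ** 2 +
                  (n2 - 1) * stdev(control) ** 2) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / pooled_var ** 0.5

# Hypothetical test scores, for illustration only.
treatment_scores = [78, 82, 85, 88, 91, 74, 86]
control_scores = [77, 80, 84, 86, 89, 73, 84]
print(round(cohens_d(treatment_scores, control_scores), 2))  # → 0.28
```

An effect size around +0.3, as here, means the average treatment student scored about a third of a standard deviation above the average control student, a substantial gain for a whole-class program.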

There are many other studies of classroom programs that improve student learning, in particular studies of forms of cooperative learning in many subjects and grade levels. As a group, the outcomes reported in these studies are always far higher than those seen in studies of traditional technology applications, in all subjects and grade levels. What is interesting about the study of Time to Know is that here is an unusually positive outcome for a technology application in a rigorous experiment. What is unique about the intervention is that it embeds technology in the classroom and provides teachers with extensive PD. Perhaps classroom-embedded technology with adequate professional development is the wave of the future, and perhaps it will finally achieve the long-awaited breakthroughs that technology has been promising for the past 40 years.

How Universal Access to Technology Could Advance Evidence-Based Reform. Or Not.


Since the early 1960s (at least), breakthroughs in education caused by advances in technology have been confidently predicted. First it was teaching machines, then mainframes, then laptops, then video disks, then interactive whiteboards, and now blended and flipped learning. Sadly, however, each innovation of the past has ended up making little if any difference in student achievement. I remain hopeful that this time, technology could produce breakthroughs, if the new capabilities of technology are used to create systematically enhanced environments for learning, new approaches to teaching based on new technologies are rigorously evaluated, and approaches found to be successful are broadly disseminated.

My reason for hope, this time around, lies in the fact that schools are rapidly moving toward providing universal access to tablets or other relatively low-cost digital devices. This is a potential game-changer, as universal access makes it possible for teachers to give digital assignments to all students. It also makes it possible for developers to create replicable strategies that make optimal use of personalized instruction, simulations, visual media, games, sophisticated real-time assessments, links to other students within and beyond the classroom, links to prescreened and curated information, and so on. If students also have compatible technology at home, this adds the possibility of integration of homework and classwork (which is essential for blended and flipped learning, for example, but also for simpler means of making homework engaging, game-like, and useful for learning).

All of these possibilities are only potentials, not actualities, and given the long, sad history of technology in schools, they may well not take place. A lot of the applications of universal access to digital devices common today are merely reinventions of computer-assisted instruction (CAI), which has a particularly poor research record. Other uses are for poorly designed project-based learning or for applications that do little more than make traditional teaching a little easier for teachers. There are hundreds of applications available for every possible classroom use, but the quality of these applications varies widely, and they do not readily integrate with each other or with other instruction or standards. Hardworking, tech-savvy teachers can in principle assemble fabulous lessons, but this is difficult to do on a large scale.

Before universal access to technology can transform education, a great deal of creative work needs to be done to make and evaluate courses or major portions of courses using the new technology opportunities. Imagine developers, researchers, nonprofits, and for-profits partnering with experienced teachers to create astonishing, integrated, and complete approaches to, say, beginning reading, elementary math and science, secondary algebra, or high school physics. In each case, programs would be rigorously evaluated, and then disseminated if found to be effective.

There are many applications of universal access technologies that make me optimistic. For example, current or on-the-horizon technologies could enhance teachers’ abilities to teach traditional lessons. Prepared lessons might incorporate visual media, games, or simulations in initial teaching. They might use computer-facilitated cooperative learning with embedded assessments and feedback to replace worksheets. They might embed assessments in games, simulations, and cooperative activities to replace formative and summative assessments. Simulations of lab experiments could make inquiry-oriented instruction in science and math far more common. Access to curated, age-appropriate libraries of information could transform social studies and science. Computerized assessments of writing, including creative writing as well as grammar, punctuation, and spelling, could help students working with peers to become more effective writers.

In each of these cases, extensive development, piloting, and evaluation will be necessary, but once created and found to be effective, digitally enhanced models will be extremely popular, and their costs will decline with scale.

Even as technology’s past should make us wary of unsupported claims and premature enthusiasm, the future can be different. In all areas of technology other than education, someone creates a new product, finds it to be effective, and then makes it available for widespread adoption. A time of tinkering yields to a time of solid accomplishment. This can happen in education, too. With adequate support for R&D, breakthroughs are likely, and when they happen in any area, they increase the possibilities of breakthroughs in other areas.

Sooner or later, technology will help students learn far more than they do today. The technology models ready to go today do not yet have the evidence base to justify a lot of optimism, but in the age of universal access, we’ve only just begun.