What Kinds of Studies Are Likely to Replicate?


In the hard sciences, there is a publication called the Journal of Irreproducible Results.  It really has nothing to do with replication of experiments, but is a humor journal by and for scientists.  The reason I bring it up is that to chemists and biologists and astronomers and physicists, for example, an inability to replicate an experiment is a sure indication that the original experiment was wrong.  To the scientific mind, a Journal of Irreproducible Results is inherently funny, because it is a journal of nonsense.

Replication, the ability to repeat an experiment and get a similar result, is the hallmark of a mature science.  Sad to say, replication is rare in educational research, which says a lot about our immaturity as a science.  For example, in the What Works Clearinghouse, about half of programs across all topics are represented by a single evaluation.  When there are two or more, the results are often very different.  Relatively recent funding initiatives, especially studies supported by Investing in Innovation (i3) and the Institute of Education Sciences (IES), and targeted initiatives such as Striving Readers (secondary reading) and the Preschool Curriculum Evaluation Research (PCER), have added a great deal in this regard.  They have funded many large-scale, randomized, very high-quality studies of all sorts of programs, and many of these are replications themselves or provide a good basis for later replications.  As my colleagues and I have done many reviews of research in every area of education, pre-kindergarten to grade 12 (see www.bestevidence.org), we have gained a good intuition about what kinds of studies are likely to replicate and what kinds are less likely.

First, let me define in more detail what I mean by “replication.”  There is no value in replicating biased studies, which may well consistently find the same biased results, as when both the original studies and the replication studies used the same researcher- or developer-made outcome measures, slanted toward the content the experimental group experienced but not what the control group experienced (see http://www.tandfonline.com/doi/abs/10.1080/19345747.2011.558986).

Instead, I’d consider a successful replication one that shows positive outcomes both in the original studies and in at least one large-scale, rigorous replication. One obvious way to increase the chances that a program producing a positive outcome in one or more initial studies will succeed in such a rigorous replication evaluation is to use a similar, equally rigorous evaluation design in the first place. I think a lot of treatments that fail to replicate are ones that used weak methods in the original studies. In particular, small studies tend to produce greatly inflated effect sizes (see http://www.bestevidence.org/methods/methods.html), which are unlikely to replicate in larger evaluations.
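The inflation of effect sizes in small studies can be seen in a minimal simulation (my illustration, not from the post, with made-up numbers): if only studies that reach nominal statistical significance get attention, then small studies that clear that bar must, by construction, have observed effects far larger than the true effect.

```python
import math
import random
import statistics

# Illustrative simulation: many studies of a program with a modest true
# effect, where only "significant" results are noticed. All numbers are
# made up for illustration.
random.seed(1)

TRUE_EFFECT = 0.20  # true effect in standard-deviation units

def one_study(n_per_group):
    """Simulate one study; return its observed effect size (difference in means)."""
    treat = [random.gauss(TRUE_EFFECT, 1.0) for _ in range(n_per_group)]
    control = [random.gauss(0.0, 1.0) for _ in range(n_per_group)]
    return statistics.mean(treat) - statistics.mean(control)

def mean_significant_effect(n_per_group, n_studies=2000):
    """Average observed effect among studies whose mean difference clears a
    rough two-group significance threshold of 1.96 standard errors."""
    threshold = 1.96 * math.sqrt(2.0 / n_per_group)
    significant = [d for d in (one_study(n_per_group) for _ in range(n_studies))
                   if d > threshold]
    return statistics.mean(significant)

# Small "significant" studies overstate the true effect by a wide margin;
# large studies estimate it much more faithfully.
print(f"true effect:                    {TRUE_EFFECT:.2f}")
print(f"mean significant effect, n=20:  {mean_significant_effect(20):.2f}")
print(f"mean significant effect, n=400: {mean_significant_effect(400):.2f}")
```

With 20 students per group, the significance threshold sits well above the true effect, so the studies that pass it are mostly lucky draws; with 400 per group the threshold falls below the true effect and the surviving estimates cluster near it.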

Another factor likely to contribute to replicability is use in the earlier studies of methods or conditions that can be repeated in later studies, or in schools in general. For example, providing teachers with specific manuals, videos demonstrating the methods, and specific student materials all add to the chances that a successful program can be successfully replicated. Avoiding unusual pilot sites (such as schools known to have outstanding principals or staff) may contribute to replication, as these conditions are unlikely to be found in larger-scale studies. Having experimenters or their colleagues or graduate students extensively involved in the early studies diminishes replicability, of course, because those conditions will not exist in replications.

Replications are entirely possible. I wish there were a lot more of them in our field. Showing that programs can be effective in just two rigorous evaluations is way more convincing than just one. As evidence becomes more and more important, I hope and expect that replications, perhaps carried out by states or districts, will become more common.

The Journal of Irreproducible Results is fun, but it isn’t science. I’d love to see a Journal of Replications in Education to tell us what really works for kids.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.


Evidence for ESSA Celebrates its First Anniversary

On February 28, 2017 we launched Evidence for ESSA (www.evidenceforessa.org), our website providing the evidence to support educational programs according to the standards laid out in the Every Student Succeeds Act of December, 2015.

Evidence for ESSA began earlier, of course. It really began one day in September, 2016, when I heard leaders of the Institute of Education Sciences (IES) and the What Works Clearinghouse (WWC) announce that the WWC would not be changed to align with the ESSA evidence standards. I realized that no one else was going to create scientifically valid, rapid, and easy-to-use websites providing educators with actionable information on programs meeting ESSA standards. We could do it because our group at Johns Hopkins University, and partners all over the world, had been working for many years creating and updating another website, the Best Evidence Encyclopedia (BEE; www.bestevidence.org). BEE reviews were not primarily designed for practitioners and they did not align with ESSA standards, but at least we were not starting from scratch.

We assembled a group of large membership organizations to advise us and to help us reach thoughtful superintendents, principals, Title I directors, and others who would be users of the final product. They gave us invaluable advice along the way. We also assembled a technical working group (TWG) of distinguished researchers to advise us on key decisions in establishing our website.

It is interesting to note that we have not been able to obtain adequate funding to support Evidence for ESSA. Instead, it is mostly being written by volunteers and graduate students, all of whom are motivated only by a passion for evidence to improve the education of students.

A year after launch, Evidence for ESSA has been used by more than 36,000 unique users, and I hear that it is very useful in helping states and districts meet the ESSA evidence standards.

We get a lot of positive feedback, as well as complaints and concerns, to which we try to respond rapidly. Feedback has been important in changing some of our policies and correcting some errors and we are glad to get it.

At this moment we are thoroughly up-to-date on reading and math programs for grades pre-kindergarten to 12, and we are working on science, writing, social-emotional outcomes, and summer school. We are also continuing to update our more academic BEE reviews, which draw from our work on Evidence for ESSA.

In my view, the evidence revolution in education is truly a revolution. If the ESSA evidence standards ultimately prevail, education will at long last join fields such as medicine and agriculture in a dynamic of practice to development to evaluation to dissemination to better practice, in an ascending spiral that leads to constantly improving practices and outcomes.

In a previous revolution, Thomas Jefferson wrote that were it left to him to decide whether we should have “a government without newspapers, or newspapers without a government,” he would not hesitate to prefer the latter. In our evidence revolution in education, Evidence for ESSA, the WWC, and other evidence sources are our “newspapers,” providing the information that people of good will can use to make wise and informed decisions.

Evidence for ESSA is the work of many dedicated and joyful hands trying to provide our profession with the information it needs to improve student outcomes. The joy in it is the joy in seeing teachers, principals, and superintendents see new, attainable ways to serve their children.


On High School Graduation Rates: Want to Buy My Bridge?


Francis Scott Key Bridge, Baltimore. Photo by Artondra Hall [CC BY 2.0 (http://creativecommons.org/licenses/by/2.0)], via Wikimedia Commons, edited for size.


I happen to own the Francis Scott Key Bridge in Baltimore, pictured here. It’s lovely in itself, has beautiful views of downtown and the outer harbor, and rakes in more than $11 million in tolls each year. But I’m willing to sell it to you, cheap!

If you believe that I own a bridge in Baltimore, then let me try out an even more fantastic idea on you. Since 1992, the achievement of America’s 12th graders on NAEP reading and math tests has been unchanged. Yet high school graduation rates have been soaring. From 2006 to 2016, U.S. graduation rates rose from 73% to 84%, an all-time record. Does this sound plausible to you?

Recently, the Washington Post (https://www.washingtonpost.com/local/education/fbi-us-education-department-investigating-ballou-graduation-scandal/2018/02/02/b307e57c-07ab-11e8-b48c-b07fea957bd5_story.html?utm_term=.84c1176bb8ff) reported a scandal about graduation rates at Ballou High School in Washington, DC, a high-poverty school not known (in the past) for its graduation rates. In 2017, 100% of Ballou students graduated, and 100% were accepted into college. An investigation by radio station WAMU, however, found that a large proportion of the graduating seniors had very poor attendance, poor achievement, and other problems. In fact, the Post reported that one third of all graduating seniors in DC did not meet district graduation standards. Ballou’s principal and the DC Director of Secondary Schools resigned, and there are ongoing investigations. The FBI has recently gotten involved.

In response to these stories, teachers across America wrote to express their views. Almost without exception, the teachers said that the situation in their districts is similar to that in DC. They said they are pressured, even threatened, to promote and then graduate every student possible. Students who fail courses are often offered “credit recovery” programs to obtain their needed credits, and these were found in an investigation by the Los Angeles Times  to have extremely low standards (https://robertslavinsblog.wordpress.com/2017/08/17/the-high-school-graduation-miracle/). Failing students may also be allowed to do projects or otherwise show their knowledge in alternative ways, but these are derided as “Mickey Mouse.” And then there are students like some of those at Ballou, who did not even bother to show up for credit recovery or Mickey Mouse, but were graduated anyway.

The point is, it’s not just Ballou. It’s not just DC. In high-poverty districts coast to coast, standards for graduation have declined. My colleague, Bob Balfanz, coined the term “dropout factories” many years ago to describe high schools, almost always serving high-poverty areas, that produced a high proportion of all dropouts nationwide. In response, our education system got right to work on what it does best: Change the numbers to make the problem appear to go away. The FBI might make an example of DC, but if DC is in fact doing what many high-poverty districts are doing throughout the country, is it fair to punish it disproportionately? It’s not up to me to judge the legalities or ethics involved, but clearly, the problem is much, much bigger.

Some people have argued with me on this issue. “Where’s the harm,” they ask, “in letting students graduate? So many of these students encounter serious barriers to educational success. Why not give them a break?”

I will admit to a sympathy for giving high school students who just barely miss standards legitimate opportunities to graduate, such as taking appropriately demanding makeup courses. But what is happening in DC and elsewhere is very far from this reasonable compromise with reality.

I have done some research in inner-city high schools. In just about every class, there are students who are actively engaged in lessons, and others who would become actively engaged if their teachers used proven programs (in my case it was cooperative learning). But even with the best programs, there were kids in the back of the class with headphones on, who were totally disengaged, no matter what the teacher did. And those were the ones who actually showed up at all.

The kids who were engaged, or became engaged because of excellent instruction, should have a path to graduation, one way or another. The rest should have every opportunity, encouragement, and assistance to reach this goal. Some will choose to take advantage, some will not, but that must be their choice, with appropriate consequences.

Making graduation too easy not only undermines the motivations of students (and teachers). It also undermines the motivation of the entire system to introduce and effectively implement effective programs, from preschool to 12th grade. If educators can keep doing what they’ve always done, knowing that numbers will be fiddled with at the end to make everything come out all right, then the whole system can and will lose a major institutional incentive for improvement.

The high dropout rate of inner-city schools is indeed a crisis. It needs to be treated as such: not a crisis of numbers, but a crisis encountered by hundreds of thousands of vulnerable, valuable students. Loosening standards and then declaring success, which every educator knows to be false, corrupts the system, undermining confidence in the numbers even when they are legitimate. It fosters cynicism that nothing can be done.

Is it too much to expect that we can create and implement effective strategies that would enable virtually all students to succeed on appropriate standards in elementary, middle, and high school, so that virtually all can meet rigorous requirements and walk across a stage, head held high, knowing that they truly attained what a high school diploma is supposed to certify?

If you agree that high school graduation standards have gone off the rails, it is not enough to demand tougher standards. You also have to advocate for and work for application of proven approaches to make deserved and meaningful graduation accessible to all.

On the other hand, if you think the graduation rate has legitimately skyrocketed in the absence of any corresponding improvement in reading or math achievement, please contact me at www.buy-my-bridge.com. It really is a lovely bridge.



Getting the Best Mileage from Proven Programs

Wouldn’t you love to have a car that gets 200 miles to the gallon? Or one that can go hundreds of miles on a battery charge? Or one that can accelerate from zero to sixty twice as fast as any on the road?

Such cars exist, but you can’t have them. They are experimental vehicles or race cars that can only be used on a track or in a lab. They may be made of exotic materials, or may not carry passengers or groceries, or may be dangerous on real roads.

In working on our Evidence for ESSA website (www.evidenceforessa.org), we see a lot of studies that are like these experimental cars. For example, there are studies of programs in which the researcher or her graduate students actually did the teaching, or in which students used innovative technology with one adult helper for every machine or every few machines. Such studies are fine for theory building or as pilots, but we do not accept them for Evidence for ESSA, because they could never be replicated in real schools.

However, there is a much more common situation to which we pay very close attention: studies in which teachers receive a great deal of training and coaching, but an amount that seems replicable in principle. For example, we would reject a study in which the experimenters taught the program themselves, but not one in which they taught ordinary teachers how to use the program.

In such studies, the problem comes in dissemination. If the studies validating a program provided a lot of professional development, we accept it only if the disseminator offers a similar level of professional development, and its estimates of cost and personnel take that level of support into account. We put on our website clear expectations that these services be provided at a level similar to what was provided in the research, if the positive outcomes seen in the research are to be obtained.

The problem is that disseminators often offer schools a form of the program that was never evaluated, to keep costs low. They know that schools don’t like to spend a lot on professional development, and they are concerned that if they require the needed levels of PD or other services or materials, schools won’t buy their program. At the extreme end of this, there are programs that were successfully evaluated using extensive professional development, and then put their teacher’s manual on the web for schools to use for free.

A recent study of a program called Mathalicious illustrated the situation. Mathalicious is an on-line math course for middle school. An evaluation found that teachers randomly assigned to just get a license, with minimal training, did not obtain significant positive impacts, compared to a control group. Those who received extensive on-line training, however, did see a significant improvement in math scores, compared to controls.

When we write our program descriptions, we compare program implementation details in the research to what is said or required on the program’s website. If these do not match, within reason, we try to make clear which elements were key to the program’s success.

Going back to the car analogy, our procedures eliminate those amazing cars that can only operate on special tracks, but we accept cars that can run on streets, carry children and groceries, and generally do what cars are expected to do. But if outstanding cars require frequent recharging, or premium gasoline, or have other important requirements, we’ll say so, in consultation with the disseminator.

In our view, evidence in education is not for academics, it’s for kids. If there is no evidence that a program as disseminated benefits kids, we don’t want to mislead educators who are trying to use evidence to benefit their children.



Evidence-Based Does Not Equal Evidence-Proven


As I speak to educational leaders about using evidence to help them improve outcomes for students, there are two words I hear all the time that give me the fantods (as Mark Twain would say): “evidence-based.”

I like the first word, “evidence,” just fine, but the second word, “based,” sort of negates the first one. The ESSA evidence standards require programs that are evidence-proven, not just evidence-based, for various purposes.

“Evidence-proven” means that a given program, practice, or policy has been put to the test. Ideally, students, teachers, or schools have been assigned at random to use the experimental program or to remain in a control group. The program is provided to the experimental group for a significant period of time, at least a semester, and then final performance, on tests that are fair to both groups, is compared using appropriate statistics.
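The final comparison is usually summarized as an effect size: the difference between the experimental and control group means, divided by their pooled standard deviation. A minimal sketch, using made-up posttest scores (my illustration, not data from any real study):

```python
import math
import statistics

# Illustrative posttest scores for two small groups (made-up numbers).
experimental = [78, 85, 74, 90, 82, 69, 88, 77, 91, 80]
control = [72, 80, 70, 84, 75, 68, 79, 73, 82, 76]

def effect_size(treat, ctrl):
    """Cohen's d: difference in group means over the pooled standard deviation."""
    n1, n2 = len(treat), len(ctrl)
    s1, s2 = statistics.stdev(treat), statistics.stdev(ctrl)
    pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (statistics.mean(treat) - statistics.mean(ctrl)) / pooled

d = effect_size(experimental, control)
print(f"effect size (Cohen's d): {d:.2f}")
```

A real evaluation would of course use far larger samples and a significance test alongside the effect size; the point here is only what the summary statistic measures.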

If your doctor gives you medicine, it is evidence-proven. It isn’t just the same color or flavor as something proven; it isn’t just generally in line with what research suggests might be a good idea. Instead, it has been found to be effective, compared to current standards of care, in rigorous studies.

“Evidence-based,” on the other hand, is one of those wiggle words that educators love to use to indicate that they are up-to-date and know what’s expected, but don’t actually intend to do anything different from what they are doing now.

Evidence-based is today’s equivalent of “based on scientifically-based research” in No Child Left Behind. It sure sounded good, but what educational program or practice can’t be said to be “based on” some scientific principle?

In a recent Brookings article, Mark Dynarski wrote about state ESSA plans and conversations he has heard among educators. He says that the plans are loaded with the words “evidence-based,” but give little indication of what specific proven programs states plan to implement, or how they plan to identify, disseminate, implement, and evaluate them.

I hope the ESSA evidence standards give leaders in even a few states the knowledge and the courage to insist on evidence-proven programs, especially in very low-achieving “school improvement” schools that desperately need the very best approaches. I remain optimistic that ESSA can be used to expand evidence-proven practices. But will it in fact have this impact? That remains to be proven.
