“We Don’t Do Lists”


Watching the slow, uneven, uncertain rollout of the ESSA evidence standards gives me a mixture of hope and despair. The hope stems from the fact that from coast to coast, educational leaders are actually talking about proven programs and practices at all. That was certainly rare before ESSA. The despair stems from hearing many educational leaders trying to find the absolute least their states and districts can do to just barely comply with the law. The ESSA evidence standards apply in particular to schools seeking school improvement funding, which are those in the lowest 5% of their states in academic performance. A previous program with a similar name but more capital letters, School Improvement, was used under NCLB, before ESSA. A large-scale evaluation by MDRC found that the earlier School Improvement made no difference in student achievement, despite billions of dollars in investments. So you’d imagine that this time around, educators responsible for school improvement would be eager to use the new law to introduce proven programs into their lowest-achieving schools. In fact, there are individual leaders, districts, and states who have exactly this intention, and may ultimately provide good examples to the rest. But they face substantial obstacles.

One of the obstacles I hear about often is an opposition among state departments of education to disseminating lists of proven programs. I very much understand and sympathize with their reluctance, as schools have been over-regulated for a long time. However, I do not see how the ESSA evidence standards can make much of a difference if everyone makes their own list of programs. Determining which studies meet ESSA evidence standards is difficult, and requires a great deal of knowledge about research (I know this, of course, because we do such reviews ourselves; see www.evidenceforessa.org).

Some say that they want programs that have been evaluated in their own states. But after taking into account demographics (e.g., urban/rural, ELL/not ELL), are state-to-state differences so great as to require different research in each? We used to work with a school located on the Ohio-Indiana border, which ran right through the building. Were there really programs that were effective on one side of the building but not on the other?

Further, state department leaders frequently complain that they have too few staff to adequately manage school improvement across their states. Should that capacity be concentrated on reviewing research to determine which programs meet ESSA evidence standards and which do not?

The irony of opposing lists for ESSA evidence standards is that most states are chock full of lists that restrict the textbooks, software, and professional development schools can select using state funds. These lists may focus on paper weight, binding, and other minimum quality issues, but they almost never have anything to do with evidence of effectiveness. One state asked us to review their textbook adoption lists for reading and math, grades K-12. Collectively, there were hundreds of books, but just a handful had even a shred of evidence of effectiveness.

Educational leaders are constantly buffeted by opposing interest groups, from politicians to school board members to leaders of unions, from PTA presidents to university presidents, to for-profit companies promoting their own materials and programs. Educational leaders need a consistent way to ensure that the decisions they make are in the best interests of children, not the often self-serving interests of adults. The ESSA evidence standards, if used wisely, give education leaders an opportunity to say to the whole cacophony of cries for special consideration, “I’d love to help you all, but we can only approve programs for our lowest-achieving schools that are known from rigorous research to benefit our children. We say this because it is the law, but also because we believe our children, and especially our lowest achievers, deserve the most effective programs, no matter what the law says.”

To back up such a radical statement, educational leaders need clarity about what their standards are and which specific programs meet those standards. Otherwise, they either have an “anything goes” strategy that in effect means that evidence does not matter, or they have competing vendors claiming an evidence base for their favored program. Lists of proven programs can disappoint those whose programs aren’t on the list, but they are at least clear and unambiguous, and communicate to those who want to add to the list exactly what kind of evidence they will need.

States or large districts can create lists of proven programs by starting with existing national lists (such as the What Works Clearinghouse or Evidence for ESSA) and then modifying them, perhaps by adding additional programs that meet the same standards and/or eliminating programs not available in a given location. Over time, existing or new programs can be added as new evidence appears. We, at Evidence for ESSA, are willing to review programs being considered by state or local educators for addition to their own lists, and we will do it for free and in about two weeks. Then we’ll add them to our national list if they qualify.

It is important to say that while lists are necessary, they are not sufficient. Thoughtful needs assessments, information on proven programs (such as effective methods fairs and visits to local users of proven programs), and planning for high-quality implementation of proven programs are also necessary. However, students in struggling schools cannot wait for every school, district, and state to reinvent the wheel. They need the best we can give them right now, while the field is working on even better solutions for the future.

Whether a state or district uses a national list, or starts with such a list and modifies it for its own purposes, a list of proven programs provides an excellent starting point for struggling schools. It plants a flag for all to see, one that says “Because this (state/district/school) is committed to the success of every child, we select and carefully implement programs known to work. Please join us in this enterprise.”

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.


Half a Worm: Why Education Policy Needs High Evidence Standards

There is a very old joke that goes like this:

What’s the second-worst thing to find in your apple?  A worm.

What’s the worst?  Half a worm.

The ESSA evidence standards provide clearer definitions of “strong,” “moderate,” and “promising” levels of evidence than have ever existed in law or regulation. Yet they still leave room for interpretation.  The problem is that if you define evidence-based too narrowly, too few programs will qualify.  But if you define evidence-based too broadly, it loses its meaning.

We’ve already experienced what happens with a too-permissive definition of evidence.  In No Child Left Behind, “scientifically-based research” was famously mentioned 110 times.  The impact of this, however, was minimal, as everyone soon realized that the term “scientifically-based” could be applied to just about anything.

Today, we are in a much better position than we were in 2002 to insist on relatively strict evidence of effectiveness, both because we have better agreement about what constitutes evidence of effectiveness and because we have a far greater number of programs that would meet a high standard.  The ESSA definitions are a good consensus example.  Essentially, they define programs with “strong evidence of effectiveness” as those with at least one randomized study showing positive impacts using rigorous methods, and “moderate evidence of effectiveness” as those with at least one quasi-experimental study.  “Promising” is less well-defined, but requires at least one correlational study with a positive outcome.
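To make the three tiers concrete, here is a minimal sketch of the simplified definitions given above. The `Study` class, its field names, and the `essa_tier` function are hypothetical names invented for this illustration; they are not part of the law or of any official tool, and the real definitions involve more judgment than a lookup like this.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Study:
    # Hypothetical record: design is "randomized", "quasi-experimental",
    # or "correlational"; positive means the study found a positive outcome.
    design: str
    positive: bool


# Tiers ordered from most to least rigorous, following the summary above.
TIER_BY_DESIGN = [
    ("randomized", "strong"),
    ("quasi-experimental", "moderate"),
    ("correlational", "promising"),
]


def essa_tier(studies: List[Study]) -> Optional[str]:
    """Return the highest tier supported by at least one positive study."""
    for design, tier in TIER_BY_DESIGN:
        if any(s.design == design and s.positive for s in studies):
            return tier
    return None  # no qualifying study under this simplified reading


if __name__ == "__main__":
    evidence = [Study("randomized", False), Study("quasi-experimental", True)]
    print(essa_tier(evidence))
```

In this sketch, a program whose only positive finding comes from a quasi-experiment lands in “moderate,” even if a randomized study of it also exists but found nothing.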

Where the half-a-worm concept comes in, however, is that we should not use a broader definition of “evidence-based”.  For example, ESSA has a definition of “strong theory.”  To me, that is going too far, and begins to water down the concept.  What program in all of education cannot justify a “strong theory of action”?

Further, even in the top categories, there are important questions about what qualifies. In school-level studies, should we insist on school-level analyses (e.g., hierarchical linear modeling, or HLM)? Every methodologist would say yes, as I do, but this is not specified. Should we accept researcher-made measures? I say no, based on a great deal of evidence indicating that such measures inflate effects.

Fortunately, due to investments made by IES, i3, and other funders, the number of programs that meet strict standards has grown rapidly. Our Evidence for ESSA website (www.evidenceforessa.org) has so far identified 101 PK-12 reading and math programs, using strict standards consistent with ESSA definitions. Among these, more than 60% meet the “strong” standard. There are enough proven programs in every subject and grade level to give educators choices among proven programs. And we add more each week.

This large number of programs meeting strict evidence standards means that insisting on rigorous evaluations, within reason, does not mean that we end up with too few programs to choose among. We can have our apple pie and eat it, too.

I’d love to see federal programs of all kinds encouraging use of programs with rigorous evidence of effectiveness. But I’d rather see a few programs that meet a strict definition of “proven” than a lot of programs that only meet a loose definition. Twenty good apples are much better than applesauce of dubious origins!

This blog is sponsored by the Laura and John Arnold Foundation

Getting Past the Dudalakas (And the Yeahbuts)

Phyllis Hunter, a gifted educator, writer, and speaker on the teaching of reading, often speaks about the biggest impediments to education improvement, which she calls the dudalakas. These are excuses for why change is impossible.  Examples are:

Dudalaka better students

Dudalaka money

Dudalaka policy support

Dudalaka parent support

Dudalaka union support

Dudalaka time

Dudalaka is just shorthand for “Due to the lack of.” It’s a close cousin of “yeahbut,” another reflexive response to ideas for improving education practices or policy.

Of course, there are real constraints that teachers and education leaders face that genuinely restrict what they can do. The problem with dudalakas and yeahbuts is not that the objections are wrong, but that they are so often thrown up as a reason not to even think about solutions.

I often participate in dudalaka conversations. Here is a composite. I’m speaking with a principal of an elementary school, who is expressing concern about the large number of students in his school who are struggling in reading. Many of these students are headed for special education. “Could you provide them with tutors?” I ask. “Yes, they get tutors, but we use a small group method that emphasizes oral reading, not the phonics skills that the students are actually lacking” (i.e., yeahbut).

“Could you change the tutoring to focus on the skills you know students need?”

“Yeahbut our education leadership requires we use this system (dudalaka political support). Besides, we have so many failing students (dudalaka better students) so we have to work with small groups of students (dudalaka tutors).”

“Could you hire and train paraprofessionals or recruit qualified volunteers to provide personalized tutoring?”

“Yeahbut we’d love to, but we can’t afford them (dudalaka money). Besides, we don’t have time for tutoring (dudalaka time).”

“But you have plenty of time in your afternoon schedule.”

“Yeahbut in the afternoon, children are tired. (Dudalaka better students).”

This conversation is not, of course, a rational discussion of strategies for solving a serious problem. It is instead an attempt by the principal to find excuses to justify his school’s continuing to do what it is doing now. Dudalakas and yeahbuts are merely ways of passing blame to other people (school leaders, teachers, children, parents, unions, and so on) and to shortages of money, time, and other resources that hold back change. Again, these excuses may or may not be valid in a particular situation, but there is a difference between rejecting potential solutions out of hand (using dudalakas and yeahbuts) and identifying and then carefully and creatively considering potential solutions. Not every solution will be possible or workable, but if the problem is important, some solution must be found. No matter what.

An average American elementary school with 500 students has an annual budget of approximately $6,000,000 ($12,000 per student). Principals and teachers, superintendents, and state superintendents think their hands are tied by limited resources (dudalaka money). But creativity and commitment to core goals can overcome funding limitations if school and district leaders are willing to use resources differently or activate underutilized resources, or ideally, find a way to obtain more funding.

The people who start off with the very human self-protective dudalakas and yeahbuts may, with time, experience, and encouragement, become huge advocates for change. It’s only natural to start with dudalakas and yeahbuts. What is important is that we don’t end with them.

We know that our children are capable of succeeding at much higher rates than they do today. Yet too many are failing, dudalaka quality implementation of proven programs. Let’s clear away the other dudalakas and yeahbuts, and get down to this one.

This blog is sponsored by the Laura and John Arnold Foundation

The WWC’s 25% Loophole

I am a big fan of the concept of the What Works Clearinghouse (WWC), though I have concerns about various WWC policies and practices. For example, I have previously written about my concerns with the WWC’s acceptance of measures made by researchers and developers and with its policy against weighting effect sizes by sample sizes when computing mean effect sizes for various programs. However, there is another WWC policy that is a problem in itself, and this problem is made more serious in light of recent Department of Education guidance on the ESSA evidence standards.

The WWC Standards and Procedures 3.0 manual sets rather tough standards for programs to be rated as having positive effects in studies meeting standards “without reservations” (essentially, randomized experiments) and “with reservations” (essentially, quasi-experiments, or matched studies). However, the WWC defines a special category of studies for which all caution is thrown to the winds. Such studies are called “substantively important,” and are treated as though they met WWC standards. Quoting from Standards and Procedures 3.0: “For the WWC, effect sizes of +0.25 standard deviations or larger are considered to be substantively important…even if they might not reach statistical significance…” The “effect size greater than +0.25” loophole (the >0.25 loophole, for short) is problematic in itself, but could lead to catastrophe for the ESSA evidence standards that now identify programs that meet “strong,” “moderate,” and “promising” levels of evidence.

The problem with the >0.25 loophole is that studies that meet the loophole criterion without meeting the usual methodological criteria are usually very, very, very bad studies, usually with a strong positive bias. These studies are often very small (far too small for statistical significance). They usually use measures made by the developers or researchers, or ones that are excessively aligned with the content of the experimental group but not the control group.

One example of the >0.25 loophole is a Brady (1990) study accepted as “substantively important” by the WWC. In it, 12 students in rural Alaska were randomly assigned to Reciprocal Teaching or to a control group. The literacy treatment was built around specific science content, but the control group never saw this content. Yet one of the outcome measures, focused on this content, was made by Mr. Brady, and two others were scored by him. Mr. Brady also happened to be the teacher of the experimental group. The effect size in this awful study was an extraordinary +0.65, though outcomes in other studies assessed on measures more fair to the control group were much smaller.
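To see why a study this small cannot reach statistical significance even with a large effect size, here is a rough sketch using the standard large-sample approximation for the standard error of a standardized mean difference. The even split of 6 students per group is my assumption for illustration (only the total of 12 is given above), and the function names are hypothetical. A proper small-sample test would be even stricter than the normal approximation used here.

```python
import math


def effect_size_standard_error(d: float, n_treatment: int, n_control: int) -> float:
    """Approximate standard error of a standardized mean difference d."""
    n_total = n_treatment + n_control
    return math.sqrt(n_total / (n_treatment * n_control) + d ** 2 / (2 * n_total))


def is_statistically_significant(d: float, n_treatment: int, n_control: int,
                                 critical_z: float = 1.96) -> bool:
    """Two-sided test at roughly the .05 level, normal approximation."""
    z = abs(d) / effect_size_standard_error(d, n_treatment, n_control)
    return z > critical_z


def is_substantively_important(d: float, threshold: float = 0.25) -> bool:
    """The loophole criterion: effect size of +0.25 or larger."""
    return d >= threshold


if __name__ == "__main__":
    d = 0.65
    # Assumed split: 6 experimental students, 6 controls.
    print(is_substantively_important(d))          # passes the loophole
    print(is_statistically_significant(d, 6, 6))  # fails a significance test
    # The same effect size with a hypothetical 600 students per group.
    print(is_statistically_significant(d, 600, 600))
```

Under these assumptions the z statistic for the 12-student study is only about 1.1, well short of 1.96, so the study passes the loophole criterion while failing any ordinary test of significance.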

Because the WWC does not weight studies by sample size, this tiny, terrible study had the same impact in the WWC summary as studies with hundreds or thousands of students.
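A small sketch shows how much this matters. The first study below uses the sample size and effect size described above; the other two studies are hypothetical numbers made up purely for illustration. Weighting by sample size is the simplest possible scheme and is my choice for the example; meta-analysts often use inverse-variance weights instead.

```python
from typing import List, Tuple

# Each study is (number of students, effect size).
# Only the first entry comes from the text; the others are hypothetical.
studies: List[Tuple[int, float]] = [
    (12, 0.65),     # the tiny study described above
    (400, 0.10),    # hypothetical larger study
    (1200, 0.05),   # hypothetical larger study
]


def unweighted_mean_effect(data: List[Tuple[int, float]]) -> float:
    """Every study counts equally, regardless of size."""
    return sum(es for _, es in data) / len(data)


def sample_weighted_mean_effect(data: List[Tuple[int, float]]) -> float:
    """Each study counts in proportion to its number of students."""
    total_n = sum(n for n, _ in data)
    return sum(n * es for n, es in data) / total_n


if __name__ == "__main__":
    print(round(unweighted_mean_effect(studies), 3))
    print(round(sample_weighted_mean_effect(studies), 3))
```

With these made-up numbers, the unweighted mean lands above +0.25 because the 12-student study counts as much as the 1,200-student study, while the sample-weighted mean is well under +0.10.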

For the ESSA evidence standards, the >0.25 loophole can lead to serious errors. A single study meeting standards makes a program qualify for one of the top three ESSA categories (strong, moderate, or promising). There can be financial consequences for schools using programs in the top three categories (for example, use of such programs is required for schools seeking school improvement grants). Yet a single study meeting the standards, including the awful 12-student study of Reciprocal Teaching, qualifies the program for the ESSA category, no matter what is found in all other studies (unless there are qualifying studies with negative impacts). Also, the loophole works in the negative direction too, so a small, terrible study could find an effect size less than -0.25, and no amount or quality of positive findings could make that program meet WWC standards.

The >0.25 loophole is bad enough for research that already exists, but for the future, the problem is even more serious. Program developers or commercial publishers could do many small studies of their programs or could commission studies using developer-made measures. Once a single study exceeds an effect size of +0.25, the program may be considered validated forever.

To add to the problem, recent guidance from the U.S. Department of Education on the ESSA “promising” category specifically mentions the idea that programs can meet the promising definition if they can report statistically significant or substantively important outcomes. The guidance refers to the WWC standards for the “strong” and “moderate” categories, and the WWC standards themselves allow for the >0.25 loophole (even though this is not mentioned or implied by the law itself, which consistently requires statistically significant outcomes, not “substantively important” ones). In other words, programs that meet WWC standards for “positive” or “potentially positive” based on substantively important evidence alone explicitly do not meet ESSA standards, which require statistical significance. Yet the recent regulations do not recognize this problem.

The >0.25 loophole began, I’d assume, when the WWC was young and few programs met its standards. It was jokingly called the “Nothing Works Clearinghouse.” The loophole was probably added to increase the numbers of included programs. This loophole produced misleading conclusions, but since the WWC did not matter very much to educators, there were few complaints. Today, however, the WWC has greater importance because of the ESSA evidence standards.

Bad loopholes make bad laws. It is time to close this loophole, and eliminate the category of “substantively important.”

Research and Development Saved Britain. Maybe They Will Save U.S. Education

One of my summer goals is to read the entire six-volume history of the Second World War by Winston Churchill. So far, I’m about halfway through the first volume, The Gathering Storm, about the period leading up to 1939.

The book is more or less a wonderfully written rant about the Allies’ shortsightedness. As Hitler built up his armaments, Britain, France, and their allies maintained a pacifist insistence on reducing theirs. Only in the mid-thirties, when war was inevitable, did Britain start investing in armaments, but even then at a very modest pace.

Churchill was a Member of Parliament but was out of government. However, he threw himself into the one thing he could do to help Britain prepare: research and development. In particular, he worked with top scientists to develop the capacity to track, identify, and shoot down enemy aircraft.

When the 1940 Battle of Britain came and German planes tried to destroy and demoralize Britain in advance of an invasion, the inventions by Churchill’s group were a key factor in defeating them.

Churchill’s story is a good analogue to the situation of education research and development. In the current environment, the best-evaluated, most effective programs are not in wide use in U.S. schools. But the research and development that creates and evaluates these programs is essential. It is useful right away in hundreds of schools that do use proven programs already. But imagine what would happen if federal, state, or local governments anywhere decided to use proven programs to combat their most important education problems at scale. Such a decision would be laudable in principle, but where would the proven programs come from? How would they generate convincing evidence of effectiveness? How would they build robust and capable organizations to provide high-quality professional development, materials, and software?

The answer is research and development, of course. Just as Churchill and his scientific colleagues had to create new technologies before Britain was willing to invest in air defenses and air superiority at scale, so American education needs to prepare for the day when government at all levels is ready to invest seriously in proven educational programs.

I once visited a secondary school near London. It’s an ordinary school now, but in 1940 it was a private girls’ school. A German plane, shot down in the Battle of Britain, crash landed near the school. The girls ran out and captured the pilot!

The girls were courageous, as was the British pilot who shot down the German plane. But the advanced systems the British had worked out and tested before the war were also important to saving Britain. In education reform we are building and testing effective programs and organizations to support them. When government decides to improve student learning nationwide, we will be ready, if investments in research and development continue.

This blog is sponsored by the Laura and John Arnold Foundation

Research and Practice: “Tear Down This Wall”

I was recently in Berlin. Today, it’s a lively, entirely normal European capital. But the first time I saw it, it was 1970, and the wall still divided it. Like most tourists, I went through Checkpoint Charlie to the east side. The two sides were utterly different. West Berlin was pleasant, safe, and attractive. East Berlin was a different world. On my recent trip, I met a young researcher who grew up in West Berlin. He recalls his father being taken in for questioning because he accidentally brought a West Berlin newspaper across the border. Western people could visit, but western newspapers could get you arrested.

I remember John F. Kennedy’s “Ich bin ein Berliner” speech, and Ronald Reagan’s “Mr. Gorbachev, tear down this wall.” And one day, for reasons no one seems to understand, the wall was gone. Even today, I find it thrilling and incredible to walk down Unter den Linden under the Brandenburg Gate. Not so long ago, this was impossible, even fatal.

The reason I bring up the Berlin Wall is that I want to use it as an analogy to another wall of less geopolitical consequence, perhaps, but very important to our profession. This is the wall between research and practice.

It is not my intention to disrespect the worlds on either side of the research/practice wall. People on both sides care deeply about children and bring enormous knowledge, skill, and effort to improving educational outcomes. In fact, that’s what is so sad about this wall. People on both sides have so much to teach and learn from the other, but all too often, they don’t.

What has been happening in recent years is that the federal government, at least, has been reinforcing the research/practice divide in many ways, at least until the passage of the Every Student Succeeds Act (ESSA) (more on this later). On one hand, government has invested in high-quality educational research and development, especially through Investing in Innovation (i3) and the Institute of Education Sciences (IES). As a result, over on the research side of the wall there is a growing stockpile of rigorously evaluated, ready-to-implement education programs for most subjects and grade levels.

On the practice side of the wall, however, government has implemented national policies that may or may not have a basis in research, but definitely do not focus on use of proven programs. Examples include accountability, teacher evaluation, and Common Core. Even federal School Improvement Grants (SIG) for the lowest-achieving 5% of schools in each state had loads of detailed requirements for schools to follow but said nothing at all about using proven programs or practices, until a proven whole-school reform option was permitted as one of six alternatives at the very end of No Child Left Behind. The huge Race to the Top funding program was similarly explicit about standards, assessments, teacher evaluations, and other issues, but said nothing about use of proven programs.

On the research side of the wall, developers and researchers were being encouraged by the U.S. Department of Education to write their findings clearly and “scale up” their findings to presumably eager potential adopters on the practice side. Yet the very same department was, at the same time, keeping education leaders on the practice side of the wall scrambling to meet federal standards to obtain Race to the Top, School Improvement Grants, and other funding, none of which had anything much to do with the evidence base building up on the research side of the wall. The problem posed by the Berlin Wall was not going to be resolved by sneaking well-written West Berlin newspapers into East Berlin, or East Berlin newspapers into West Berlin. Rather, someone had to tear down the wall.

The Every Student Succeeds Act (ESSA) is one attempt to tear down the research/practice wall. Its definitions of strong, moderate, and promising levels of evidence, and provision of funding incentives for using proven programs (especially in applications for school improvement), could go a long way toward tearing down the research/practice wall, but it’s too soon to tell. So far, these definitions are just words on a page. It will take national, state, and local leadership to truly make evidence central to education policy and practice.

On National Public Radio, I recently heard recorded recollections from people who were in Berlin the day the wall came down. One of them really stuck with me. West Berliners had climbed to the top of the wall and were singing and cheering as gaps were opened. Then, an East German man headed for a gap. The nearby soldiers, unsure what to do, pointed their rifles at him and told him to stop. He put his hands in the air. The West Germans on the wall fell silent, anxiously watching.

A soldier went to find the captain. The captain came out of a guardhouse and walked over to the East German man. He put his arm around his shoulders and personally walked him through the gap in the wall.

That’s leadership. That’s courage. It’s what we need to tear down our wall: leaders at all levels who actively encourage the world of research and the world of practice to become one. To do it by personal and public examples, so that educators can understand that the rules have changed, and that communication between research and practice, and use of proven programs and practices, will be encouraged and facilitated.

Our wall can come down. It’s only a question of leadership, and commitment to better outcomes for children.

This blog is sponsored by the Laura and John Arnold Foundation

The Age of Evidence

In 1909, most people outside of cities had never seen an automobile. Those that existed frequently broke down, and there were few mechanics. Roads were poor, fuel was difficult to obtain, and spare parts were scarce. The automobile industry had not agreed on the best form of propulsion, so steam-powered cars, electric cars, and diesel cars shared the road with gasoline-powered cars. The high cost of cars made them a rich man’s hobby and a curiosity rather than a practical necessity for most people.

Yet despite all of these limitations, anyone with eyes to see knew that the automobile was the future.

I believe that evidence in education is at a similar point in its development. There are still not enough proven programs in all fields and grade levels. Educators are just now beginning to understand what proven programs can do for their children. Old-fashioned textbooks and software lacking a scintilla of evidence still dominate the market. Many schools that do adopt proven programs may still not get promised outcomes because they shortchange professional development, planning, or other resources.

Despite all of these problems, any educator or policy maker with eyes to see knows that evidence is the future.

There are many indicators that the Age of Evidence is upon us. Here are some I’d point to.

· The ESSA evidence standards. The definitions in the ESSA law of strong, moderate, and promising levels of evidence and incentives to use programs that meet them are not yet affecting practice on a large scale, but they are certainly leading to substantial discussion about evidence among state, district, and school leaders. In the long run, this discussion may be as important as the law itself in promoting the use of evidence.

· The availability of many more proven programs. Our Evidence for ESSA website found approximately 100 K-12 reading and math programs meeting one of the top three ESSA standards. Many more are in the pipeline.

· Political support for evidence is growing and non-partisan. Note that the ESSA standards were passed with bipartisan support in a Republican Congress. This is a good indication that evidence is becoming a consensus “good government” theme, not just something that professors do.

· We’ve tried everything else. Despite their commendable support for research, both the G.W. Bush and the Obama administrations mainly focused on policies that ignored the existence of proven programs. Progress in student performance was disappointing. Perhaps next time, we’ll try using what works.

Any of these indicators could experience setbacks or reversals, but in all of modern history, it’s hard to think of cases in which, once the evidence/innovation genie is out of the bottle, it is forced back inside. Progress toward the Age of Evidence may be slower or more uneven than we’d like, but this is an idea that once planted tends to persist, and to change institutions.

If we have proven, better ways to teach reading or math or science, to increase graduation rates and college and career readiness, or to build students’ social and emotional skills and improve classroom behavior, then sooner or later policy and practice must take this evidence into account. When it does, it will kick off a virtuous cycle in which a taste for evidence among education leaders leads to substantial investments in R&D by government and the private sector. This will lead to creation and successful evaluation of better and better educational programs, which will progressively add to the taste for evidence, feeding the whole cycle.

The German philosopher Schopenhauer once said that every new idea is first ridiculed, then vehemently opposed, and then accepted as self-evident. I think we are nearing a turning point, where resistance to the idea of evidence of effectiveness as a driver in education is beginning to give way to a sense that of course any school should be using proven programs. Who would argue otherwise?

Other fields, such as medicine, agriculture, and technology, including automotive technology, long ago reached a point of no return, when innovation and evidence of effectiveness began to expand rapidly. Because education is mostly a creature of government, it has been slower to change, but change is coming. And when this point of no return arrives, we’ll never look back. As new teaching approaches, new uses of technology, new strategies for engaging students with each other, new ways of simulating scientific, mathematical, and social processes, and new ways of accommodating student differences are created, successfully evaluated, and disseminated, education will become an exciting, constantly evolving field. And no one will even remember a time when this was not the case.

In 1909, the problems of automotive engineering were daunting, but there was only one way things were going to go. True progress has no reverse gear. So it will be in education, as our Age of Evidence dawns.

This blog is sponsored by the Laura and John Arnold Foundation