“We Don’t Do Lists”


Watching the slow, uneven, uncertain rollout of the ESSA evidence standards gives me a mixture of hope and despair. The hope stems from the fact that, from coast to coast, educational leaders are actually talking about proven programs and practices at all. That was certainly rare before ESSA. The despair stems from hearing many educational leaders trying to find the absolute least their states and districts can do to just barely comply with the law. The ESSA evidence standards apply in particular to schools seeking school improvement funding, those in the lowest 5% of their states in academic performance. A previous program with a similar name but more capital letters, School Improvement, operated under NCLB, before ESSA. A large-scale evaluation by MDRC found that the earlier School Improvement made no difference in student achievement, despite billions of dollars in investment. So you’d imagine that this time around, educators responsible for school improvement would be eager to use the new law to introduce proven programs into their lowest-achieving schools. In fact, there are individual leaders, districts, and states who have exactly this intention, and they may ultimately provide good examples to the rest. But they face substantial obstacles.

One of the obstacles I hear about often is opposition among state departments of education to disseminating lists of proven programs. I very much understand and sympathize with their reluctance, as schools have been over-regulated for a long time. However, I do not see how the ESSA evidence standards can make much of a difference if everyone makes their own list of programs. Determining which studies meet ESSA evidence standards is difficult, and it requires a great deal of knowledge about research (I know this, of course, because we do such reviews ourselves; see www.evidenceforessa.org).

Some say that they want programs that have been evaluated in their own states. But after taking into account demographics (e.g., urban/rural, ELL/not ELL, etc.), are state-to-state differences so great as to require different research in each? We used to work with a school located on the Ohio-Indiana border, which ran right through the building. Were there really programs that were effective on one side of the building but not on the other?

Further, state department leaders frequently complain that they have too few staff to adequately manage school improvement across their states. Should that capacity be concentrated on reviewing research to determine which programs meet ESSA evidence standards and which do not?

The irony of opposing lists for ESSA evidence standards is that most states are chock full of lists that restrict the textbooks, software, and professional development schools can select using state funds. These lists may focus on paper weight, binding, and other minimum-quality issues, but they almost never have anything to do with evidence of effectiveness. One state asked us to review its textbook adoption lists for reading and math, grades K-12. Collectively, there were hundreds of books, but just a handful had even a shred of evidence of effectiveness.

Educational leaders are constantly buffeted by opposing interest groups, from politicians to school board members to leaders of unions, from PTA presidents to university presidents, to for-profit companies promoting their own materials and programs. Educational leaders need a consistent way to ensure that the decisions they make are in the best interests of children, not the often self-serving interests of adults. The ESSA evidence standards, if used wisely, give education leaders an opportunity to say to the whole cacophony of cries for special consideration, “I’d love to help you all, but we can only approve programs for our lowest-achieving schools that are known from rigorous research to benefit our children. We say this because it is the law, but also because we believe our children, and especially our lowest achievers, deserve the most effective programs, no matter what the law says.”

To back up such a radical statement, educational leaders need clarity about what their standards are and which specific programs meet those standards. Otherwise, they either have an “anything goes” strategy that in effect means that evidence does not matter, or they have competing vendors claiming an evidence base for their favored program. Lists of proven programs can disappoint those whose programs aren’t on the list, but they are at least clear and unambiguous, and they communicate to those who want to add to the list exactly what kind of evidence they will need.

States or large districts can create lists of proven programs by starting with existing national lists (such as the What Works Clearinghouse or Evidence for ESSA) and then modifying them, perhaps by adding additional programs that meet the same standards and/or eliminating programs not available in a given location. Over time, existing or new programs can be added as new evidence appears. We, at Evidence for ESSA, are willing to review programs being considered by state or local educators for addition to their own lists, and we will do it for free and in about two weeks. Then we’ll add them to our national list if they qualify.

It is important to say that while lists are necessary, they are not sufficient. Thoughtful needs assessments, information on proven programs (such as effective methods fairs and visits to local users of proven programs), and planning for high-quality implementation of proven programs are also necessary. However, students in struggling schools cannot wait for every school, district, and state to reinvent the wheel. They need the best we can give them right now, while the field is working on even better solutions for the future.

Whether a state or district uses a national list, or starts with such a list and modifies it for its own purposes, a list of proven programs provides an excellent starting point for struggling schools. It plants a flag for all to see, one that says “Because this (state/district/school) is committed to the success of every child, we select and carefully implement programs known to work. Please join us in this enterprise.”

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.


Getting Past the Dudalakas (And the Yeahbuts)

Phyllis Hunter, a gifted educator, writer, and speaker on the teaching of reading, often speaks about the biggest impediments to education improvement, which she calls the dudalakas. These are excuses for why change is impossible. Examples are:

Dudalaka better students

Dudalaka money

Dudalaka policy support

Dudalaka parent support

Dudalaka union support

Dudalaka time

Dudalaka is just shorthand for “Due to the lack of.” It’s a close cousin of “yeahbut,” another reflexive response to ideas for improving education practices or policy.

Of course, there are real constraints that teachers and education leaders face that genuinely restrict what they can do. The problem with dudalakas and yeahbuts is not that the objections are wrong, but that they are so often thrown up as a reason not to even think about solutions.

I often participate in dudalaka conversations. Here is a composite. I’m speaking with the principal of an elementary school, who is expressing concern about the large number of students in his school who are struggling in reading. Many of these students are headed for special education. “Could you provide them with tutors?” I ask. “Yes, they get tutors, but we use a small-group method that emphasizes oral reading” (not the phonics skills the students are actually lacking; in other words, a yeahbut).

“Could you change the tutoring to focus on the skills you know students need?”

“Yeahbut our education leadership requires we use this system (dudalaka political support). Besides, we have so many failing students (dudalaka better students) that we have to work with small groups of students (dudalaka tutors).”

“Could you hire and train paraprofessionals or recruit qualified volunteers to provide personalized tutoring?”

“Yeahbut we’d love to, but we can’t afford them (dudalaka money). Besides, we don’t have time for tutoring (dudalaka time).”

“But you have plenty of time in your afternoon schedule.”

“Yeahbut in the afternoon, children are tired. (Dudalaka better students).”

This conversation is not, of course, a rational discussion of strategies for solving a serious problem. It is instead an attempt by the principal to find excuses to justify his school’s continuing to do what it is doing now. Dudalakas and yeahbuts are merely ways of passing blame to other people (school leaders, teachers, children, parents, unions, and so on) and to shortages of money, time, and other resources that hold back change. Again, these excuses may or may not be valid in a particular situation, but there is a difference between rejecting potential solutions out of hand (using dudalakas and yeahbuts) and identifying and then carefully and creatively considering them. Not every solution will be possible or workable, but if the problem is important, some solution must be found. No matter what.

An average American elementary school with 500 students has an annual budget of approximately $6,000,000 ($12,000 per student). Principals, teachers, superintendents, and state superintendents think their hands are tied by limited resources (dudalaka money). But creativity and commitment to core goals can overcome funding limitations if school and district leaders are willing to use resources differently, activate underutilized resources, or, ideally, find a way to obtain more funding.

The people who start off with the very human self-protective dudalakas and yeahbuts may, with time, experience, and encouragement, become huge advocates for change. It’s only natural to start with dudalakas and yeahbuts. What is important is that we don’t end with them.

We know that our children are capable of succeeding at much higher rates than they do today. Yet too many are failing, dudalaka quality implementation of proven programs. Let’s clear away the other dudalakas and yeahbuts, and get down to this one.

This blog is sponsored by the Laura and John Arnold Foundation

Pilot Studies: On the Path to Solid Evidence

This week, the Education Technology Industry Network (ETIN), a division of the Software & Information Industry Association (SIIA), released an updated guide to research methods, authored by a team at Empirical Education Inc. The guide is primarily intended to help software companies understand what is required for studies to meet current standards of evidence.

In government and among methodologists and well-funded researchers, there is general agreement about the kind of evidence needed to establish the effectiveness of an education program intended for broad dissemination. To earn its top rating (“meets standards without reservations”), the What Works Clearinghouse (WWC) requires an experiment in which schools, classes, or students are assigned at random to experimental or control groups, and it has a second category (“meets standards with reservations”) for matched studies.

These WWC categories more or less correspond to the Every Student Succeeds Act (ESSA) evidence standards (“strong” and “moderate” evidence of effectiveness, respectively), and ESSA adds a third category, “promising,” for correlational studies.

Our own Evidence for ESSA website follows the ESSA guidelines, of course. The SIIA guidelines explain all of this.

Despite the overall consensus about the top levels of evidence, the problem is that doing studies that meet these requirements is expensive and time-consuming. Software developers, especially small ones with limited capital, often do not have the resources or the patience to do such studies. Any organization that has developed something new may not want to invest substantial resources in large-scale evaluations until it has some indication that the program is likely to show well in a larger, longer, and better-designed evaluation. There is a path to high-quality evaluations, starting with pilot studies.

The SIIA Guide usefully discusses this problem, but I want to add some further thoughts on what to do when you can’t afford a large randomized study.

1. Design useful pilot studies. Evaluators need to make a clear distinction between full-scale evaluations, intended to meet WWC or ESSA standards, and pilot studies (the SIIA Guidelines call these “formative studies”), which are just meant for internal use, both to assess the strengths and weaknesses of the program and to give an early indicator of whether or not a program is ready for full-scale evaluation. The pilot study should be a miniature version of the large study. But whatever its findings, they should not be used in publicity. Results of pilot studies are important, but by definition a pilot study is not ready for prime time.

An early pilot study may be just a qualitative study, in which developers and others might observe classes, interview teachers, and examine computer-generated data on a limited scale. The problem in pilot studies is at the next level, when developers want an early indication of effects on achievement, but are not ready for a study likely to meet WWC or ESSA standards.

2. Worry about bias, not power. Small, inexpensive studies pose two types of problems. One is the possibility of bias, discussed in the next section. The other is lack of power, mostly meaning having a large enough sample to determine that a potentially meaningful program impact is statistically significant, or unlikely to have happened by chance. To understand this, imagine that your favorite baseball team adopts a new strategy. After the first ten games, the team is doing better than it did last year, in comparison to other teams, but this could have happened by chance. After 100 games? Now the results are getting interesting. If 10 teams all adopt the strategy next year and they all see improvements on average? Now you’re headed toward proof.

During the pilot process, evaluators might compare multiple classes or multiple schools, perhaps assigned at random to experimental and control groups. There may not be enough classes or schools for statistical significance yet, but if the mini-study avoids bias, the results will at least be in the ballpark (so to speak). (A back-of-the-envelope power calculation is sketched just after this list.)

3. Avoid bias. A small experiment can be fine as a pilot study, but every effort should be made to avoid bias. Otherwise, the pilot study will give a result far more positive than the full-scale study will, defeating the purpose of doing a pilot.

Examples of common sources of bias in smaller studies are as follows.

a. Use of measures made by developers or researchers. These measures typically produce greatly inflated impacts.

b. Implementation of gold-plated versions of the program. In small pilot studies, evaluators often implement versions of the program that could never be replicated. Examples include providing additional staff time that could not be repeated at scale.

c. Inclusion of highly motivated teachers or students in the experimental group (which gets the program) but not in the control group. For example, matched studies of technology often exclude teachers who did not implement “enough” of the program. The problem is that the full-scale experiment (and real life) includes all kinds of teachers, so excluding teachers who could not or did not want to engage with technology overstates the likely impact at scale in ordinary schools. Even worse, excluding students who did not use the technology enough may bias the study toward more capable students.

4. Learn from pilots. Evaluators, developers, and disseminators should learn as much as possible from pilots. Observations, interviews, focus groups, and other informal means should be used to understand what is working and what is not, so that when the program is evaluated at scale, it is at its best.
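
To make the power point in item 2 concrete, here is a minimal sketch in Python (assuming the statsmodels package is available; the effect size and sample sizes are purely illustrative, not drawn from any particular study) of how large a simple two-group study must be before a realistic program effect can be distinguished from chance.

```python
# A rough power sketch: illustrative numbers only, requires statsmodels.
from statsmodels.stats.power import TTestIndPower

power = TTestIndPower()

# Suppose the true program effect is +0.20 standard deviations (a solid outcome
# for an education program), tested at the conventional alpha = .05.
effect_size, alpha = 0.20, 0.05

# How many students per group are needed for an 80% chance of detecting it?
n_needed = power.solve_power(effect_size=effect_size, alpha=alpha, power=0.80)
print(round(n_needed))          # roughly 390-400 students per group

# What chance does a pilot with 30 students per group have of reaching significance?
pilot_power = power.solve_power(effect_size=effect_size, alpha=alpha, nobs1=30)
print(round(pilot_power, 2))    # roughly 0.12 - a real effect will usually look like no effect
```

Note that this is a student-level calculation; designs that assign whole classes or schools, as most full-scale evaluations do, need considerably larger samples still. That is exactly why a pilot should aim for an unbiased estimate of the effect rather than for statistical significance.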

 

***

As evidence becomes more and more important, publishers and software developers will increasingly be called upon to prove that their products are effective. However, no program should have its first evaluation be a 50-school randomized experiment. Such studies are indeed the “gold standard,” but jumping from a two-class pilot to a 50-school experiment is a way to guarantee failure. Software developers and publishers should follow a path that leads to a top-tier evaluation, and learn along the way how to ensure that their programs and evaluations will produce positive outcomes for students at the end of the process.

 

This blog is sponsored by the Laura and John Arnold Foundation

Implementing Proven Programs

There is an old joke that goes like this. A door-to-door salesman is showing a housewife the latest, fanciest, most technologically advanced vacuum cleaner. “Ma’am,” says the salesman, “this machine will do half your work!”

“Great!” says the housewife. “I’ll take two!”

All too often, when school leaders decide to adopt proven programs, they act like the foolish housewife. The program is going to take care of everything, they think. Or if it doesn’t, it’s the program’s fault, not theirs.

I wish I could tell you that you could just pick a program from our Evidence for ESSA site (launching on February 28! Next week!), wind it up, and let it teach all your kids, sort of the way a Roomba is supposed to clean your carpets. But I can’t.

Clearly, any program, no matter how good the evidence behind it is, has to be implemented with the buy-in and participation of all involved, planning, thoughtfulness, coordination, adequate professional development, interim assessment and data-based adjustments, and final assessment of program outcomes. In reality, implementing proven programs is difficult, but so is implementing ordinary unproven programs. All teachers and administrators go home every day dead tired, no matter what programs they use. The advantage of proven programs is that they hold out promise that this time, teachers’ and administrators’ efforts will pay off. Also, almost all effective programs provide extensive, high-quality professional development, and most teachers and administrators are energized and enthusiastic about engaging professional development. Finally, whole-school innovations, done right, engage the whole staff in common activities, exchanging ideas, strategies, successes, challenges, and insights.

So how can schools implement proven programs with the greatest possible chance of success? Here are a few pointers (from 43 years of experience!).

Get Buy-In. No one likes to be forced to do anything and no one puts in their best effort or imagination for an activity they did not choose.

When introducing a proven program to a school staff, have someone from the program provider’s staff come to explain it to the staff, and then get staff members to vote by secret ballot. Require an 80% majority.

This does several things. First, it ensures that the school staff is on board, willing to give the program their best shot. Second, it effectively silences the small minority in every school that opposes everything. After the first year, additional schools that did not select the program in the first round should be given another opportunity, but by then they will have seen how well the program works in neighboring schools.

Plan, Plan, Plan. Did you ever see the Far Side cartoon in which there is a random pile of horses and cowboys and a sheriff says, “You don’t just throw a posse together, dadgummit!” (or something like that). School staffs should work with program providers to carefully plan every step of program introduction. The planning should focus on how the program needs to be adapted to the specific requirements of this particular school or district, and make best use of human, physical, technological, and financial resources.

Professional Development. Perhaps the most common mistake in implementing proven programs is providing too little on-site, up-front training, and too little on-site, ongoing coaching. Professional development is expensive, especially if travel is involved, and users of proven programs often try to minimize costs by doing less professional development, or doing all or most of it electronically, or using “trainer-of-trainer” models (in which someone from the school or district learns the model and then teaches it to colleagues).

Here’s a dark secret. Developers of proven programs almost never use any of these training models in their own research. Quite the contrary, they are likely to have top-quality coaches swarming all over schools, visiting classes and ensuring high-quality implementation any way they can. Yet when it comes time for dissemination, they keep costs down by providing much, much less professional development than their own studies showed was needed (which is why they provided it in the first place). This is such a common problem that Evidence for ESSA excludes programs that provided extensive professional development in their research but today offer, for example, just an online manual. Evidence for ESSA tries to describe dissemination requirements in terms of what was done in the research, not what is currently offered.

Coaching. Coaching means having experts visit teachers’ classes and give them individual or schoolwide feedback on their quality of implementation.

Coaching is essential because it helps teachers know whether they are on track to full implementation, and enables the project to provide individualized, actionable feedback. If you question the need for feedback, consider how you could learn to play tennis or golf, play the French horn, or act in Shakespearean plays, if no one ever saw you do it and gave you useful and targeted feedback and suggestions for improvement. Yet teaching is much, much more difficult.

Sure, coaching is expensive. But poor implementation squanders not only the cost of the program, but also teachers’ enthusiasm and belief that things can be better.

Feedback. Coaches, building facilitators, or local experts should have opportunities to give regular feedback to schools using proven programs, on implementation as well as outcomes. This feedback should be focused on solving problems together, not on blaming or shaming, but it is essential in keeping schools on track toward goals. At the end of each quarter or at least annually, school staffs need an opportunity to consider how they are doing with a proven program and how they are going to make it better.

Proven programs plus thoughtful, thorough implementation are the most powerful tool we have to make a major difference in student achievement across whole schools and districts. They build on the strengths of schools and teachers, and they create a lasting sense of efficacy. A team of teachers and administrators that has organized itself around a proven program, implemented it with pride and creativity, and seen enhanced outcomes is a force to be reckoned with. A force for good.

Scaling Up: Penicillin and Education

In 1928, the Scottish scientist Alexander Fleming discovered penicillin. As the story goes, he discovered it by accident, when he left a petri dish containing bacteria on his desk overnight and the next morning found that it had been contaminated by a mold that killed the bacteria around it. Fleming isolated the mold and recognized that if it could kill bacteria in a dish, it might be useful in curing many diseases.

Early on it was clear that penicillin had extraordinary possibilities. In World War I, more soldiers and civilians had been killed by bacterial diseases than were killed by bullets. What if these diseases could be cured? Early tests showed very promising effects.

Yet there was a big problem. No one knew how to produce penicillin in quantity. Very small experiments established that penicillin had potential for curing bacterial infections and was not toxic. However, the total world supply at the onset of World War II was about enough for a single adult. The impending need for penicillin was obvious, but it still was not ready for prime time.

American and British scientists eventually began to work together to find a way to scale up production of penicillin. Finally, the Merck Company developed a mass-production method and was making billions of units by D-Day.

The key dynamic of the penicillin story has much in common with an essential problem of education reform. The Merck work did not change the structure of penicillin itself, but Merck scientists did a great deal of science and experimentation to find strains that were stable and replicable. In education reform, it is equally the case that developing and initially evaluating a program may be a very different process from the one required to carry out large-scale evaluations and to scale up proven programs.

In some cases, different organizations may be necessary to do large-scale evaluation and implementation, as was the case with Merck and Fleming, and in other cases the same organization may carry through the development, initial evaluation, large-scale evaluation, and dissemination. Whoever is responsible for the various steps, their requirements are similar.

At small scale, innovators are likely to work in schools nearby, where they can frequently visit schools, see what is going on, hear teachers’ perspectives, and change strategies in mid-course in response to what they learn. At small scale, programs might vary a great deal from class to class or school to school. Homemade measures, opinions, observations, and other informal indicators may be all developers need or want. From a penicillin perspective, this is still the Fleming level.

When a program moves to the next level, it may be working in many schools or distant locations, and the approach must change substantially. This is the Merck stage of development in penicillin terms. Developers must have a very clear idea of what the program is, and then provide student materials, software, professional development, and coaching directed toward helping teachers to enact the program effectively. Rather than adapting a great deal to the desires or ideas of every school or teacher, developers can ask principals and teachers to vote on participation, with an understanding that if they decide to participate, they commit to follow the program more or less as designed, with reasonable variations in light of unique characteristics of the school (e.g., urban/rural, presence of English learners, or substantial poverty). Professional development and coaching need to be standardized, with room for appropriate adaptations. Organizations that provide large-scale services need to learn how to manage functions such as finance, human resources, and IT.

As programs grow, they should seek funding for large-scale, randomized evaluations, ideally by third party evaluators.

In order to get to the Merck level in education reform, we must be ready to build robust, flexible, self-sustaining organizations, capable of ensuring positive impacts of educational programs on a broad scale. Funding from government and private foundations is needed along the way, but the organizations ultimately must be able to operate mostly or entirely on revenues from schools, especially Title I or other funds likely to be available in many or most schools.

Over the years, penicillin has saved millions of lives, due to the pioneering work of Fleming and the pragmatic work of Merck. In the same way, we can greatly enhance the learning of millions of children, combining innovative design and planful, practical scale-up.

Perfect Implementation of Hopeless Methods: The Sinking of the Vasa

If you are ever in Stockholm, you must visit the Vasa Museum. It contains a complete warship launched in 1628 that sank 30 minutes later. Other than the ship itself, the museum contains objects and bones found in the wreck, and carefully analyzed by scientists.

The basic story of the sinking of the Vasa has important analogies to what often happens in education reform.

After the Vasa sank, King Gustav II Adolf, who had commissioned it, called together a commission to find out whose fault it was and to punish the guilty.

Yet the commission, after many interviews with survivors, found that no one did anything wrong. Three and a half centuries later, modern researchers came to the same conclusion. Everything was in order. The skeleton of the helmsman was found still gripping the steering pole, trying heroically to turn the ship’s bow into the wind to keep it from leaning over.

So what went wrong? The ship could never have sailed. It was built too top-heavy, with too much heavy wood and too many heavy guns on the top decks and too little ballast on the bottom. The Vasa was doomed, no matter what the captain and crew did.

In education reform, there is a constant debate about how much of a program’s effectiveness comes from the program itself and how much from the quality of implementation. In implementation science, there are occasionally claims that it does not matter what programs schools adopt, as long as they implement them well. But most researchers, developers, and educators agree that success only results from a combination of good programs and good implementation. Think of the relationship as multiplicative:

P × I = A

(Quality of program times quality of implementation equals achievement gain.)

The reason the relationship might be multiplicative is that if either P or I is zero, achievement gain is zero. If both are very positive, then achievement gain is very, very positive.
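
As a toy illustration of that multiplicative logic (the numbers here are purely hypothetical, not estimates from any study), consider:

```python
# Toy illustration of P x I = A: program quality and implementation quality
# are both scored on a 0-1 scale; the values below are made up for illustration.
def achievement_gain(program_quality, implementation_quality):
    return program_quality * implementation_quality

print(achievement_gain(0.0, 1.0))   # 0.0  - the Vasa: flawless implementation of a hopeless design
print(achievement_gain(0.9, 0.0))   # 0.0  - a proven program left in the box
print(achievement_gain(0.9, 0.9))   # 0.81 - a good program, implemented well
```

An additive model would give the Vasa half credit for heroic sailing; the multiplicative one does not, which matches what actually happened.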

In the case of the Vasa, P=0, so no matter how good implementation was, the Vasa was doomed. In many educational programs, the same is true. For example, programs that are not well worked out, not well integrated into teachers’ schedules and skill sets, or are too difficult to implement, are unlikely to work. One might argue that in order to have positive effects, a program must be very clear about what teachers are expected to do, so that professional development and coaching can be efficiently targeted to helping teachers do those things. Then we have to have evidence that links teachers’ doing certain things to improving student learning. For example, providing teachers with professional development to enhance their content knowledge may not be helpful if teachers are not clear how to put this new knowledge into their daily teaching.

Rigorous research, especially under funding from IES and i3 in the U.S. and from EEF in England, is increasingly identifying proven programs as well as programs that consistently fail to improve student outcomes. The patterns are not perfectly clear, but in general those programs that do make a significant difference are ones that are well-designed, practical, and coherent.

If you think implementation alone will carry the day, keep in mind the skeleton of the heroic helmsman of the Vasa, spending 333 years on the seafloor trying to push the Vasa’s bow into the wind. He did everything right, except for signing on to the wrong ship.

The Sailor and the Sailboat: Leadership and Evidence

My one extravagance is that I live on the Chesapeake Bay and have a small sailboat. I love to sail, even if I’m not especially good at it, but sailing small boats teaches you a lot of important life lessons.

One of these lessons is that leadership is crucial, but leadership can only make a difference if leaders have the tools to translate leadership into outcomes.

Here’s what I mean from a sailing perspective. A sailboat is just a hull, sails, lines, a mast, a rudder, and a centerboard. When these are all in good working order, it still takes a good sailor to manage a small sailboat in heavy weather. However, when any one component is lacking, all hell breaks loose. For example, on my 11-foot sailboat, sometimes the rudder falls off in rough water. Without a rudder, it doesn’t matter how good a sailor you are. You aren’t going anywhere. Similarly, we once lost a mast in a heavy wind. Yikes!

Principals and superintendents in Title I schools are a lot like small-boat sailors in heavy weather, every single day. If all the structures and supports are in place, and if they have a great crew, capable school or district leaders can do wonders for their children.

Proven programs do not manage schools on their own. What they do is help provide the sails, mast, rudder, and lines known to work effectively with a good captain and crew.

Sometimes I hear educational leaders dismiss the importance of proven programs, saying that the only thing that matters is good leadership. But this is only half right. Great leadership is essential to make proven programs work, but proven, replicable programs and other infrastructure are equally essential to enable great leaders to have great results with kids.

So yes, recruit the best captains you can, and mentor them as much as possible. But give them, or enable them to acquire, sailboats known to work. Too many potentially great captains are given sailboats lacking a rudder or mast. When this happens, they’re sunk from the beginning.