Luther Burbank and Evidence in Education

The first house my wife and I owned was a corner rowhouse in Baltimore. The house was small and the yard was small, but there was a long fenceline with no trees overhead. We decided to put in an orchard. By the time we were done, we’d planted apples, pears, peaches, cherries, Italian and Santa Rosa plums, blueberries, and Concord grapes. Some worked out better than others, but at harvest season we were picking and canning a lot of fruit.

My involvement with our tiny orchard led me to find out about Luther Burbank, the botanist who, in the late 1800s, developed many of the fruit varieties we know today. He and later botanists went on to develop a cornucopia of fruits, vegetables, and flowers of all kinds.

Burbank had nothing to do with educational research, as far as I know, but the process he developed to create and test many fruit varieties has lessons for us in education.

Burbank’s better-tasting, hardier, and more heat-tolerant varieties dramatically improved the diversity and quality of fruit while driving down its cost. All to the good. Some of the new fruits were enthusiastically adopted by farmers, because they knew their customers would buy them. Some did not work out, because they were less tasty, difficult or expensive to grow, or hard to ship. But the ones that did work out, like the delicious Santa Rosa plums we grew in profusion in Baltimore, changed the world. Burbank developed the blight-resistant Burbank potato (forerunner of today’s Russet Burbank), for example, which was later credited with helping Ireland and the rest of Europe rebuild their potato crops after the devastation of the potato famine.

Now imagine that Burbank’s fruit trees were instead treated like new educational programs. Opponents of innovative fruits would try to get governments to ban them. Proponents might try to get governments to require them. Governments themselves might try to regulate them.

As a result, fruit tree development might have withered or died on the vine.

In education, we need to adopt the approaches agriculture has used since the time of Benjamin Franklin to promote ever-better seeds, varieties, and techniques. Government, publishers, software developers, and others should be in a constant process of creating and evaluating promising methods. Governments should set standards for evaluation as well as fund a great deal of it. When proven programs exist, government at all levels should help make educators aware of the programs and the evidence, much as agricultural extension agents do with farmers.

What government should not do is require schools or districts to adopt particular programs. Instead, it should provide information and incentives but leave the choices up to the schools. Agricultural extension agents tell farmers about new research, but it is up to the farmers to use it or not. If farmers choose not to, but their neighbors do and then get bigger yields and higher profits, they are likely to change their minds soon enough.

Similarly, government should not limit the creativity and ideas being explored in order to promote one particular design. Innovations should be field-driven and should address a broad range of issues in different ways to discover what works. Imagine if Burbank and his colleagues had been permitted to experiment with only one variety of produce. What might have happened if the Burbank potato had never been developed?

In education, government needs to jumpstart research, development, and dissemination, and it needs to honestly present the evidence and provide resources educators can use to adopt and perhaps further test innovations. Burbank’s brilliant hybrids would have remained local curiosities if the Stark Seed Company had not provided, well, seed funding and marketing support. Changing metaphors, government needs to provide the field, the ball, and the rules, and serve as referee and cheerleader, but then let the teams compete in the full light of public view.

America’s students can become the best in the world if we use the same strategies that have made America strong economically. Create policies favoring innovation and the use of proven programs, and then stand back. That’s all Luther Burbank needed to revolutionize fruit tree production, and it’s all educational research and development needs to transform teaching and learning.

Raising the Bar

From 2004 to 2009, our group at Johns Hopkins University had a research center called the Center for Data-Driven Reform in Education, or CDDRE (pronounced to rhyme with “padre”). The purpose of CDDRE was to develop, evaluate, and disseminate a strategy to assist states and districts in helping schools learn about, select, and implement proven programs. The project was huge. We ultimately evaluated CDDRE in 608 elementary and secondary schools in 59 districts in seven states, and the Council of Chief State School Officers (CCSSO) helped us form partnerships with those states and districts.

Our idea was simple. We worked with state, district, and school leaders to help them understand what proven programs in education were and how such programs could help schools serving many disadvantaged students become more successful. We called our model “Raising the Bar.” In it, we first helped school leaders work through a needs assessment, then helped them select a proven program aligned with their needs, and then helped them plan for implementation of the program they chose (if any).

This common-sense, inexpensive intervention, delivered over four years, had substantial impacts. Schools were randomly assigned to Raising the Bar or to control conditions, and the schools assigned to Raising the Bar ended up with much higher achievement scores.

Not all schools actually selected and implemented programs. Our services to states and districts were free, but schools had to pay for the programs themselves, which is why many did not adopt one. Reading outcomes, however, were much stronger for schools that did select a reading program than for those that did not.

You might expect that there would be a mad dash by states, districts, and schools to adopt a program that made a substantial difference in achievement for very little money, but this was not the case. The time for evidence-based reform had not yet arrived. There were too few proven programs (most schools that did adopt something adopted our own Success for All reading program, though we tried hard to connect them with other whole-school approaches).

Today, however, things are looking up for evidence. The Every Student Succeeds Act (ESSA) defines four levels of evidence and encourages the use of evidence-based programs throughout the legislation, in some places encouraging or requiring use of programs at only the top three levels (strong, moderate, and promising). There are far more programs of all kinds to choose from, and both the What Works Clearinghouse and our recently launched Evidence for ESSA make it much easier than it once was to locate proven programs appropriate to schools’ needs. ESSA’s requirement that schools allocate school improvement funding to programs meeting the top three standards, along with incentives embedded in other funding programs, has made educational leaders at all levels interested in learning about proven programs and the whole process of innovation and evaluation.

As a consequence of this growing interest and capacity, we are now planning to update and re-launch Raising the Bar. Our hope is that this will help state and district leaders who want to go beyond compliance with federal mandates, using the new wind in the sails of evidence-based reform to drive innovation and improvement in their most vulnerable schools.

Winston Churchill is often credited with saying that the United States will always do the right thing, after it has exhausted all the other alternatives. Can we all agree that evidence-based reform is the right thing? And can we agree that we’ve tried just about everything else? What we have been doing is trying all sorts of untested policies from the top, and only then commissioning evaluations that almost invariably find disappointing outcomes. Could it at long last be time for us to start with what works, and then use policy levers to scale it up?

You Can Step Twice in the Same River: Systems in Education

You can never step twice in the same river.  At least that is what Greek philosopher Heraclitus said a long time ago, when Socrates was just a pup.  What he meant, of course, was that a river is constantly changing, for reasons large and small, so the river you waded across yesterday, or even a minute ago, is not the same one you wade in now.

This proposition is both obvious and wrong.  Sure, rivers are never 100% the same.  But does it matter?  Imagine, for example, that you somehow drained all the water out of a river.  Within a few days or weeks, it would entirely revive itself.  The reason is that a river is not a “thing.”  It is a system.  In other words, a river exists because there is a certain level of rainfall or groundwater or water from upstream, plus a certain topography (rivers are in low-lying areas, compared to surrounding land).  Those factors create the river, and as long as they exist, the river exists.  So when you wade into a river, you are wading into a system, and (sorry, Heraclitus) it is always the same system.  Even if the river is higher or lower or muddier or clearer than usual, the system remains the same unless something pretty dramatic happens upstream.

So why am I rattling on about rivers?  The point I hope to make is that genuine and lasting change in a school depends on changing the system in which the school operates, not just small parts of the school that will be swept away if the system stays unchanged.

Here’s what I mean from an education reform perspective.  Teachers’ daily practices in classrooms are substantially determined by powerful systems.  Whatever innovations you introduce in a school, no matter how effective in the short term, will be eliminated and forgotten if the rest of the system does not change.  For example, if a school implements a great new math program but does not solve classroom management or attendance problems, the school may not maintain its math reform.

Lasting change in math, for example, might require attending to diversity in achievement levels by providing effective tutoring or small-group assistance.  It might require providing eyeglasses to children who need them.  It might require improving reading performance as well as math.  It might require involving parents.  It might require constant monitoring of students’ math performance and targeted responses to solve problems.  It might require recruiting volunteers, or making good use of after school or summer time.  It might require mobilizing department heads or other math leaders within the school to support implementation, and to help maintain the effective program when (predictable) turmoil threatens it.

Policy changes at the district, state, and national levels may also help, but I’m just focusing for the moment on aspects of the system that an individual school or district can implement on its own.  Attending to all of these factors at once may increase the chances that in five or ten years, the effective program remains in place and stays effective, even if the original principal, department head, teachers, and special funds are no longer at the school.

It’s not that every school has to do all of these things to improve math performance over time, but I would argue that lasting impact will depend on some constellation of supports that change the system in which the math reform operates.  Otherwise, the longstanding system of the school will return, washing away the reform and taking the school back to its pre-reform behaviors and policies.

A problem in all of this is that educational development and research often work against systemic change.  In particular, academic researchers are rewarded for publishing articles, and it helps if they evaluate approaches that purely represent a given theory.  Pragmatically, an approach with many components may be more expensive and more difficult to put in place.  As a result, a lot of the proven programs available to educators are narrow, focused on the main objective but not on the broader system of the school.  This may be fine in the short run, but the narrowly focused treatment may not be maintained over the long run.

Seen as a system, a river will never change its course until the key elements that determine its course themselves change.  Unless that happens, we’ll always be stepping into the same river, over and over again, and getting the same results.

On Meta-Analysis: Eight Great Tomatoes

I remember a long-ago advertisement for Contadina tomato paste. It went something like this:

Eight great tomatoes in an itsy bitsy can!

This ad creates an appealing image, or at least a provocative one, which I suppose sold a lot of tomato paste.

In educational research, we do something a lot like “eight great tomatoes.” It’s called meta-analysis, or systematic review.  I am particularly interested in meta-analyses of experimental studies of educational programs.  For example, there are meta-analyses of reading and math and science programs.  I’ve written them myself, as have many others.  In each, some number of relevant studies are identified.  From each study, one or more “effect sizes” are computed to represent the impact of the program on important outcomes, such as scores on achievement tests. These are then averaged to get an overall impact for each program or type of program.  Think of the effect size as boiling down tomatoes to make concentrated paste, to fit into an itsy bitsy can.
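
To make the arithmetic concrete, here is a minimal sketch in Python of the core computation, assuming hypothetical two-group studies that report means, standard deviations, and sample sizes.  The study names and numbers are invented for illustration, and real meta-analyses typically use weighted averages and further adjustments.

```python
# Minimal sketch: compute a standardized mean difference (effect size) for
# each study, then average across studies. Study names and numbers below
# are hypothetical, for illustration only.
from dataclasses import dataclass
from math import sqrt

@dataclass
class Study:
    name: str
    mean_t: float  # treatment group mean on the outcome measure
    mean_c: float  # control group mean
    sd_t: float    # treatment group standard deviation
    sd_c: float    # control group standard deviation
    n_t: int       # treatment group sample size
    n_c: int       # control group sample size

def effect_size(s: Study) -> float:
    """Standardized mean difference (Cohen's d) using a pooled SD."""
    pooled_sd = sqrt(((s.n_t - 1) * s.sd_t**2 + (s.n_c - 1) * s.sd_c**2)
                     / (s.n_t + s.n_c - 2))
    return (s.mean_t - s.mean_c) / pooled_sd

def mean_effect(studies: list[Study]) -> float:
    """Simple (unweighted) average of the study-level effect sizes."""
    return sum(effect_size(s) for s in studies) / len(studies)

studies = [
    Study("Study A", 52.0, 50.0, 10.0, 10.0, 120, 118),
    Study("Study B", 48.5, 47.0, 9.5, 9.8, 85, 90),
]
print(round(mean_effect(studies), 2))  # the "itsy bitsy can": one summary number
```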

But here is the problem.  The Contadina ad specifies eight great tomatoes. If even one tomato is instead a really lousy one, the contents of the itsy bitsy can will be lousy.  Ultimately, lousy tomato paste would bankrupt the company.

The same is true of meta-analyses.  Some meta-analyses include a broad range of studies – good, mediocre, and bad.  They may try to statistically control for various factors, but this does not do the job.  Bad studies lead to bad outcomes.  Years ago, I critiqued a meta-analysis of class size research.  The studies of class size in ordinary classrooms found small effects.  But one included study involved teaching tennis: in small classes, the kids got a lot more court time than kids in large classes did.  This study, and only this study, found substantial effects of class size, and it pulled the average up significantly.  There were not eight great tomatoes; there was at least one lousy tomato, which made the itsy bitsy can worthless.
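
To see how much one outlier can matter, suppose, hypothetically, that seven studies each found an effect size of +0.05 and one tennis-style study found +0.80.  The simple average across all eight would be about +0.14, nearly triple what the typical study actually showed.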

The point I am making here is that when doing a meta-analysis, the studies must be pre-screened for quality, and then carefully scrubbed.  Specifically, there are many factors that greatly (and falsely) inflate effect sizes.  Examples include assessments made by the researchers themselves, measures that cover content taught to the experimental group but not the control group, small samples, and excessive assistance provided to the teachers.

Some meta-analyses just shovel all the studies into a computer and report an average effect size.  More responsible ones shovel the studies into a computer and then test for and control for various factors that might affect outcomes.  This is better, but you just can’t control for lousy studies, because they are often lousy in many ways.

Instead, high-quality meta-analyses set specific inclusion criteria intended to minimize bias.  Studies often use both valid measures and crummy measures (such as those biased toward the experimental group).  Good meta-analyses use the good measures but not the crummy ones, as defined in advance.  Studies that only used crummy measures are excluded.  And so on.
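
As a rough illustration of what screening on criteria defined in advance might look like, here is a minimal Python sketch.  The study records, flags, and minimum sample size are hypothetical, and real review standards (such as those of the What Works Clearinghouse) are far more detailed.

```python
# Minimal sketch: apply pre-specified inclusion criteria before pooling.
# All study records and thresholds below are hypothetical.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    effect_size: float
    researcher_made_measure: bool   # outcome measure made by the program developers
    taught_only_to_treatment: bool  # test covers content taught only to the experimental group
    excessive_support: bool         # unrealistic level of help given to teachers
    total_n: int                    # combined sample size

def meets_criteria(c: Candidate, min_n: int = 60) -> bool:
    """Inclusion rules set in advance, applied before any averaging."""
    return (not c.researcher_made_measure
            and not c.taught_only_to_treatment
            and not c.excessive_support
            and c.total_n >= min_n)

candidates = [
    Candidate("Study A", 0.18, False, False, False, 238),
    Candidate("Study B", 0.95, True, True, True, 24),  # the lousy tomato
]

included = [c for c in candidates if meets_criteria(c)]
pooled = sum(c.effect_size for c in included) / len(included)
print([c.name for c in included], round(pooled, 2))
```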

With systematic standards, systematically applied, meta-analyses can be of great value.  Call it the Contadina method.  In order to get great tomato paste, start with great tomatoes. The rest takes care of itself.