Education Innovation and Research: Innovating Our Way to the Top

How did America get to be the wealthiest and most powerful country on Earth?

To explain, let me tell you about visiting a remote mountain village in Slovakia. I arrived in the evening, as the ancient central square filled up with people. Every man, woman, and child had a cell phone. Invented in America.

In the local hospital, I’m sure that most medicines were invented in America, which does more medical research than all other nations combined. Local farmers probably planted seeds and used methods developed in the U.S. Everywhere in the world, everyone watches American movies, listens to American music, and on and on.

America’s brand, the source of our wealth, is innovation.

America has long led the world in creating wealth by creating new ideas and putting them into practice. Technology? Medicine? Agriculture? America dominates the world in each of these fields, and many more. The reason is that America innovates, constantly finding new ways to solve problems, cure diseases, grow better crops, and generally do things less expensively. I am often at Johns Hopkins Hospital, where the halls are full of patients from every part of the globe. They come to Johns Hopkins because of its reputation for innovation.

In education, we face daunting problems, especially in educating disadvantaged students. So to solve these problems, you’d naturally expect that we’d turn to the principle that has led to our success in so many fields – innovation.

The Every Student Succeeds Act (ESSA), passed by Congress and signed into law in December 2015, has taken just this view. In it, for the first time ever, is a definition of the evidence required for a program or practice to be considered “strong,” “moderate,” or “promising.” These definitions encourage educators to adopt proven programs, but for this to work, we have to have a steady stream of proven innovations appearing each year. This function is fulfilled by another part of ESSA, the Education Innovation and Research (EIR) grant program. The EIR provision, which was included in ESSA with bipartisan support, provides a tiered evidence approach to research that will constantly add to the body of programs that meet the ESSA evidence requirements. Proposals are invited for “early phase,” “mid-phase,” and “expansion” grants to support the development, validation, and scale-up of successful innovations that originate at the state and local levels. Based on the U.S. Department of Education’s recent EIR grant application process, it appears (as is expected from a tiered evidence design) that many early-phase grants of up to $3 million will be made, fewer mid-phase grants of up to $8 million, and very few expansion grants of up to $15 million, all over 5 years. Anyone can apply for an early-phase grant, but applicants must already have some evidence to support their program to get a mid-phase grant, and a lot of very rigorous evidence to apply for an expansion grant. All three types of grants require third-party evaluations, which will serve to improve programs all along the spectrum of effectiveness, but mid-phase and expansion grants require large, randomized evaluations, and expansion grants additionally require national dissemination.

The structure of EIR grants is intended to make the innovation process wide open to educators at all levels of state and local governments, non-profits, businesses, and universities. It is also designed to give applicants the freedom to suggest the nature of the program they want to create, thus allowing for a broad range of field-driven ideas that arise to meet recognized needs. EIR does encourage innovation in rural schools, which must receive at least 25% of the funding, but otherwise there is considerable freedom, drawing diverse innovators to the process.

EIR is an excellent investment. If only a few of the programs it supports end up showing positive outcomes and scaling up to serve many students across the U.S., then EIR funding will make a crucial difference to the educational success of hundreds of thousands or millions of students, improving outcomes on a scale that matters at modest cost.

EIR provides an opportunity for America to solve its education problems just as it has solved problems in many other fields: through innovation. That is what America does when it needs rapid and widespread success, as it so clearly does in education. In every subject and grade level, we can innovate our way to the top. EIR is providing the resources and structure to do it.

Evidence and Freedom

One of the strangest arguments I hear against evidence-based reform in education is that encouraging or incentivizing schools to use programs or practices proven to work in rigorous experiments will reduce the freedom of schools to do what they think is best for their students.

Freedom? Really?

To start with, consider how much freedom schools have now. Many districts and state departments of education have elaborate 100% evidence-free processes of restricting the freedom of schools. They establish lists of approved providers of textbooks, software, and professional development, based perhaps on state curriculum standards but also on current trends, fads, political factors, and preferences of panels of educators and other citizens. Many states have textbook adoption standards that consider paper weight, attractiveness, politically correct language, and other surface factors, but never evidence of effectiveness. Federal policies specify how teachers should be evaluated, how federal dollars should be utilized, and how students should be assessed. I could go on for more pages than anyone wants to read with examples of how teachers’ and principals’ choices are constrained by district, state, and federal policies, very few of which have ever been tested in comparison to control groups. Why do schools use this textbook or that software or the other technology? Because their district or state bought it for them, trained them in its use (perhaps), and gave them no alternative.

The evidence revolution offers the possibility of freedom, if the evidence now becoming widely available is used properly. The minimum principle of evidence-based reform should be this: “If it is proven to work, you are allowed to use it.”

At bare minimum, evidence of effectiveness should work as a “get out of jail free” card to counter whatever rules, restrictions, or lists of approved materials schools have been required to follow.

But permission is not enough, because mandated, evidence-free materials, software, and professional development may eat up the resources needed to implement proven programs. So here is a slightly more radical proposition: “Whenever possible, school staffs should have the right, by majority vote of the staff, to adopt proven programs to replace current programs mandated by the district or state.”

For example, when a district or state requires use of anything, it could make the equivalent in money available to schools to use to select and implement programs proven to be effective in producing the desired outcome. If the district adopts a new algebra text or elementary science curriculum, for instance, it could allow schools to select an alternative with good evidence of effectiveness for algebra or elementary science, as long as the school agrees to implement the program with fidelity and care, achieving levels of implementation like those in the research that validated the program.

The next level of freedom to choose what works would be to provide incentives and support for schools that select proven programs and promise to implement them with fidelity.

“Schools should be able to apply for federal, state, or local funds to implement proven programs of their choice. Alternatively, they may receive competitive preference points on grants if they promise to adopt and effectively implement proven programs.”

This principle exists today in the Every Student Succeeds Act (ESSA), where schools applying for school improvement funding must select programs that meet one of three levels of evidence: strong (at least one randomized experiment with positive outcomes), moderate (at least one quasi-experimental [matched] study with positive outcomes), or promising (at least one correlational study with positive outcomes). In seven other programs in ESSA, schools applying for federal funds receive extra competitive preference points on their applications if they commit to using programs that meet one of those three levels of evidence. The principle in ESSA – that use of proven programs should be encouraged – should be expanded to all parts of government where proven programs exist.
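The three evidence levels just described amount to a simple lookup rule, which can be sketched in a few lines of Python. This is only an illustration of the summary given above, not the statutory definition: the names StudyDesign and essa_tier are hypothetical, and the sketch assumes each study has already been judged to show positive outcomes.

```python
from enum import Enum


class StudyDesign(Enum):
    """Study designs named in the summary above (hypothetical labels)."""
    RANDOMIZED = "randomized experiment"
    QUASI_EXPERIMENTAL = "quasi-experimental (matched) study"
    CORRELATIONAL = "correlational study"


# Evidence levels as summarized in the text, strongest first.
TIER_BY_DESIGN = [
    (StudyDesign.RANDOMIZED, "strong"),
    (StudyDesign.QUASI_EXPERIMENTAL, "moderate"),
    (StudyDesign.CORRELATIONAL, "promising"),
]


def essa_tier(positive_study_designs):
    """Return the highest evidence level supported by a program's studies.

    positive_study_designs: designs of studies that found positive outcomes.
    Returns None when no qualifying study exists.
    """
    designs = set(positive_study_designs)
    for design, tier in TIER_BY_DESIGN:
        if design in designs:
            return tier
    return None
```

A program with one matched study and one randomized study, both positive, would be rated by its best study, so this sketch returns “strong” for it.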

One problem with these principles is that they depend on having many proven programs in each area from which schools can choose. At least in reading and math, grades K-12, this has been accomplished; our Evidence for ESSA website describes approximately 100 programs that meet the top three ESSA evidence standards. More than half of these meet the “strong” standard.

However, we must have a constant flow of new approaches in all subjects and grade levels. Evidence-based policy requires continuing investments in development, evaluation, and dissemination of proven programs. The Institute of Education Sciences (IES), the Investing in Innovation (i3) program, and now the Education Innovation and Research (EIR) grant program, help fulfill this function, and they need to continue to be supported in their crucial work.

So is this what freedom looks like in educational innovation? I would argue that it is. Note that what I did not say is that programs lacking evidence should be forbidden. Mandating use of programs, no matter how well evaluated, is a path to poor implementation and political opposition. Instead, schools should have the opportunity and the funding to adopt proven programs. If they prefer not to do so, that is their choice. But my hope and expectation is that in a political system that encourages and supports use of proven programs, educators will turn out in droves to use better programs, and the schools that might have been reluctant at first will see and emulate the success their neighbors are having.

Freedom to use proven programs should help districts, states, and the federal government have confidence that they can at long last stop trying to micromanage schools. If policymakers know that schools are making good choices and getting good results, why should they want to get in their way?

Freedom to use whatever is proven to enhance student learning. Doesn’t that have a nice ring to it? Like the Liberty Bell?

Luther Burbank and Evidence in Education

The first house my wife and I owned was a corner rowhouse in Baltimore. The house was small and the yard was small, but there was a long fenceline with no trees overhead. We decided to put in an orchard. By the time we were done, we’d planted apples, pears, peaches, cherries, Italian and Santa Rosa plums, blueberries, and Concord grapes. Some worked out better than others, but at harvest season we were picking and canning a lot of fruit.

My involvement with our tiny orchard led me to find out about Luther Burbank, the botanist who, beginning in the late 1800s, developed many of the fruit varieties we know today. He and later botanists over the years developed a cornucopia of fruits, vegetables, and flowers of all kinds.

Burbank had nothing to do with educational research, as far as I know, but the process he developed to create and test many fruit varieties has lessons for us in education.

Burbank’s better-tasting or hardy-growing or heat-tolerant varieties enabled fruit to improve dramatically in diversity and quality and to diminish in cost. All to the good. Some of the new fruits were enthusiastically adopted by farmers, because they knew their customers would buy them. Some did not work out, because they were not so tasty, or were difficult or expensive to grow, or hard to ship. But the ones that did work out, like the delicious Santa Rosa plums we grew in profusion in Baltimore, changed the world. Burbank developed the potato that gave rise to the Russet Burbank, for example, which went on to become the most widely grown potato in North America.

Now imagine that Burbank’s fruit trees were instead treated like new educational programs. Opponents of innovative fruits would try to get governments to ban them. Proponents might try to get governments to require them. Governments themselves might try to regulate them.

As a result, fruit tree development might have withered or died on the vine.

In education, we need to adopt the approaches agriculture has used since the time of Benjamin Franklin to promote ever-better seeds, varieties, and techniques. Government, publishers, software developers, and others should be in a constant process of creating and evaluating effective methods. Governments should set standards for evaluation as well as fund a great deal of it. When proven programs exist, government at all levels should help make educators aware of the programs and the evidence, much as agricultural extension agents do with farmers.

What government should not do is require schools or districts to adopt particular programs. Instead, it should provide information and incentives, but leave the choices up to the schools. Agricultural extension agents tell farmers about new research, but it is up to the farmers to use it or not. If they choose not to do so but their neighbors do, and their neighbors get bigger yields and higher profits, they are likely to change their minds soon enough.

Similarly, government should not limit the creativity and ideas that are being explored in order to promote one particular design. Innovations should be field driven and address a broad range of issues in different ways to discover what works. Imagine if Burbank and his colleagues were only permitted to experiment with one variety of produce. What might have happened if the Russet potato had never been discovered?

In education, government needs to jumpstart research, development, and dissemination, and it needs to honestly present the evidence and provide resources for educators to use to adopt and perhaps further test innovations. Burbank’s brilliant hybrids would have been local curiosities if the Stark Bro’s nursery company had not provided, well, seed funding and marketing support. Changing metaphors, government needs to provide the field, the ball, the rules, and serve as referee and cheerleader, but then let the teams compete in the full light of public view.

America’s students can become the best in the world, if we use the same strategies that have made America strong economically. Create policies favoring innovation and use of proven programs and then stand back. That’s all Luther Burbank needed to revolutionize fruit tree production, and it’s all educational research and development needs to transform teaching and learning.

Scaling Up: Penicillin and Education

In 1928, the Scottish scientist Alexander Fleming discovered penicillin. As the story goes, he discovered penicillin by accident, when he left a petri dish containing bacteria on his desk overnight and the next morning found that it was contaminated with a mold that had killed the bacteria. Fleming isolated the mold and recognized that if it could kill bacteria, it might be useful in curing many diseases.

Early on it was clear that penicillin had extraordinary possibilities. In World War I, more soldiers and civilians had been killed by bacterial diseases than were killed by bullets. What if these diseases could be cured? Early tests showed very promising effects.

Yet there was a big problem. No one knew how to produce penicillin in quantity. Very small experiments established that penicillin had potential for curing bacterial infections and was not toxic. However, the total world supply at the onset of World War II was about enough for a single adult. The impending need for penicillin was obvious, but it still was not ready for prime time.

American and British scientists eventually began to work together to find a way to scale up production of penicillin. Finally, the Merck Company developed a mass production method, and was making billions of units by D-Day.

The key dynamic of the penicillin story has much in common with an essential problem of education reform. The Merck work did not change the structure of penicillin itself, but Merck scientists did a lot of science and experimentation to find strains that were stable and replicable. In education reform, it is equally the case that the development and initial evaluation of a given program may be a very different process from that intended to carry out large-scale evaluations and scaling up of proven programs.

In some cases, different organizations may be necessary to do large-scale evaluation and implementation, as was the case with Merck and Fleming, and in other cases the same organization may carry through the development, initial evaluation, large-scale evaluation, and dissemination. Whoever is responsible for the various steps, their requirements are similar.

At small scale, innovators are likely to work in schools nearby, where they can visit frequently, see what is going on, hear teachers’ perspectives, and change strategies in midcourse in response to what they learn. At small scale, programs might vary a great deal from class to class or school to school. Homemade measures, opinions, observations, and other informal indicators may be all developers need or want. From a penicillin perspective, this is still the Fleming level.

When a program moves to the next level, it may be working in many schools or distant locations, and the approach must change substantially. This is the Merck stage of development in penicillin terms. Developers must have a very clear idea of what the program is, and then provide student materials, software, professional development, and coaching directed toward helping teachers to enact the program effectively. Rather than adapting a great deal to the desires or ideas of every school or teacher, developers can ask principals and teachers to vote on participation, with an understanding that if they decide to participate, they commit to follow the program more or less as designed, with reasonable variations in light of unique characteristics of the school (e.g., urban/rural, presence of English learners, or substantial poverty). Professional development and coaching need to be standardized, with room for appropriate adaptations. Organizations that provide large-scale services need to learn how to manage functions such as finance, human resources, and IT.

As programs grow, they should seek funding for large-scale, randomized evaluations, ideally by third party evaluators.

In order to get to the Merck level in education reform, we must be ready to build robust, flexible, self-sustaining organizations, capable of ensuring positive impacts of educational programs on a broad scale. Funding from government and private foundations is needed along the way, but the organizations ultimately must be able to operate mostly or entirely on revenues from schools, especially Title I or other funds likely to be available in many or most schools.

Over the years, penicillin has saved millions of lives, due to the pioneering work of Fleming and the pragmatic work of Merck. In the same way, we can greatly enhance the learning of millions of children, combining innovative design and planful, practical scale-up.

Brilliant Errors

On a recent visit to Sweden, my wife Nancy and I went to the lovely university city of Uppsala. There, one of the highlights of our trip was a tour of the house and garden of the great 18th century botanist, Carl Linnaeus, who invented the system of naming plants and animals we use today. Whenever we say Homo sapiens, for example, we are honoring Linnaeus. His system uses two Latin words, first the genus and then the species. This replaced long, descriptive, but non-standardized naming systems that made it difficult to work out the relationships among plants and animals. Linnaeus was the most famous botanist of his time, and he is generally considered the most famous botanist in all of history. He wrote hundreds of books and papers, and he inspired the work of generations of botanists and biologists to follow, right up to today.

But he was dead wrong.

What Linnaeus was primarily trying to do was to create a comprehensive system to organize plants by their characteristics. In this, he developed what he called a “sexual system,” emphasizing the means by which plants reproduce. This was a reasonable guess, but later research showed that his organization system was incorrect.

But the fact that his specific model was wrong does not subtract one mustard seed from the power and importance of Linnaeus’ contribution.

Linnaeus’ lasting contribution was in his systematic approach, carefully analyzing plants to observe similarities and differences. Before Linnaeus, botany involved discovery, description, and categorization of plants, but there was no overarching system of relationships, and no scientifically useful naming system to facilitate seeing relationships.

The life and work of Linnaeus provides an interesting case for educators and educational research.

Being wrong is not shameful, as long as you can learn from your errors. In the history of education, the great majority of research began with a set of assumptions, but research methods did not adequately test these assumptions. There was an old saying that all educational research was “doomed to success.” As a result, we had little ability to tell when theories or methods were truly impactful, and when they were not. For this reason, it was rarely possible to learn from errors, or even from apparent successes.

In recent years, the rise of experimental research, in real schools over real periods of time measured by real assessments, has produced a growing set of proven replicable programs, and this is crucial for improving practice right now. But in the longer run, using methods that also identify failures or incorrect or unrealistic ideas is just as important. In the absence of methods that can disconfirm current beliefs, nothing ever changes.

It is becoming apparent that most large-scale randomized experiments in education fail to produce statistically significant outcomes on achievement. We can celebrate and replicate those that do make a significant difference in students’ learning, but we can also learn from those that do not. Often, studies find no difference overall but do find positive effects for particular subgroups, or when particular forms of a program are used, or when implementation meets a high standard. These after-the-fact findings provide clues, not proof, but if researchers use the lessons from a non-significant experiment in a new study and find that under well-specified conditions the treatment is effective for improving learning, then we’ve made a great advance.

It is important to set up experiments so that they can tell us more than “yes/no” but can instead tell us what factors did or did not contribute to positive impacts. This information is crucial whatever the overall impacts may be.

In every field that uses experiments, failures to find positive effects are common. Our task is to plan for this and learn from our own failures as well as successes. Like Linnaeus, we will make progress by learning from “brilliant errors.”

Linnaeus’ methods created the means of disconfirming his own taxonomy system. His taxonomy was indeed overthrown by later work, but his insistence on observation, categorization, and systematization, the very methods that undermined his own system of relationships among plants and animals, was his real contribution. In educational research, we must learn to celebrate high-quality rigorous research that finds what does not work, and include sufficient qualitative methods to help us learn how and why educational programs either work or do not work for children.

May we all have opportunities to fail as brilliantly as Linnaeus did!

The Maryland Challenge

As the Olympic Games earlier this summer showed, we Americans love to compare ourselves with other countries. Within the U.S., we like to compare our states with other states. When Ohio State plays the University of Michigan, it’s not just a football game.

In education, we also like to compare, and we usually don’t like what we see. Comparisons can be useful in giving us a point of reference for what is possible, but a point of reference doesn’t help if it is not seen as a peer. For example, U.S. students are in the middle of the pack of developed nations on Program for International Student Assessment (PISA) tests for 15-year-olds, but Americans expect to do a lot better than that. The National Assessment of Educational Progress (NAEP) allows us to compare scores within the U.S., and unless you’re in Massachusetts, which usually scores highest, you probably don’t like those comparisons either. When we don’t like our ranking, we explain it away as best we can. Countries with higher PISA scores have fewer immigrants, or pay their teachers better, or have cultures that value education more. States that do better are richer, or have other unfair advantages. These explanations may or may not have an element of truth, but the bottom line is that comparisons on such a grand scale are just not that useful. There are far too many factors that are different between nations or states, some of which are changeable and some not, at least in the near term.

If comparisons among unequal places are not so useful, what point of reference would be better?

Kevan Collins, Director of the Education Endowment Foundation in England (England’s equivalent to our Investing in Innovation (i3) program), has an answer to this dilemma, which he explained at a recent conference I attended in Stockholm. His idea is based on a major, very successful initiative of Tony Blair’s government beginning in 2003, called the London Challenge. Secondary schools in the greater London area were put into clusters according to students’ achievement at the end of primary (elementary) school, levels of poverty, numbers of children speaking languages other than English at home, size, and other attributes. Examination of the results being achieved by schools within the same cluster showed remarkable variation in test scores. Even in the poorest clusters there were schools performing above the national average, and in the wealthiest clusters there were schools below the average. Schools low in their own clusters were given substantial resources to improve, with a particular emphasis on leadership. Over time, London went from being one of the lowest-achieving areas of England to scoring among the highest. Later versions of this plan in Manchester and in the Midlands did not work as well, but they did not have much time before the end of the Blair government meant the end of the experiment.

Fast forward to today, and think about states in the U.S. as the unit of reform. Imagine that Maryland, my state, categorized its Title I elementary, middle, and high schools according to percent free lunch, ethnic composition, percent English learners, urban/rural, school size, and so on. Each of Maryland’s Title I schools would be in a cluster of perhaps 50 very similar schools. As in England, there would be huge variation in achievement within clusters.

Just forming clusters to shame schools low in their own cluster would not be enough. The schools need help to greatly improve their outcomes.

This being 2016, we have many more proven programs than were available in the London Challenge. Schools scoring below the median of their cluster might have the opportunity to choose proven programs appropriate to their strengths and needs. The goal would be to assist every school below the median in its own cluster to at least reach the median. School staffs would have to vote by at least 80% in favor to adopt various programs. The school would also commit to use most of its federal Title I funds to match supplemental state or federal funding to pay for the programs. Schools above the median would also be encouraged to adopt proven programs, but might not receive matching funds.
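The selection rule in this proposal is simple enough to sketch in code. The Python below is a rough illustration only, under stated assumptions: schools are already assigned to clusters, each school has a single achievement score, and the function and field names (schools_below_cluster_median, staff_vote_passes, "cluster", "score") are made up for this example.

```python
from collections import defaultdict
from statistics import median


def schools_below_cluster_median(schools):
    """Return names of schools scoring below the median of their own cluster.

    schools: list of dicts with hypothetical keys 'name', 'cluster', 'score'.
    """
    scores_by_cluster = defaultdict(list)
    for school in schools:
        scores_by_cluster[school["cluster"]].append(school["score"])

    # One median per cluster, so each school is compared only with its peers.
    cluster_median = {c: median(s) for c, s in scores_by_cluster.items()}

    return [
        school["name"]
        for school in schools
        if school["score"] < cluster_median[school["cluster"]]
    ]


def staff_vote_passes(votes_in_favor, staff_size, threshold_percent=80):
    """Check the 'at least 80% in favor' rule using integer arithmetic."""
    return staff_size > 0 and votes_in_favor * 100 >= threshold_percent * staff_size
```

The point of computing a separate median for each cluster is the one made above: a school is held up only against schools like itself, never against the state as a whole.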

Imagine what could happen. Principals and staffs could no longer argue that it is unfair for their schools to be compared to dissimilar schools. They might visit schools performing at the highest levels in their clusters, and perhaps even form coalitions across district lines to jointly select proven approaches and help each other implement them.

Not all schools would likely participate in the first years, but over time, larger numbers might join in. Because schools would be implementing programs already known to work in schools just like theirs, and would be held accountable within a fair group of peers, schools should see rapid growth toward and beyond their cluster median, and more importantly, entire clusters should advance toward state goals.

A plan like this could make a substantial difference in performance among all Title I schools statewide. It would focus attention sharply where it is needed, on improved teaching and learning in the schools that need it most. Within a few years, Maryland, or any other state that did the same, might blow past Massachusetts, and a few years after that, we’d all be getting visits from Finnish educators!


When I was a kid, my brothers and I used to go to a YMCA camp on the Chesapeake Bay for a month every summer. My mother said that it was cheaper than feeding us, which was my first exposure to cost-effectiveness analysis.

At the camp, we did all the usual camp things. One of those was evening campfires with singing. This was a YMCA camp in the early 1960s, so we sang a lot of folk songs about peace, love, and understanding. I was reminded of this because I now have a granddaughter who loves a Peter, Paul, and Mary disk with just those songs on it, including Kumbaya.

Skip forward a few decades from those long-ago campfires. Today, the very word Kumbaya is used as an insult of sorts. It means that the person being insulted is an unrealistic idealist, who expects that social progress can be made by sitting around the campfire and singing. As a data-minded social scientist who expects evidence from randomized studies for just about everything, I should be firmly in the anti-Kumbaya camp, so to speak. But I’m not.

Let me be clear: I do not think that singing around campfires causes important social change. Yet I’d argue that a lack of Kumbaya is just as much a problem. Kumbaya-fueled idealism is the very core of evidence-based reform, in fact.

Here’s why. The greatest danger to evidence-based reform is the widespread belief that doing well-intentioned things is good enough, even if we don’t know whether they work. An idealist should never accept this. Good intentions are nice, but they do not bring about real Kumbaya. That depends on good outcomes.

Sitting around campfires and singing about peace, love, and understanding should be good preparation for actually caring whether your actions make the difference you intend them to make. Sure, life teaches you that it takes toughness to insist that good intentions become good actions, but you have to start with the good intentions.

So here is another verse to that ageless song:

Someone’s experimenting, Lord
Randomizing, Lord
Someone’s analyzing, Lord
Oh Lord,