How Classroom-Invented Innovations Can Have Broad Impacts

When I was in high school, I had an after-school job at a small electronics company that made and sold equipment, mostly to the U.S. Navy. My job was to work with another high school student and our foreman to pack and unpack boxes, do inventories, basically whatever needed doing.

One of our regular tasks was very time-consuming. We had to test solder extractors to be sure they were working. We’d have to heat up each one for several minutes, touch a bit of solder to it, and wipe off any residue.

One day, my fellow high school student and I came up with an idea. We took 20 solder extractors and lined them up on a work table with 20 electrical outlets. We then plugged them in. By the time we’d plugged in #20, #1 was hot, so we could go back and test it, then #2, and so on. An hour-long job was reduced to 10 minutes. We were being paid the princely sum of $1.40 an hour, so we were saving the company big bucks. Our foreman immediately saw the advantages, and he told the main office about our idea.

Up in the main office, far from the warehouse, was a mean, mean man. He wore a permanent scowl. He had a car with mean, mean bumper stickers. I’ll call him Mr. Meanie.

Mr. Meanie hated everyone, but he especially hated the goofy, college-bound high school students in the warehouse. So he had to come see what we were doing, probably to prove that it was a dumb idea.

Mr. Meanie came and asked me to show him the solder extractors. I laid them out, same as always, and everything worked, same as always, but due to my anxiety under Mr. Meanie’s scowl, I let one of the cords touch its neighboring solder extractor. It was ruined.

Mr. Meanie looked satisfied (probably thinking, “I knew it was a dumb idea”), and left without a word. But as long as I worked at the company, we never again tested solder extractors one at a time (and never scorched another cord). My guess is that long after we were gone, our method remained in use despite Mr. Meanie. We’d overcome him with evidence that no one could dispute.

In education, we employ some of the smartest and most capable people anywhere as teachers. Teachers innovate, and many of their innovations undoubtedly improve their own students’ outcomes. Yet because most teachers work alone, their innovations rarely spread or stick even within their own schools. When I was a special education teacher long ago, I made up and tested out many innovations for my very diverse, very disabled students. Before heading off for graduate school, I wrote them out in detail for whoever was going to receive my students the following year. Perhaps their next teachers received and paid attention to my notes, but probably not, and they could not have had much impact for very long. More broadly, there is just no mechanism for identifying and testing out teachers’ innovations and then disseminating them to others, so they have little impact beyond the teacher and perhaps his or her colleagues and student teachers, at best.

One place in the education firmament where teacher-level innovation is encouraged, noted, and routinely disseminated is in comprehensive schoolwide approaches, such as our own Success for All (SFA). Because SFA has its own definite structure and materials, promising innovations in any school or classroom may immediately apply to the roughly 1000 schools we work with across the U.S. Because SFA schools have facilitators within each school and coaches from the Success for All Foundation who regularly visit teachers’ classes, there are many opportunities for teachers to propose innovations and show them off. Those that seem most promising may be incorporated in the national SFA program, or at least mentioned as alternatives in ongoing coaching.

As one small example, SFA constantly has students take turns reading to each other. There used to be arguments and confusion about who goes first. A teacher in Washington, DC noticed this and invented a solution. She appointed one student in each dyad to be a “peanut butter” and the other to be a “jelly.” Then she’d say, “Today, let’s start with the jellies,” and the students started right away without confusion or argument. Now, 1000 schools use this method.

A University of Michigan professor, Don Peurach, studied this very aspect of Success for All and wrote a book about it, called Seeing Complexity in Public Education (Oxford University Press, 2011). He visited dozens of SFA schools, SFA conferences, and professional development sessions, and interviewed hundreds of participants. What he described is an enterprise engaged in sharing evidence-proven practices with schools and at the same time learning from innovations and problem solutions devised in schools and communicating best practices back out to the whole network.

I’m sure that other school improvement networks do the same, because it just makes sense. If you have a school network with common values, goals, approaches, and techniques, how does it keep getting better over time if it does not learn from those who are on the front lines? I’d expect that such very diverse networks as Montessori and Waldorf schools, KIPP and Success Academy, and School Development Program and Expeditionary Learning schools, must do the same. Each of the improvements and innovations contributed by teachers or principals may not be big enough to move the needle on achievement outcomes by themselves, but collectively they keep programs moving forward as learning organizations, solving problems and improving outcomes.

In education, we have to overcome our share of Mr. Meanies trying to keep us from innovating or evaluating promising approaches. Yet we can overcome blockers and doubters if we work together to progressively improve proven programs. We can overwhelm the Mr. Meanies with evidence that no one can dispute.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.


Evidence-Based Does Not Equal Evidence-Proven


As I speak to educational leaders about using evidence to help them improve outcomes for students, there are two words I hear all the time that give me the fantods (as Mark Twain would say):

Evidence-based

I like the first word, “evidence,” just fine, but the second word, “based,” sort of negates the first one. The ESSA evidence standards require programs that are evidence-proven, not just evidence-based, for various purposes.

“Evidence-proven” means that a given program, practice, or policy has been put to the test. Ideally, students, teachers, or schools have been assigned at random to use the experimental program or to remain in a control group. The program is provided to the experimental group for a significant period of time, at least a semester, and then final performance on tests that are fair to both groups is compared, using appropriate statistics.
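
As a minimal sketch of the kind of comparison described, here is a hypothetical example in Python (the scores, sample sizes, and use of a simple t-test are illustrative assumptions; a real evaluation would typically also adjust for pretests and, in cluster-randomized designs, for clustering):

```python
# Illustrative sketch only: hypothetical posttest scores under individual random assignment.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
experimental = rng.normal(505, 100, size=300)  # hypothetical posttest scores, program group
control = rng.normal(490, 100, size=300)       # hypothetical posttest scores, control group

# Significance test comparing the final performance of the two groups
t, p = stats.ttest_ind(experimental, control)

# Standardized effect size (Cohen's d) using the pooled standard deviation
pooled_sd = np.sqrt((experimental.var(ddof=1) + control.var(ddof=1)) / 2)
d = (experimental.mean() - control.mean()) / pooled_sd

print(f"effect size d = {d:+.2f}, t = {t:.2f}, p = {p:.3f}")
```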

If your doctor gives you medicine, it is evidence-proven. It isn’t just the same color or flavor as something proven, and it isn’t just generally in line with what research suggests might be a good idea. Instead, it has been found to be effective, compared to current standards of care, in rigorous studies.

“Evidence-based,” on the other hand, is one of those wiggle words that educators love to use to indicate that they are up-to-date and know what’s expected, but don’t actually intend to do anything different from what they are doing now.

Evidence-based is today’s equivalent of “based on scientifically-based research” in No Child Left Behind. It sure sounded good, but what educational program or practice can’t be said to be “based on” some scientific principle?

In a recent Brookings article, Mark Dynarski wrote about state ESSA plans and about conversations he has heard among educators. He says that the plans are loaded with the term “evidence-based,” but give little indication of which specific proven programs states plan to implement, or how they plan to identify, disseminate, implement, and evaluate them.

I hope the ESSA evidence standards give leaders in even a few states the knowledge and the courage to insist on evidence-proven programs, especially in very low-achieving “school improvement” schools that desperately need the very best approaches. I remain optimistic that ESSA can be used to expand evidence-proven practices. But will it in fact have this impact? That remains to be proven.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

“We Don’t Do Lists”


Watching the slow, uneven, uncertain rollout of the ESSA evidence standards gives me a mixture of hope and despair. The hope stems from the fact that, from coast to coast, educational leaders are actually talking about proven programs and practices at all. That was certainly rare before ESSA. The despair comes from hearing many educational leaders trying to find the absolute least their states and districts can do to just barely comply with the law.

The ESSA evidence standards apply in particular to schools seeking school improvement funding, which are those in the lowest 5% of their states in academic performance. A previous program with a similar name but more capital letters, School Improvement, operated under NCLB, before ESSA. A large-scale evaluation by MDRC found that the earlier School Improvement made no difference in student achievement, despite billions of dollars of investment.

So you’d imagine that this time around, educators responsible for school improvement would be eager to use the new law to introduce proven programs into their lowest-achieving schools. In fact, there are individual leaders, districts, and states who have exactly this intention, and they may ultimately provide good examples to the rest. But they face substantial obstacles.

One of the obstacles I hear about often is an opposition among state departments of education to disseminating lists of proven programs. I very much understand and sympathize with their reluctance, as schools have been over-regulated for a long time. However, I do not see how the ESSA evidence standards can make much of a difference if everyone makes their own list of programs. Determining which studies meet ESSA evidence standards is difficult, and requires a great deal of knowledge about research (I know this, of course, because we do such reviews ourselves; see www.evidenceforessa.org).

Some say that they want programs that have been evaluated in their own states. But after taking into account demographics (e.g., urban/rural, ELL/not ELL, etc.), are state-to-state differences so great as to require different research in each? We used to work with a school located on the Ohio-Indiana border, which ran right through the building. Were there really programs that were effective on one side of the building but not on the other?

Further, state department leaders frequently complain that they have too few staff to adequately manage school improvement across their states. Should that capacity be concentrated on reviewing research to determine which programs meet ESSA evidence standards and which do not?

The irony of opposing lists for ESSA evidence standards is that most states are chock full of lists that restrict the textbooks, software, and professional development schools can select using state funds. These lists may focus on paper weight, binding, and other minimum quality issues, but they almost never have anything to do with evidence of effectiveness. One state asked us to review its textbook adoption lists for reading and math, grades K-12. Collectively, there were hundreds of books, but just a handful had even a shred of evidence of effectiveness.

Educational leaders are constantly buffeted by opposing interest groups, from politicians to school board members to leaders of unions, from PTA presidents to university presidents to for-profit companies promoting their own materials and programs. Educational leaders need a consistent way to ensure that the decisions they make are in the best interests of children, not the often self-serving interests of adults. The ESSA evidence standards, if used wisely, give education leaders an opportunity to say to the whole cacophony of cries for special consideration, “I’d love to help you all, but we can only approve programs for our lowest-achieving schools that are known from rigorous research to benefit our children. We say this because it is the law, but also because we believe our children, and especially our lowest achievers, deserve the most effective programs, no matter what the law says.”

To back up such a radical statement, educational leaders need clarity about what their standards are and which specific programs meet those standards. Otherwise, they either have an “anything goes” strategy that in effect means that evidence does not matter, or they have competing vendors claiming an evidence base for their favored programs. Lists of proven programs can disappoint those whose programs aren’t on the list, but they are at least clear and unambiguous, and communicate to those who want to add to the list exactly what kind of evidence they will need.

States or large districts can create lists of proven programs by starting with existing national lists (such as the What Works Clearinghouse or Evidence for ESSA) and then modifying them, perhaps by adding additional programs that meet the same standards and/or eliminating programs not available in a given location. Over time, existing or new programs can be added as new evidence appears. We, at Evidence for ESSA, are willing to review programs being considered by state or local educators for addition to their own lists, and we will do it for free and in about two weeks. Then we’ll add them to our national list if they qualify.

It is important to say that while lists are necessary, they are not sufficient. Thoughtful needs assessments, information on proven programs (such as effective methods fairs and visits to local users of proven programs), and planning for high-quality implementation of proven programs are also necessary. However, students in struggling schools cannot wait for every school, district, and state to reinvent the wheel. They need the best we can give them right now, while the field is working on even better solutions for the future.

Whether a state or district uses a national list, or starts with such a list and modifies it for its own purposes, a list of proven programs provides an excellent starting point for struggling schools. It plants a flag for all to see, one that says “Because this (state/district/school) is committed to the success of every child, we select and carefully implement programs known to work. Please join us in this enterprise.”

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

“Substantively Important” Isn’t Substantive. It Also Isn’t Important

Since it began in 2002, the What Works Clearinghouse has played an important role in finding, rating, and publicizing findings of evaluations of educational programs. It performs a crucial function for evidence-based reform. For this very reason, it needs to be right. But in several important ways, it uses procedures that are indefensible and have a big impact on its conclusions.

One of these relates to a study rating called “substantively important-positive.” This refers to study outcomes with an effect size of at least +0.25, but that are not statistically significant. I’ve written about this before, but the WWC has recently released a database of information on its studies that makes it easy to analyze WWC data on a large scale, and we have learned a lot more about this topic.

Study outcomes rated “substantively important-positive” can qualify a study as “potentially positive,” the second-highest WWC rating. “Substantively important-negative” findings (non-significant effect sizes less than -0.25) can cause a study to be rated “potentially negative,” which can block a positive rating forever: under current rules, a single “potentially negative” rating ensures that a program can never receive a rating better than “mixed,” even if other studies found hundreds of significant positive effects.
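
As a rough sketch of the rating logic described above (a simplification for illustration, not the WWC’s full decision rules, which also involve design quality, attrition, baseline equivalence, and more), the key point is that an outcome can be labeled “substantively important,” and thereby drive a study’s rating, without being statistically significant:

```python
# Simplified paraphrase of the outcome-rating logic described in this post.
def outcome_rating(effect_size: float, significant: bool) -> str:
    if significant:
        return ("statistically significant positive" if effect_size > 0
                else "statistically significant negative")
    if effect_size >= 0.25:
        return "substantively important positive"   # can make a study "potentially positive"
    if effect_size <= -0.25:
        return "substantively important negative"   # can make a study "potentially negative"
    return "indeterminate"

print(outcome_rating(0.30, significant=False))  # substantively important positive
print(outcome_rating(0.30, significant=True))   # statistically significant positive
```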

People who follow the WWC and know about “substantively important” may assume that it is a strange rule, but one that is relatively rare in practice. That is not true.

My graduate student, Amanda Inns, has just done an analysis of WWC data from their own database, and if you are a big fan of the WWC, this is going to be a shock. Amanda has looked at all WWC-accepted reading and math studies. Among these, she found a total of 339 individual outcomes rated “positive” or “potentially positive.” Of these, 155 (46%) reached the “potentially positive” level only because they had effect sizes over +0.25, but were not statistically significant.

Another 36 outcomes were rated “negative” or “potentially negative.” Twenty-six of these (72%) were categorized as “potentially negative” only because they had effect sizes less than -0.25 and were not significant. I’m sure the patterns would be similar for subjects other than reading and math.

Put another way, almost half (48%) of outcomes rated positive/potentially positive or negative/potentially negative by the WWC were not statistically significant. As one example of what I’m talking about, consider a program called The Expert Mathematician. It had just one study, with only 70 students in 4 classrooms (2 experimental and 2 control). The WWC re-analyzed the data to account for clustering, and the outcomes were nowhere near statistically significant, though they were greater than +0.25. This tiny study, and this study alone, caused The Expert Mathematician to receive the WWC “potentially positive” rating and to be ranked seventh among all middle school math programs. Similarly, Waterford Early Learning received a “potentially positive” rating based on a single tiny study with only 70 kindergarteners in 6 schools. The outcomes ranged from -0.71 to +1.11, and though the mean was more than +0.25, the result was far from significant. Yet this study alone put Waterford on the WWC list of proven kindergarten programs.
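
A back-of-envelope calculation shows why a study this small cannot tell us much once clustering is taken into account. The sketch below is illustrative, not the WWC’s actual re-analysis; the intraclass correlation (0.20) and the observed effect size (+0.30) are assumed values:

```python
# Back-of-envelope illustration of why ~70 students in 4 classrooms cannot yield a
# significant result, even with an effect size above +0.25. Assumed values throughout.
import math
from scipy import stats

n_students, n_clusters = 70, 4
icc = 0.20                        # assumed intraclass correlation for classrooms
d_observed = 0.30                 # assumed effect size, above the +0.25 threshold

cluster_size = n_students / n_clusters
design_effect = 1 + (cluster_size - 1) * icc   # variance inflation due to clustering
n_effective = n_students / design_effect       # roughly 16 "independent" students

n_per_group = n_effective / 2
se_d = math.sqrt(1 / n_per_group + 1 / n_per_group)   # approximate standard error of d
t = d_observed / se_d
p = 2 * stats.t.sf(abs(t), df=n_effective - 2)

print(f"design effect = {design_effect:.1f}, effective N = {n_effective:.0f}, p = {p:.2f}")
# Under these assumptions p is far above .05: a "large" effect size in so small a
# clustered study is simply not informative about effectiveness.
```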

I’m not taking any position on whether these particular programs are in fact effective. All I am saying is that these very small studies with non-significant outcomes say absolutely nothing of value about that question.

I’m sure that some of you nerdier readers who have followed me this far are saying to yourselves, “well, sure, these substantively important studies may not be statistically significant, but they are probably unbiased estimates of the true effect.”

More bad news. They are not. Not even close.

The problem, also revealed in Amanda Inns’ data, is that studies with large effect sizes but no statistical significance tend to have very small sample sizes (otherwise, they would have been significant). Across WWC reading and math studies that used individual-level assignment, median sample sizes were 48, 74, and 86 for substantively important, significant, and indeterminate (non-significant with ES < +0.25) outcomes, respectively. For cluster studies, the medians were 10, 17, and 33 clusters, respectively. In other words, “substantively important” outcomes came from studies with roughly half the sample sizes of other outcomes, or less.

And small-sample studies greatly overstate effect sizes. Among all factors that bias effect sizes, small sample size is the most important (only use of researcher/developer-made measures comes close). So a non-significant positive finding in a small study is not an unbiased point estimate that just needs a larger sample to show its significance. It is probably biased, in a consistent, positive direction. Studies with sample sizes less than 100 have about three times the mean effect sizes of studies with sample sizes over 1000, for example.
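
A toy simulation illustrates the selection problem. The true effect (+0.10), the group sizes, and the number of simulated studies are all assumptions chosen for illustration, not estimates of the actual bias in WWC studies:

```python
# Toy simulation: with a modest true effect, small studies often clear the +0.25 bar by
# chance, and the ones that do greatly overstate the true effect. Assumed values throughout.
import numpy as np

rng = np.random.default_rng(0)
true_effect, n_studies = 0.10, 20_000

def simulated_effect_sizes(n_per_group: int) -> np.ndarray:
    treat = rng.normal(true_effect, 1.0, size=(n_studies, n_per_group))
    ctrl = rng.normal(0.0, 1.0, size=(n_studies, n_per_group))
    pooled_sd = np.sqrt((treat.var(axis=1, ddof=1) + ctrl.var(axis=1, ddof=1)) / 2)
    return (treat.mean(axis=1) - ctrl.mean(axis=1)) / pooled_sd

for n in (24, 500):   # roughly a 48-student study vs. a 1,000-student study
    d = simulated_effect_sizes(n)
    important = d > 0.25     # would look "substantively important-positive"
    print(f"n per group = {n:3d}: {important.mean():5.1%} of studies exceed +0.25; "
          f"mean d among those = {d[important].mean():+.2f} (true effect = +0.10)")
```

In this sketch, the small studies that happen to clear the bar report effect sizes several times the true effect, while large studies almost never clear it at all; that is exactly the pattern described above.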

But “substantively important” ratings can throw a monkey wrench into current policy. The ESSA evidence standards require statistically significant effects for all of its top three levels (strong, moderate, and promising). Yet many educational leaders are using the What Works Clearinghouse as a guide to which programs will meet ESSA evidence standards. They may logically assume that if the WWC says a program is effective, then the federal government stands behind it, regardless of what the ESSA evidence standards actually say. Yet in fact, based on the data analyzed by Amanda Inns for reading and math, 46% of the outcomes rated as positive/potentially positive by WWC (taken to correspond to “strong” or “moderate,” respectively, under ESSA evidence standards) are non-significant, and therefore do not qualify under ESSA.

The WWC needs to remove “substantively important” from its ratings as soon as possible, to avoid a collision with ESSA evidence standards, and to avoid misleading educators any further. Doing so would help make the WWC’s impact on ESSA substantive. And important.

 

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Publishers and Evidence

High above the Avenue of the Americas, on prime real estate in midtown Manhattan, towers the 51-story McGraw-Hill building. When in New York, I always find time to go look at that building and reflect on the quixotic quest my colleagues and I are on to get educational decision makers to choose rigorously evaluated, proven programs. These programs are often made by small non-profit organizations and universities, like the ones I work in. Looking up at that mighty building in New York, I always wonder, are we fooling ourselves? Who are we to take on some of the most powerful companies in the world?

Education publishing is dominated by three giant, multi-billion dollar publishers, plus another three or four even bigger technology companies. These behemoths are not worried about us, not one bit. Instead, they are worried about each other.

From my experience, there are very good people who work in publishing and technology companies, people who genuinely hope that their products will improve learning for students. They would love to create innovative programs, evaluate them rigorously, and disseminate those found to be effective. However, big as they are, the major publishers face severe constraints in offering proven programs. Because they are in ferocious competition with each other, publishers cannot easily invest in expensive development and evaluation, or insist on extensive professional development, a crucial element of virtually all programs that have been shown to improve student achievement. Doing so would raise their costs, making them vulnerable to lower-cost competitors.

In recent years, many big publishers and technology companies have begun to commission third-party evaluations of their major textbooks, software, and other products. If the evaluations show positive outcomes, they can use this information in their marketing, and having rigorous evidence showing positive impacts helps protect them from the possibility that government might begin to favor programs, software, or other products with proven outcomes in rigorous research. This is exactly what did happen with the enactment of the ESSA evidence standards, though the impact of these standards has not yet been strongly felt.

However, publishers and technology companies cannot get too far out ahead of their market. If superintendents, central office leaders, and others who select textbooks and technology get on board the evidence train, then publishers will greatly expand their efforts in research and development. If the market continues to place little value on evidence, so will the big publishers.

In contrast to commercial publishers and technology companies, non-profit organizations play a disproportionate role in the evidence movement. They are often funded by government or philanthropies to create and evaluate innovations, as big commercial companies almost never are. Non-profits have the freedom to experiment, and to disseminate what works. However, non-profits, universities, and tiny for-profit start-ups are small, under-capitalized, and have little capacity or experience in marketing. Their main, and perhaps only, competitive advantage is that they have evidence of effectiveness. If no one cares about evidence, our programs will not last long.

One problem publishers face is that evaluations of traditional textbooks usually do not show any achievement benefits compared to control groups. The reason is that one publisher’s textbook is just not that different from another’s, which is what the control group is using. Publishers rarely provide much professional development, which makes it difficult for them to introduce anything truly innovative. The half-day August in-service that comes with most textbooks is barely enough to get teachers familiar with the features in the most traditional book. The same is true of technology approaches, which also rarely make much difference in student outcomes, perhaps because they typically provide little professional development beyond what is necessary to run the software.

The strategy emphasized by government and philanthropy for many years has been to fund innovators to create and evaluate programs. Those that succeed are then encouraged or funded to “scale up” their proven programs. Some are able to grow to impressive scale, but never so much as to worry big companies. An occasional David can surprise an occasional Goliath, but in the long run, the big guys win, and they’ll keep winning until someone changes the rules. To oversimplify a bit, what we have are massive publishers and technology companies with few proven innovations, and small non-profits with proven programs but little money or marketing expertise. This is not a recipe for progress.

The solution lies with government. National, state, and/or local governments have to adopt policies that favor the use of programs and software that have been proven in rigorous experiments to be effective in improving student achievement. At the federal level, the ESSA evidence standards are showing the way, and if they truly catch hold, this may be enough. But imagine if a few large states or even big districts started announcing that they were henceforth going to require evidence of effectiveness when they adopt programs and software. The effect could be electric.

For non-profits, such policies could greatly expand access to schools, and perhaps to funding. But most non-profits are so small that it would take them years to scale up substantially while maintaining quality and effectiveness.

For publishers and technology companies, the effect could be even more dramatic. If effectiveness begins to matter, even if just in a few key places, then it becomes worthwhile for them to create, partner with, or acquire effective innovations that provide sufficient professional development. In states and districts with pro-evidence policies, publishers would not have to worry about being undercut by competitors, because all vendors would have to meet evidence standards.

Publishers have tried to acquire proven programs in the past, but this usually comes to smash, because they tend to strip out the professional development and other elements that made the program work in the first place. However, in a pro-evidence environment, publishers would be motivated to maintain the quality, effectiveness, and “brand” of any programs they acquire.

In medicine, most research on practical medications is funded by drug companies and carefully monitored and certified by government. Could such a thing happen in education?

Publishers and technology companies have the capital and expertise to take effective programs to scale. Partnering with creators of proven programs, or creating and evaluating their own, big companies can make a real difference, as long as government ensures that the programs they are disseminating are in fact of the same quality and effectiveness as the versions that were found to be effective.

Publishers and technology companies are a key part of the education landscape. They need to be welcomed into evidence-based reform, and incentivized to engage in innovation and evaluation. Otherwise, educational innovation will remain a marginal activity, benefitting thousands of students when millions are in need.

This blog is sponsored by the Laura and John Arnold Foundation

First, Do No Harm: The Blind Duchess

One of the great strengths of the evidence movement in education has been its bipartisan nature. Democrats and Republicans, liberals and conservatives have equal reasons to want to know what works, and to try to ensure that government funds will be spent primarily on programs and practices known to work from rigorous experiments. Politics plays a legitimate role in determining how evidence is put to use and what values should underpin policies in education, but whatever one’s politics, everyone should agree that it’s essential to know what works.

Yet while it’s easy to conclude that we should promote what does work, it’s not so easy to decide what to do in areas in which there is insufficient evidence. We want to gradually replace programs and practices not known to work with those that do have strong evidence, but what do we do while the evidence base is growing?

I recently took a tour of Chatsworth, a huge, ornate great house that since the 1600s has been the family seat of the Dukes of Devonshire, one of the wealthiest families in England. Our guide told us about a famous duchess, Georgiana (a distant relative of Lady Diana). In the late 1700s, Georgiana suffered from irritated eyes. Her physician had her bathe her eyes in a mixture of milk and vinegar, and then applied leeches. As a consequence, she went blind.

The duchess’ physician ignored the first principle of medicine, traditionally associated with the Hippocratic Oath: “First, do no harm.” I think it is safe to assume that the Duchess of Devonshire could have had any doctor in Europe, and that the one she chose was considered one of the best. Yet even a duke or duchess or a king or queen could not obtain the kind of routine medical care we take for granted today. But what their doctors could at least do was to take care to avoid making things worse. Recall that around the same time, King George III suffered from insanity, perhaps caused by his physicians, and George Washington was killed by his blood-letting doctors.

Today, in education, we face a different set of problems, but we must start with the Hippocratic principle: First, do no harm. But for us, doing no harm is less than straightforward.

In educational practice, we have a growing but still modest number of proven interventions. As I’ve noted previously, our Evidence for ESSA website contains approximately 100 reading and math programs for grades K-12 that meet current ESSA evidence standards. That’s impressive, but it is still a smaller number of proven programs than we’d like, especially in secondary schools and in mathematics. We are now working on the category of science, which has fewer proven programs, and we know that writing will have fewer still.

In all of education research, there are very few programs known to do actual harm, so we don’t really have to worry too much about the Duchess of Devonshire’s problem. What we have instead is a growing number of proven and promising programs and a very large number of programs that have not been evaluated at all, or not well enough to meet current standards, or with mixed outcomes.

For educators, “First, do no harm” may be taken to mean, “use programs proven to be effective when they exist, but stick with promising approaches until better ones have been validated.” That is, in areas in which there are many programs with strong, positive evidence of effectiveness, select one of these and implement it with care. But in areas in which few programs exist, use the best available, rather than insisting on perfect evidence.

One example of what I am talking about is after-school programs. Under federal funding called 21st Century Community Learning Centers (21st CCLC), after-school programs have been widespread. Several years ago, an evaluation of 21st CCLC found few benefits for student achievement, and there are few if any proven models in broad scale use. So how should the federal government respond?

I would argue that the principle of “First, do no harm” would support continuing but significantly modifying 21st CCLC or other after-school funding. Federal support for after school programs might be reformed to focus on development and evaluation of programs that improve achievement outcomes. In this way, federal dollars continue to support a popular and perhaps useful service, but more importantly they support R&D to find out which forms of that service produce the desired outcomes. The same approach might be applied to career and technical education and many other areas in which there is substantial federal, state, or local investment, but little evidence of what works. In each case, funds currently supporting popular but unproven services could be shifted to supporting development, evaluation, and dissemination of proven, effective strategies designed to meet the activity’s goal.

Instead of potentially harming students or taking away funding altogether, such a strategy could open up new areas of inquiry that would be sure to eventually create and validate effective programs where they do not exist today.

In education, “First, do no harm” should not justify abandonment of whole areas of education services that lack a sufficient selection of proven approaches. Instead, it means supplementing service dollars with R&D dollars to find out what works. We cannot justify the kinds of treatment the Duchess of Devonshire received for her irritated eyes, but we also cannot justify using her case to give up on the search for effective treatments.

This blog is sponsored by the Laura and John Arnold Foundation

The Age of Evidence

In 1909, most people outside of cities had never seen an automobile. Those that existed frequently broke down, and there were few mechanics. Roads were poor, fuel was difficult to obtain, and spare parts were scarce. The automobile industry had not agreed on the best form of propulsion, so steam-powered and electric cars shared the road with gasoline-powered cars. The high cost of cars made them a rich man’s hobby and a curiosity rather than a practical necessity for most people.

Yet despite all of these limitations, anyone with eyes to see knew that the automobile was the future.

I believe that evidence in education is at a similar point in its development. There are still not enough proven programs in all fields and grade levels. Educators are just now beginning to understand what proven programs can do for their children. Old-fashioned textbooks and software lacking a scintilla of evidence still dominate the market. Many schools that do adopt proven programs may still not get promised outcomes because they shortchange professional development, planning, or other resources.

Despite all of these problems, any educator or policy maker with eyes to see knows that evidence is the future.

There are many indicators that the Age of Evidence is upon us. Here are some I’d point to.

· The ESSA evidence standards. The definitions in the ESSA law of strong, moderate, and promising levels of evidence and incentives to use programs that meet them are not yet affecting practice on a large scale, but they are certainly leading to substantial discussion about evidence among state, district, and school leaders. In the long run, this discussion may be as important as the law itself in promoting the use of evidence.

· The availability of many more proven programs. Our Evidence for ESSA website found approximately 100 K-12 reading and math programs meeting one of the top three ESSA standards. Many more are in the pipeline.

· Political support for evidence is growing and non-partisan. Note that the ESSA standards were passed with bipartisan support in a Republican Congress. This is a good indication that evidence is becoming a consensus “good government” theme, not just something that professors do.

· We’ve tried everything else. Despite their commendable support for research, both the G.W. Bush and the Obama administrations mainly focused on policies that ignored the existence of proven programs. Progress in student performance was disappointing. Perhaps next time, we’ll try using what works.

Any of these indicators could experience setbacks or reversals, but in all of modern history, it’s hard to think of cases in which, once the evidence/innovation genie is out of the bottle, it is forced back inside. Progress toward the Age of Evidence may be slower or more uneven than we’d like, but this is an idea that once planted tends to persist, and to change institutions.

If we have proven, better ways to teach reading or math or science, to increase graduation rates and college and career readiness, or to build students’ social and emotional skills and improve classroom behavior, then sooner or later policy and practice must take this evidence into account. When it does, it will kick off a virtuous cycle in which a taste for evidence among education leaders leads to substantial investments in R&D by government and the private sector. This will lead to creation and successful evaluation of better and better educational programs, which will progressively add to the taste for evidence, feeding the whole cycle.

The German philosopher Schopenhauer is often credited with saying that every new idea is first ridiculed, then vehemently opposed, and then accepted as self-evident. I think we are nearing a turning point, where resistance to the idea of evidence of effectiveness as a driver in education is beginning to give way to a sense that of course any school should be using proven programs. Who would argue otherwise?

Other fields, such as medicine, agriculture, and technology, including automotive technology, long ago reached a point of no return, when innovation and evidence of effectiveness began to expand rapidly. Because education is mostly a creature of government, it has been slower to change, but change is coming. And when this point of no return arrives, we’ll never look back. As new teaching approaches, new uses of technology, new strategies for engaging students with each other, new ways of simulating scientific, mathematical, and social processes, and new ways of accommodating student differences are created, successfully evaluated, and disseminated, education will become an exciting, constantly evolving field. And no one will even remember a time when this was not the case.

In 1909, the problems of automotive engineering were daunting, but there was only one way things were going to go. True progress has no reverse gear. So it will be in education, as our Age of Evidence dawns.

This blog is sponsored by the Laura and John Arnold Foundation