Evidence, Standards, and Chicken Feathers

In 1509, John Damian, an alchemist in the court of James IV of Scotland proclaimed that he had developed a way for humans to fly. He made himself some wings from chicken feathers and jumped from the battlements of Stirling Castle, the Scottish royal residence at the time. His flight was brief but not fatal.  He landed in a pile of manure, and only broke his thigh.  Afterward, he explained that the problem was that he used the wrong kind of feathers.  If only he had used eagle feathers, he could have flown, he asserted.  Fortunately for him, he never tried flying again, with any kind of feathers.

blog_11-15-18_humanornithopter_500x314

The story of John Damian’s downfall is humorous, and in fact the only record of it is a contemporary poem making fun of it. Yet there are important analogies to educational policy today from this incident in Scottish history. These are as follows:

  1. Damian proclaimed the success of his plan for human flight before he or anyone else had tried it and found it effective.
  2. After his flight ended in the manure pile, he proclaimed (again without evidence) that if only he’d used eagle feathers, he would have succeeded. This makes sense, of course, because eagles are much better flyers than chickens.
  3. He was careful never to actually try flying with eagle feathers.

All of this is more or less what we do all the time in educational policy, with one big exception.  In education, based on Damian’s experience, we might have put forward policies stating that from now on human powered flight must only be done with eagle feathers, not chicken feathers.

What I am referring to in education is our obsession with standards as a basis for selecting textbooks, software, and professional development, and the relative lack of interest in evidence. Whole states and districts spend a lot of time devising standards and then reviewing materials and services to be sure that they align with these standards. In contrast, the idea of checking to see that texts, software, and PD have actually been evaluated and found to be effective in real classrooms with real teachers and students has been a hard slog.

Shouldn’t textbooks and programs that meet modern standards also produce higher student performance on tests closely aligned with those standards? This cannot be assumed. Not long ago, my colleagues and I examined every reading and math program rated “meets expectations” (the highest level) on EdReports, a website that rates programs in terms of their alignment with college- and career-ready standards.  A not so grand total of two programs had any evidence of effectiveness on any measure not made by the publishers. Most programs rated “meets expectations” had no evidence at all, and a smaller number had been evaluated and found to make no difference.

I am not in any way criticizing EdReports.  They perform a very valuable service in helping schools and districts know which programs meet current standards. It makes no sense for every state and district to do this for themselves, especially in the cases where there are very few or no proven programs. It is useful to at least know about programs aligned with standards.

There is a reason that so few products favorably reviewed on EdReports have any positive outcomes in rigorous research. Most are textbooks, and very few textbooks have evidence of effectiveness. Why? The fact is that standards or no standards, EdReports or no EdReports, textbooks do not differ very much from each other in aspects that matter for student learning. Textbooks differ (somewhat) in content, but if there is anything we have learned from our many reviews of research on what works in education, what matters is pedagogy, not content. Yet since decisions about textbooks and software depend on standards and content, decision makers almost invariably select textbooks and software that have never been successfully evaluated.

Even crazy John Damian did better than we do. Yes, he claimed success in flying before actually trying it, but at last he did try it. He concluded that his flying plan would have worked if he’d used eagle feathers, but he never imposed this untested standard on anyone.

Untested textbooks and software probably don’t hurt anyone, but millions of students desperately need higher achievement, and focusing resources on untested or ineffective textbooks, software, and PD does not move them forward. The goal of education is to help all students succeed, not to see that they use aligned materials. If a program has been proven to improve learning, isn’t that a lot more important than proving that it aligns with standards? Ideally, we’d want schools and districts to use programs that are both proven effective and aligned with standards, but if no programs meet both criteria, shouldn’t those that are proven effective be preferred? Without evidence, aren’t we just giving students and teachers eagle feathers and asking them to take a leap of faith?

Photo credit: Humorous portrayal of a man who flies with wings attached to his tunic, Unknown author [Public domain], via Wikimedia Commons/Library of Congress

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

 

Advertisements

Miss Evers’ Boys (And Girls)

Most people who have ever been involved with human subjects’ rights know about the Tuskegee Syphilis Study. This was a study of untreated syphilis, in which 622 poor, African American sharecroppers, some with syphilis and some without, were evaluated over 40 years.

The study, funded and overseen by the U.S. Public Health Service, started in 1932. In 1940, researchers elsewhere discovered that penicillin cured syphilis. By 1947, penicillin was “standard of care” for syphilis, meaning that patients with syphilis received penicillin as a matter of course, anywhere in the U.S.

But not in Tuskegee. Not in 1940. Not in 1947. Not until 1972, when a whistle-blower made the press aware of what was happening. In the meantime, many of the men died of syphilis, 40 of their wives contracted the disease, and 19 of their children were born with congenital syphilis. The men had never even been told the nature of the study, they were not informed in 1940 or 1947 that there was now a cure, and they were not offered that cure. Leaders of the U.S. Public Health Service were well aware that there was a cure for syphilis, but for various reasons, they did not stop the study. Not in 1940, not in 1947, not even when whistle-blowers told them what was going on. They stopped it only when the press found out.

blog_11-1-18_tuskegee_500x363

In 1997 a movie on the Tuskegee Syphilis Study was released. It was called Miss Evers’ Boys. Miss Evers (actually, Eunice Rivers) was the African-American public health nurse who was the main point of contact for the men over the whole 40 years. She deeply believed that she, and the study, were doing good for the men and their community, and she formed close relationships with them. She believed in the USPHS leadership, and thought they would never harm her “boys.”

The Tuskegee study was such a crime and scandal that it utterly changed procedures for medical research in the U.S. and most of the world. Today, participants in research with any level of risk, or their parents if they are children, must give informed consent for participation in research, and even if they are in a control group, they must receive at least “standard of care”: currently accepted, evidence-based practices.

If you’ve read my blogs, you’ll know where I’m going with this. Failure to use proven educational treatments, unlike medical ones, is rarely fatal, at least not in the short term. But otherwise, our profession carries out Tuskegee crimes all the time. It condemns failing students to ineffective programs and practices when effective ones are known. It fails to even inform parents or children, much less teachers and principals, that proven programs exist: Proven, practical, replicable solutions for the problems they face every day.

Like Miss Rivers, front-line educators care deeply about their charges. Most work very hard and give their absolute best to help all of their children to succeed. Teaching is too much hard work and too little money for anyone to do it for any reason but for the love of children.

But somewhere up the line, where the big decisions are made, where the people are who know or who should know which programs and practices are proven to work and which are not, this information just does not matter. There are exceptions, real heroes, but in general, educational leaders who believe that schools should use proven programs have to fight hard for this position. The problem is that the vast majority of educational expenditures—textbooks, software, professional development, and so on—lack even a shred of evidence. Not a scintilla. Some have evidence that they do not work. Yet advocates for those expenditures (such as sales reps and educators who like the programs) argue strenuously for programs with no evidence, and it’s just easier to go along. Whole states frequently adopt or require textbooks, software, and services of no known value in terms of improving student achievement. The ESSA evidence standards were intended to focus educators on evidence and incentivize use of proven programs, at least for the lowest-achieving 5% of schools in each state, but so far it’s been slow going.

Yet there are proven alternatives. Evidence for ESSA (www.evidenceforessa.org) lists more than 100 PK-12 reading and math programs that meet the top three ESSA evidence standards. The majority meet the top level, “Strong.” And most of the programs were researched with struggling students. Yet I am not perceiving a rush to find out about proven programs. I am hearing a lot of new interest in evidence, but my suspicion, growing every day, is that many educational leaders do not really care about the evidence, but are instead just trying to find a way to keep using the programs and providers they already have and already like, and are looking for evidence to justify keeping things as they are.

Every school has some number of struggling students. If these children are provided with the same approaches that have not worked with them or with millions like them, it is highly likely that most will fail, with all the consequences that flow from school failure: Retention. Assignment to special education. Frustration. Low expectations. Dropout. Limited futures. Poverty. Unemployment. There are 50 million children in grades PK to 12 in the U.S. This is the grinding reality for perhaps 10 to 20 million of them. Solutions are readily available, but not known or used by caring and skilled front-line educators.

In what way is this situation unlike Tuskegee in 1940?

 Photo credit: By National Archives Atlanta, GA (U.S. government) ([1], originally from National Archives) [Public domain], via Wikimedia Commons

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Succeeding Faster in Education

“If you want to increase your success rate, double your failure rate.” So said Thomas Watson, the founder of IBM. What he meant, of course, is that people and organizations thrive when they try many experiments, even though most experiments fail. Failing twice as often means trying twice as many experiments, leading to twice as many failures—but also, he was saying, many more successes.

blog_9-20-18_TJWatson_500x488
Thomas Watson

In education research and innovation circles, many people know this quote, and use it to console colleagues who have done an experiment that did not produce significant positive outcomes. A lot of consolation is necessary, because most high-quality experiments in education do not produce significant positive outcomes. In studies funded by the Institute for Education Sciences (IES), Investing in Innovation (i3), and England’s Education Endowment Foundation (EEF), all of which require very high standards of evidence, fewer than 20% of experiments show significant positive outcomes.

The high rate of failure in educational experiments is often shocking to non-researchers, especially the government agencies, foundations, publishers, and software developers who commission the studies. I was at a conference recently in which a Peruvian researcher presented the devastating results of an experiment in which high-poverty, mostly rural schools in Peru were randomly assigned to receive computers for all of their students, or to continue with usual instruction. The Peruvian Ministry of Education was so confident that the computers would be effective that they had built a huge model of the specific computers used in the experiment and attached it to the Ministry headquarters. When the results showed no positive outcomes (except for the ability to operate computers), the Ministry quietly removed the computer statue from the top of their building.

Improving Success Rates

Much as I believe Watson’s admonition (“fail more”), there is another principle that he was implying, or so I expect: We have to learn from failure, so we can increase the rate of success. It is not realistic to expect government to continue to invest substantial funding in high-quality educational experiments if the success rate remains below 20%. We have to get smarter, so we can succeed more often. Fortunately, qualitative measures, such as observations, interviews, and questionnaires, are becoming required elements of funded research, facilitating finding out what happened so that researchers can find out what went wrong. Was the experimental program faithfully implemented? Were there unexpected responses toward the program by teachers or students?

In the course of my work reviewing positive and disappointing outcomes of educational innovations, I’ve noticed some patterns that often predict that a given program is likely or unlikely to be effective in a well-designed evaluation. Some of these are as follows.

  1. Small changes lead to small (or zero) impacts. In every subject and grade level, researchers have evaluated new textbooks, in comparison to existing texts. These almost never show positive effects. The reason is that textbooks are just not that different from each other. Approaches that do show positive effects are usually markedly different from ordinary practices or texts.
  2. Successful programs almost always provide a lot of professional development. The programs that have significant positive effects on learning are ones that markedly improve pedagogy. Changing teachers’ daily instructional practices usually requires initial training followed by on-site coaching by well-trained and capable coaches. Lots of PD does not guarantee success, but minimal PD virtually guarantees failure. Sufficient professional development can be expensive, but education itself is expensive, and adding a modest amount to per-pupil cost for professional development and other requirements of effective implementation is often the best way to substantially enhance outcomes.
  3. Effective programs are usually well-specified, with clear procedures and materials. Rarely do programs work if they are unclear about what teachers are expected to do, and helped to do it. In the Peruvian study of one-to-one computers, for example, students were given tablet computers at a per-pupil cost of $438. Teachers were expected to figure out how best to use them. In fact, a qualitative study found that the computers were considered so valuable that many teachers locked them up except for specific times when they were to be used. They lacked specific instructional software or professional development to create the needed software. No wonder “it” didn’t work. Other than the physical computers, there was no “it.”
  4. Technology is not magic. Technology can create opportunities for improvement, but there is little understanding of how to use technology to greatest effect. My colleagues and I have done reviews of research on effects of modern technology on learning. We found near-zero effects of a variety of elementary and secondary reading software (Inns et al., 2018; Baye et al., in press), with a mean effect size of +0.05 in elementary reading and +0.00 in secondary. In math, effects were slightly more positive (ES=+0.09), but still quite small, on average (Pellegrini et al., 2018). Some technology approaches had more promise than others, but it is time that we learned from disappointing as well as promising applications. The widespread belief that technology is the future must eventually be right, but at present we have little reason to believe that technology is transformative, and we don’t know which form of technology is most likely to be transformative.
  5. Tutoring is the most solid approach we have. Reviews of elementary reading for struggling readers (Inns et al., 2018) and secondary struggling readers (Baye et al., in press), as well as elementary math (Pellegrini et al., 2018), find outcomes for various forms of tutoring that are far beyond effects seen for any other type of treatment. Everyone knows this, but thinking about tutoring falls into two camps. One, typified by advocates of Reading Recovery, takes the view that tutoring is so effective for struggling first graders that it should be used no matter what the cost. The other, also perhaps thinking about Reading Recovery, rejects this approach because of its cost. Yet recent research on tutoring methods is finding strategies that are cost-effective and feasible. First, studies in both reading (Inns et al., 2018) and math (Pellegrini et al., 2018) find no difference in outcomes between certified teachers and paraprofessionals using structured one-to-one or one-to-small group tutoring models. Second, although one-to-one tutoring is more effective than one-to-small group, one-to-small group is far more cost-effective, as one trained tutor can work with 4 to 6 students at a time. Also, recent studies have found that tutoring can be just as effective in the upper elementary and middle grades as in first grade, so this strategy may have broader applicability than it has in the past. The real challenge for research on tutoring is to develop and evaluate models that increase cost-effectiveness of this clearly effective family of approaches.

The extraordinary advances in the quality and quantity of research in education, led by investments from IES, i3, and the EEF, have raised expectations for research-based reform. However, the modest percentage of recent studies meeting current rigorous standards of evidence has caused disappointment in some quarters. Instead, all findings, whether immediately successful or not, should be seen as crucial information. Some studies identify programs ready for prime time right now, but the whole body of work can and must inform us about areas worthy of expanded investment, as well as areas in need of serious rethinking and redevelopment. The evidence movement, in the form it exists today, is completing its first decade. It’s still early days. There is much more we can learn and do to develop, evaluate, and disseminate effective strategies, especially for students in great need of proven approaches.

References

Baye, A., Lake, C., Inns, A., & Slavin, R. (in press). Effective reading programs for secondary students. Reading Research Quarterly.

Inns, A., Lake, C., Pellegrini, M., & Slavin, R. (2018). Effective programs for struggling readers: A best-evidence synthesis. Paper presented at the annual meeting of the Society for Research on Educational Effectiveness, Washington, DC.

Pellegrini, M., Inns, A., & Slavin, R. (2018). Effective programs in elementary mathematics: A best-evidence synthesis. Paper presented at the annual meeting of the Society for Research on Educational Effectiveness, Washington, DC.

 Photo credit: IBM [CC BY-SA 3.0  (https://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

 

The Good, the Bad, and the (Un)Promising

The ESSA evidence standards are finally beginning to matter. States are starting the process that will lead them to make school improvement awards to their lowest-achieving schools. The ESSA law is clear that for schools to qualify for these awards, they must agree to implement programs that meet the strong, moderate, or promising levels of the ESSA evidence standards. This is very exciting for those who believe in the power of proven programs to transform schools and benefit children. It is good news for kids, for teachers, and for our profession.

But inevitably, there is bad news with the good. If evidence is to be a standard for government funding, there are bound to be people who disseminate programs lacking high-quality evidence who will seek to bend the definitions to declare themselves “proven.” And there are also bound to be schools and districts that want to keep using what they have always used, or to keep choosing programs based on factors other than evidence, while doing the minimum the law requires.

The battleground is the ESSA “promising” criterion. “Strong” programs are pretty well defined as having significant positive evidence from high-quality randomized studies. “Moderate” programs are pretty well defined as having significant positive evidence from high-quality matched studies. Both “strong” and “moderate” are clearly defined in Evidence for ESSA (www.evidenceforessa.org), and, with a bit of translation, by the What Works Clearinghouse, both of which list specific programs that meet or do not meet these standards.

“Promising,” on the other hand is kind  of . . . squishy. The ESSA evidence standards do define programs meeting “promising” as ones that have statistically significant effects in “well-designed and well-implemented” correlational studies, with controls for inputs (e.g., pretests).  This sounds good, but it is hard to nail down in practice. I’m seeing and hearing about a category of studies that perfectly illustrate the problem. Imagine that a developer commissions a study of a form of software. A set of schools and their 1000 students are assigned to use the software, while control schools and their 1000 students do not have access to the software but continue with business as usual.

Computers routinely produce “trace data” that automatically tells researchers all sorts of things about how much students used the software, what they did with it, how successful they were, and so on.

The problem is that typically, large numbers of students given software do not use it. They may never even hit a key, or they may use the software so little that the researchers rule the software use to be effectively zero. So in a not unusual situation, let’s assume that in the treatment group, the one that got the software, only 500 of the 1000 students actually used the software at an adequate level.

Now here’s the rub. Almost always, the 500 students will out-perform the 1000 controls, even after controlling for pretests. Yet this would be likely to happen even if the software were completely ineffective.

To understand this, think about the 500 students who did use the software and the 500 who did not. The users are probably more conscientious, hard-working, and well-organized. The 500 non-users are more likely to be absent a lot, to fool around in class, to use their technology to play computer games, or go on (non-school-related) social media, rather than to do math or science for example. Even if the pretest scores in the user and non-user groups were identical, they are not identical students, because their behavior with the software is not equal.

I once visited a secondary school in England that was a specially-funded model for universal use of technology. Along with colleagues, I went into several classes. The teachers were teaching their hearts out, making constant use of the technology that all students had on their desks. The students were well-behaved, but just a few dominated the discussion. Maybe the others were just a bit shy, we thought. From the front of each class, this looked like the classroom of the future.

But then, we filed to the back of each class, where we could see over students’ shoulders. And we immediately saw what was going on. Maybe 60 or 70 percent of the students were actually on social media unrelated to the content, paying no attention to the teacher or instructional software!

blog_5-24-18_DistStudents_500x332

Now imagine that a study compared the 30-40% of students who were actually using the computers to students with similar pretests in other schools who had no computers at all. Again, the users would look terrific, but this is not a fair comparison, because all the goof-offs and laggards in the computer school had selected themselves out of the study while goof-offs and laggards in the control group were still included.

Rigorous researchers use a method called intent-to-treat, which in this case would include every student, whether or not they used the software or played non-educational computer games. “Not fair!” responds the software developer, because intent-to-treat includes a lot of students who never touched a key except to use social media. No sophisticated researcher accepts such an argument, however, because including only users gives the experimental group a big advantage.

Here’s what is happening at the policy level. Software developers are using data from studies that only include the students who made adequate use of the software. They are then claiming that such studies are correlational and meet the “promising” standard of ESSA.

Those who make this argument are correct in saying that such studies are correlational. But these studies are very, very, very bad, because they are biased toward the treatment. The ESSA standards specify well-designed and well-implemented studies, and these studies may be correlational, but they are not well-designed or well-implemented. Software developers and other vendors are very concerned about the ESSA evidence standards, and some may use the “promising” category as a loophole. Evidence for ESSA does not accept such studies, even as promising, and the What Works Clearinghouse does not even have any category that corresponds to “promising.” Yet vendors are flooding state departments of education and districts with studies they claim to meet the ESSA standards, though in the lowest category.

Recently, I heard something that could be a solution to this problem. Apparently, some states are announcing that for school improvement grants, and any other purpose that has financial consequences, they will only accept programs with “strong” and “moderate” evidence. They have the right to do this; the federal law says school improvement grants must support programs that at least meet the “promising” standard, but it does not say states cannot set a higher minimum standard.

One might argue that ignoring “promising” studies is going too far. In Evidence for ESSA (www.evidenceforessa.org), we accept studies as “promising” if they have weaknesses that do not lead to bias, such as clustered studies that were significant at the student but not the cluster level. But the danger posed by studies claiming to fit “promising” using biased designs is too great. Until the feds fix the definition of “promising” to exclude bias, the states may have to solve it for themselves.

I hope there will be further development of the “promising” standard to focus it on lower-quality but unbiased evidence, but as things are now, perhaps it is best for states themselves to declare that “promising” is no longer promising.

Eventually, evidence will prevail in education, as it has in many other fields, but on the way to that glorious future, we are going to have to make some adjustments. Requiring that “promising” be truly promising would be a good place to begin.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

 

Nevada Places Its Bets on Evidence

blog_3-29-18_HooverDam_500x375In Nevada, known as the land of big bets, taking risks is what they do. The Nevada State Department of Education (NDE) is showing this in its approach to ESSA evidence standards .  Of course, many states are planning policies to encourage use of programs that meet the ESSA evidence standards, but to my knowledge, no state department of education has taken as proactive a stance in this direction as Nevada.

 

Under the leadership of their state superintendent, Steve Canavero, Deputy Superintendent Brett Barley, and Director of the Office of Student and School Supports Seng-Dao Keo, Nevada has taken a strong stand: Evidence is essential for our schools, they maintain, because our kids deserve the best programs we can give them.

All states are asked by ESSA to require strong, moderate, or promising programs (defined in the law) for low-achieving schools seeking school improvement funding. Nevada has made it clear to its local districts that it will enforce the federal definitions rigorously, and only approve school improvement funding for schools proposing to implement proven programs appropriate to their needs. The federal ESSA law also provides bonus points on various other applications for federal funding, and Nevada will support these provisions as well.

However, Nevada will go beyond these policies, reasoning that if evidence from rigorous evaluations is good for federal funding, why shouldn’t it be good for state funding too? For example, Nevada will require ESSA-type evidence for its own funding program for very high-poverty schools, and for schools serving many English learners. The state has a reading-by-third-grade initiative that will also require use of programs proven to be effective under the ESSA regulations. For all of the discretionary programs offered by the state, NDE will create lists of ESSA-proven supplementary programs in each area in which evidence exists.

Nevada has even taken on the holy grail: Textbook adoption. It is not politically possible for the state to require that textbooks have rigorous evidence of effectiveness to be considered state approved. As in the past, texts will be state adopted if they align with state standards. However, on the state list of aligned programs, two key pieces of information will be added: the ESSA evidence level and the average effect size. Districts will not be required to take this information into account, but by listing it on the state adoption lists the state leaders hope to alert district leaders to pay attention to the evidence in making their selections of textbooks.

The Nevada focus on evidence takes courage. NDE has been deluged with concern from districts, from vendors, and from providers of professional development services. To each, NDE has made the same response: we need to move our state toward use of programs known to work. This is worth undergoing the difficult changes to new partnerships and new materials, if it provides Nevada’s children better programs, which will translate into better achievement and a chance at a better life. Seng-Dao Keo describes the evidence movement in Nevada as a moral imperative, delivering proven programs to Nevada’s children and then working to see that they are well implemented and actually produce the outcomes Nevada expects.

Perhaps other states are making similar plans. I certainly hope so, but it is heartening to see one state, at least, willing to use the ESSA standards as they were intended to be used, as a rationale for state and local educators not just to meet federal mandates, but to move toward use of proven programs. If other states also do this, it could drive publishers, software producers, and providers of professional development to invest in innovation and rigorous evaluation of promising approaches, as it increases use of approaches known to be effective now.

NDE is not just rolling the dice and hoping for the best. It is actively educating its district and school leaders on the benefits of evidence-based reform, and helping them make wise choices. With a proper focus on assessments of needs, facilitating access to information, and assistance with ensuring high quality implementation, really promoting use of proven programs should be more like Nevada’s Hoover Dam: A sure thing.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Photo by: Michael Karavanov [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons

Evidence for ESSA Celebrates its First Anniversary

Penguin 02 22 18On February 28, 2017 we launched Evidence for ESSA (www.evidenceforessa.org), our website providing the evidence to support educational programs according to the standards laid out in the Every Child Succeeds Act in December, 2015.

Evidence for ESSA began earlier, of course. It really began one day in September, 2016, when I heard leaders of the Institute for Education Sciences (IES) and the What Works Clearinghouse (WWC) announce that the WWC would not be changed to align with the ESSA evidence standards. I realized that no one else was going to create scientifically valid, rapid, and easy-to-use websites providing educators with actionable information on programs meeting ESSA standards. We could do it because our group at Johns Hopkins University, and partners all over the world, had been working for many years creating and updating another website, the Best Evidence Encyclopedia (BEE; www.bestevidence.org).BEE reviews were not primarily designed for practitioners and they did not align with ESSA standards, but at least we were not starting from scratch.

We assembled a group of large membership organizations to advise us and to help us reach thoughtful superintendents, principals, Title I directors, and others who would be users of the final product. They gave us invaluable advice along the way. We also assembled a technical working group (TWG) of distinguished researchers to advise us on key decisions in establishing our website.

It is interesting to note that we have not been able to obtain adequate funding to support Evidence for ESSA. Instead, it is mostly being written by volunteers and graduate students, all of whom are motivated only by a passion for evidence to improve the education of students.

A year after launch, Evidence for ESSA has been used by more than 36,000 unique users, and I hear that it is very useful in helping states and districts meet the ESSA evidence standards.

We get a lot of positive feedback, as well as complaints and concerns, to which we try to respond rapidly. Feedback has been important in changing some of our policies and correcting some errors and we are glad to get it.

At this moment we are thoroughly up-to-date on reading and math programs for grades pre-kindergarten to 12, and we are working on science, writing, social-emotional outcomes, and summer school. We are also continuing to update our more academic BEE reviews, which draw from our work on Evidence for ESSA.

In my view, the evidence revolution in education is truly a revolution. If the ESSA evidence standards ultimately prevail, education will at long last join fields such as medicine and agriculture in a dynamic of practice to development to evaluation to dissemination to better practice, in an ascending spiral that leads to constantly improving practices and outcomes.

In a previous revolution, Thomas Jefferson said, “If I had to choose between government without newspapers and newspapers without government, I’d take the newspapers.” In our evidence revolution in education, Evidence for ESSA, the WWC, and other evidence sources are our “newspapers,” providing the information that people of good will can use to make wise and informed decisions.

Evidence for ESSA is the work of many dedicated and joyful hands trying to provide our profession with the information it needs to improve student outcomes. The joy in it is the joy in seeing teachers, principals, and superintendents see new, attainable ways to serve their children.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Evidence-Based Does Not Equal Evidence-Proven

Chemist

As I speak to educational leaders about using evidence to help them improve outcomes for students, there are two words I hear all the time that give me the fantods (as Mark Twain would say):

Evidence-based

            I like the first word, “evidence,” just fine, but the second word, “based,” sort of negates the first one. The ESSA evidence standards require programs that are evidence-proven, not just evidence-based, for various purposes.

“Evidence-proven” means that a given program, practice, or policy has been put to the test. Ideally, students, teachers, or schools have been assigned at random to use the experimental program or to remain in a control group. The program is provided to the experimental group for a significant period of time, at least a semester, and then final performance on tests that are fair to both groups are compared, using appropriate statistics.

If your doctor gives you medicine, it is evidence proven. It isn’t just the same color or flavor as something proven, it isn’t just generally in line with what research suggests might be a good idea. Instead, it has been found to be effective, compared to current standards of care, in rigorous studies.

“Evidence-based,” on the other hand, is one of those wiggle words that educators love to use to indicate that they are up-to-date and know what’s expected, but don’t actually intend to do anything different from what they are doing now.

Evidence-based is today’s equivalent of “based on scientifically-based research” in No Child Left Behind. It sure sounded good, but what educational program or practice can’t be said to be “based on” some scientific principle?

In a recent Brookings article Mark Dynarski wrote about state ESSA plans, and conversations he’s heard among educators. He says that the plans are loaded with the words “evidence-based,” but with little indication of what specific proven programs they plan to implement, or how they plan to identify, disseminate, implement, and evaluate them.

I hope the ESSA evidence standards give leaders in even a few states the knowledge and the courage to insist on evidence-proven programs, especially in very low-achieving “school improvement” schools that desperately need the very best approaches. I remain optimistic that ESSA can be used to expand evidence-proven practices. But will it in fact have this impact? That remains to be proven.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.