Nevada Places Its Bets on Evidence

blog_3-29-18_HooverDam_500x375In Nevada, known as the land of big bets, taking risks is what they do. The Nevada State Department of Education (NDE) is showing this in its approach to ESSA evidence standards .  Of course, many states are planning policies to encourage use of programs that meet the ESSA evidence standards, but to my knowledge, no state department of education has taken as proactive a stance in this direction as Nevada.

 

Under the leadership of their state superintendent, Steve Canavero, Deputy Superintendent Brett Barley, and Director of the Office of Student and School Supports Seng-Dao Keo, Nevada has taken a strong stand: Evidence is essential for our schools, they maintain, because our kids deserve the best programs we can give them.

All states are asked by ESSA to require strong, moderate, or promising programs (defined in the law) for low-achieving schools seeking school improvement funding. Nevada has made it clear to its local districts that it will enforce the federal definitions rigorously, and only approve school improvement funding for schools proposing to implement proven programs appropriate to their needs. The federal ESSA law also provides bonus points on various other applications for federal funding, and Nevada will support these provisions as well.

However, Nevada will go beyond these policies, reasoning that if evidence from rigorous evaluations is good for federal funding, why shouldn’t it be good for state funding too? For example, Nevada will require ESSA-type evidence for its own funding program for very high-poverty schools, and for schools serving many English learners. The state has a reading-by-third-grade initiative that will also require use of programs proven to be effective under the ESSA regulations. For all of the discretionary programs offered by the state, NDE will create lists of ESSA-proven supplementary programs in each area in which evidence exists.

Nevada has even taken on the holy grail: Textbook adoption. It is not politically possible for the state to require that textbooks have rigorous evidence of effectiveness to be considered state approved. As in the past, texts will be state adopted if they align with state standards. However, on the state list of aligned programs, two key pieces of information will be added: the ESSA evidence level and the average effect size. Districts will not be required to take this information into account, but by listing it on the state adoption lists the state leaders hope to alert district leaders to pay attention to the evidence in making their selections of textbooks.

The Nevada focus on evidence takes courage. NDE has been deluged with concern from districts, from vendors, and from providers of professional development services. To each, NDE has made the same response: we need to move our state toward use of programs known to work. This is worth undergoing the difficult changes to new partnerships and new materials, if it provides Nevada’s children better programs, which will translate into better achievement and a chance at a better life. Seng-Dao Keo describes the evidence movement in Nevada as a moral imperative, delivering proven programs to Nevada’s children and then working to see that they are well implemented and actually produce the outcomes Nevada expects.

Perhaps other states are making similar plans. I certainly hope so, but it is heartening to see one state, at least, willing to use the ESSA standards as they were intended to be used, as a rationale for state and local educators not just to meet federal mandates, but to move toward use of proven programs. If other states also do this, it could drive publishers, software producers, and providers of professional development to invest in innovation and rigorous evaluation of promising approaches, as it increases use of approaches known to be effective now.

NDE is not just rolling the dice and hoping for the best. It is actively educating its district and school leaders on the benefits of evidence-based reform, and helping them make wise choices. With a proper focus on assessments of needs, facilitating access to information, and assistance with ensuring high quality implementation, really promoting use of proven programs should be more like Nevada’s Hoover Dam: A sure thing.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Photo by: Michael Karavanov [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons
Advertisements

Evidence for ESSA Celebrates its First Anniversary

Penguin 02 22 18On February 28, 2017 we launched Evidence for ESSA (www.evidenceforessa.org), our website providing the evidence to support educational programs according to the standards laid out in the Every Child Succeeds Act in December, 2015.

Evidence for ESSA began earlier, of course. It really began one day in September, 2016, when I heard leaders of the Institute for Education Sciences (IES) and the What Works Clearinghouse (WWC) announce that the WWC would not be changed to align with the ESSA evidence standards. I realized that no one else was going to create scientifically valid, rapid, and easy-to-use websites providing educators with actionable information on programs meeting ESSA standards. We could do it because our group at Johns Hopkins University, and partners all over the world, had been working for many years creating and updating another website, the Best Evidence Encyclopedia (BEE; www.bestevidence.org).BEE reviews were not primarily designed for practitioners and they did not align with ESSA standards, but at least we were not starting from scratch.

We assembled a group of large membership organizations to advise us and to help us reach thoughtful superintendents, principals, Title I directors, and others who would be users of the final product. They gave us invaluable advice along the way. We also assembled a technical working group (TWG) of distinguished researchers to advise us on key decisions in establishing our website.

It is interesting to note that we have not been able to obtain adequate funding to support Evidence for ESSA. Instead, it is mostly being written by volunteers and graduate students, all of whom are motivated only by a passion for evidence to improve the education of students.

A year after launch, Evidence for ESSA has been used by more than 36,000 unique users, and I hear that it is very useful in helping states and districts meet the ESSA evidence standards.

We get a lot of positive feedback, as well as complaints and concerns, to which we try to respond rapidly. Feedback has been important in changing some of our policies and correcting some errors and we are glad to get it.

At this moment we are thoroughly up-to-date on reading and math programs for grades pre-kindergarten to 12, and we are working on science, writing, social-emotional outcomes, and summer school. We are also continuing to update our more academic BEE reviews, which draw from our work on Evidence for ESSA.

In my view, the evidence revolution in education is truly a revolution. If the ESSA evidence standards ultimately prevail, education will at long last join fields such as medicine and agriculture in a dynamic of practice to development to evaluation to dissemination to better practice, in an ascending spiral that leads to constantly improving practices and outcomes.

In a previous revolution, Thomas Jefferson said, “If I had to choose between government without newspapers and newspapers without government, I’d take the newspapers.” In our evidence revolution in education, Evidence for ESSA, the WWC, and other evidence sources are our “newspapers,” providing the information that people of good will can use to make wise and informed decisions.

Evidence for ESSA is the work of many dedicated and joyful hands trying to provide our profession with the information it needs to improve student outcomes. The joy in it is the joy in seeing teachers, principals, and superintendents see new, attainable ways to serve their children.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Evidence-Based Does Not Equal Evidence-Proven

Chemist

As I speak to educational leaders about using evidence to help them improve outcomes for students, there are two words I hear all the time that give me the fantods (as Mark Twain would say):

Evidence-based

            I like the first word, “evidence,” just fine, but the second word, “based,” sort of negates the first one. The ESSA evidence standards require programs that are evidence-proven, not just evidence-based, for various purposes.

“Evidence-proven” means that a given program, practice, or policy has been put to the test. Ideally, students, teachers, or schools have been assigned at random to use the experimental program or to remain in a control group. The program is provided to the experimental group for a significant period of time, at least a semester, and then final performance on tests that are fair to both groups are compared, using appropriate statistics.

If your doctor gives you medicine, it is evidence proven. It isn’t just the same color or flavor as something proven, it isn’t just generally in line with what research suggests might be a good idea. Instead, it has been found to be effective, compared to current standards of care, in rigorous studies.

“Evidence-based,” on the other hand, is one of those wiggle words that educators love to use to indicate that they are up-to-date and know what’s expected, but don’t actually intend to do anything different from what they are doing now.

Evidence-based is today’s equivalent of “based on scientifically-based research” in No Child Left Behind. It sure sounded good, but what educational program or practice can’t be said to be “based on” some scientific principle?

In a recent Brookings article Mark Dynarski wrote about state ESSA plans, and conversations he’s heard among educators. He says that the plans are loaded with the words “evidence-based,” but with little indication of what specific proven programs they plan to implement, or how they plan to identify, disseminate, implement, and evaluate them.

I hope the ESSA evidence standards give leaders in even a few states the knowledge and the courage to insist on evidence-proven programs, especially in very low-achieving “school improvement” schools that desperately need the very best approaches. I remain optimistic that ESSA can be used to expand evidence-proven practices. But will it in fact have this impact? That remains to be proven.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

“We Don’t Do Lists”

blog218_Santa_500x332 (2)

Watching the slow, uneven, uncertain rollout of the ESSA evidence standards gives me a mixture of hope and despair. The hope stems from the fact that from coast to coast, educational leaders are actually talking about proven programs and practices at all. That was certainly rare before ESSA. But despair in that I hear many educational leaders trying to find the absolute least their states and districts can do to just barely comply with the law. The ESSA evidence standards apply in particular to schools seeking school improvement funding, which are those in the lowest 5% of their states in academic performance. A previous program with a similar name but more capital letters, School Improvement, was used under NCLB, before ESSA. A large-scale evaluation by MDRC found that the earlier School Improvement made no difference in student achievement, despite billions of dollars in investments. So you’d imagine that this time around, educators responsible for school improvement would be eager to use the new law to introduce proven programs into their lowest-achieving schools. In fact, there are individual leaders, districts, and states who have exactly this intention, and may ultimately provide good examples to the rest. But they face substantial obstacles.

One of the obstacles I hear about often is an opposition among state departments of education to disseminating lists of proven programs. I very much understand and sympathize with their reluctance, as schools have been over-regulated for a long time. However, I do not see how the ESSA evidence standards can make much of a difference if everyone makes their own list of programs. Determining which studies meet ESSA evidence standards is difficult, and requires a great deal of knowledge about research (I know this, of course, because we do such reviews ourselves; see www.evidenceforessa.org).

Some say that they want programs that have been evaluated in their own states. But after taking into account demographics (e.g., urban/rural, ELL/not ELL, etc), are state-to-state differences so great as to require different research in each? We used to work with a school located on the Ohio-Indiana border, which ran right through the building. Were there really programs that were effective on one side of the building but not on the other?

Further, state department leaders frequently complain that they have too few staff to adequately manage school improvement across their states. Should that capacity be concentrated on reviewing research to determine which programs meet ESSA evidence standards and which do not?

The irony of opposing lists for ESSA evidence standards is that most states are chock full of lists that restrict the textbooks, software, and professional development schools can select using state funds. These lists may focus on paperweight, binding, and other minimum quality issues, but they almost never have anything to do with evidence of effectiveness. One state asked us to review their textbook adoption lists for reading and math, grades K-12. Collectively, there were hundreds of books, but just a handful had even a shred of evidence of effectiveness.

Educational leaders are constantly buffeted by opposing interest groups, from politicians to school board members to leaders of unions, from PTAs presidents to university presidents, to for-profit companies promoting their own materials and programs. Educational leaders need a consistent way to ensure that the decisions they make are in the best interests of children, not the often self-serving interests of adults. The ESSA evidence standards, if used wisely, give education leaders an opportunity to say to the whole cacophony of cries for special consideration, “I’d love to help you all, but we can only approve programs for our lowest-achieving schools that are known from rigorous research to benefit our children. We say this because it is the law, but also because we believe our children, and especially our lowest achievers, deserve the most effective programs, no matter what the law says.”

To back up such a radical statement, educational leaders need clarity about what their standards are and which specific programs meet those standards. Otherwise, they either have an “anything goes’ strategy that in effect means that evidence does not matter, or they have competing vendors claiming an evidence base for their favored program. Lists of proven programs can disappoint those whose programs aren’t on the list, but they are at least clear and unambiguous, and communicate to those who want to add to the list exactly what kind of evidence they will need.

States or large districts can create lists of proven programs by starting with existing national lists (such as the What Works Clearinghouse or Evidence for ESSA) and then modifying them, perhaps by adding additional programs that meet the same standards and/or eliminating programs not available in a given location. Over time, existing or new programs can be added as new evidence appears. We, at Evidence for ESSA, are willing to review programs being considered by state or local educators for addition to their own lists, and we will do it for free and in about two weeks. Then we’ll add them to our national list if they qualify.

It is important to say that while lists are necessary, they are not sufficient. Thoughtful needs assessments, information on proven programs (such as effective methods fairs and visits to local users of proven programs), and planning for high-quality implementation of proven programs are also necessary. However, students in struggling schools cannot wait for every school, district, and state to reinvent the wheel. They need the best we can give them right now, while the field is working on even better solutions for the future.

Whether a state or district uses a national list, or starts with such a list and modifies it for its own purposes, a list of proven programs provides an excellent starting point for struggling schools. It plants a flag for all to see, one that says “Because this (state/district/school) is committed to the success of every child, we select and carefully implement programs known to work. Please join us in this enterprise.”

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

“Substantively Important” Isn’t Substantive. It Also Isn’t Important

Since it began in 2002, the What Works Clearinghouse has played an important role in finding, rating, and publicizing findings of evaluations of educational programs. It performs a crucial function for evidence-based reform. For this very reason, it needs to be right. But in several important ways, it uses procedures that are indefensible and have a big impact on its conclusions.

One of these relates to a study rating called “substantively important-positive.” This refers to study outcomes with an effect size of at least +0.25, but that are not statistically significant. I’ve written about this before, but the WWC has recently released a database of information on its studies that makes it easy to analyze WWC data on a large scale, and we have learned a lot more about this topic.

Study outcomes rated as “substantively important – positive” can qualify a study as “potentially positive,” the second-highest WWC rating. “Substantively important-negative” findings (non-significant effect sizes less than -0.25) can cause a study to be rated as potentially negative, which can keep a study from getting a positive rating forever, as a single “potentially negative” rating, under current rules, ensures that a program can never receive a rating better than “mixed,” even if other studies found hundreds of significant positive effects.

People who follow the WWC and know about “substantively important” may assume that it may be a strange rule, but relatively rare in practice. But that is not true.

My graduate student, Amanda Inns, has just done an analysis of WWC data from their own database, and if you are a big fan of the WWC, this is going to be a shock. Amanda has looked at all WWC-accepted reading and math studies. Among these, she found a total of 339 individual outcomes rated “positive” or “potentially positive.” Of these, 155 (46%) reached the “potentially positive” level only because they had effect sizes over +0.25, but were not statistically significant.

Another 36 outcomes were rated “negative” or “potentially negative.” 26 of these (72%) were categorized as “potentially negative” only because they had effect sizes less than -0.25 and were not significant. I’m sure patterns would be similar for subjects other than reading and math.

Put another way, almost half (48%) of outcomes rated positive/potentially positive or negative/potentially negative by the WWC were not statistically significant. As one example of what I’m talking about, consider a program called The Expert Mathematician. It had just one study with only 70 students in 4 classrooms (2 experimental and 2 control). The WWC re-analyzed the data to account for clustering, and the outcomes were nowhere near statistically significant, though they were greater than +0.25. This tiny study, and this study alone, caused The Expert Mathematician to receive the WWC “potentially positive” rating and to be ranked seventh among all middle school math programs. Similarly, Waterford Early Learning received a “potentially positive” rating based on a single tiny study with only 70 kindergarteners in 6 schools. The outcomes ranged from -0.71 to +1.11, and though the mean was more than +0.25, the outcome was far from significant. Yet this study alone put Waterford on the WWC list of proven kindergarten programs.

I’m not taking any position on whether these particular programs are in fact effective. All I am saying is that these very small studies with non-significant outcomes say absolutely nothing of value about that question.

I’m sure that some of you nerdier readers who have followed me this far are saying to yourselves, “well, sure, these substantively important studies may not be statistically significant, but they are probably unbiased estimates of the true effect.”

More bad news. They are not. Not even close.

The problem, also revealed in Amanda Inns’ data, is that studies with large effect sizes but not statistical significance tend to have very small sample sizes (otherwise, they would have been significant). Across WWC reading and math studies that used individual-level assignment, median sample sizes were 48, 74, or 86, for substantively important, significant, or indeterminate (non-significant with ES < +0.25), respectively. For cluster studies, they were 10, 17, and 33 clusters respectively. In other words, “substantively important” outcomes averaged less than half the sample sizes of other outcomes.

And small-sample studies greatly overstate effect sizes. Among all factors that bias effect sizes, small sample size is the most important (only use of researcher/developer-made measures comes close). So a non-significant positive finding in a small study is not an unbiased point estimate that just needs a larger sample to show its significance. It is probably biased, in a consistent, positive direction. Studies with sample sizes less than 100 have about three times the mean effect sizes of studies with sample sizes over 1000, for example.

But “substantively important” ratings can throw a monkey wrench into current policy. The ESSA evidence standards require statistically significant effects for all of its top three levels (strong, moderate, and promising). Yet many educational leaders are using the What Works Clearinghouse as a guide to which programs will meet ESSA evidence standards. They may logically assume that if the WWC says a program is effective, then the federal government stands behind it, regardless of what the ESSA evidence standards actually say. Yet in fact, based on the data analyzed by Amanda Inns for reading and math, 46% of the outcomes rated as positive/potentially positive by WWC (taken to correspond to “strong” or “moderate,” respectively, under ESSA evidence standards) are non-significant, and therefore do not qualify under ESSA.

The WWC needs to remove “substantively important” from its ratings as soon as possible, to avoid a collision with ESSA evidence standards, and to avoid misleading educators any further. Doing so would help make the WWC’s impact on ESSA substantive. And important.

 

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

The Mystery of the Chinese Dragon: Why Isn’t the WWC Up to Date?

As evidence becomes more important in educational practice and policy, it is increasingly critical that it be up-to-date. This sounds obvious. Of course we’d prefer evidence from recent studies, which are more likely to have been done under social and political conditions like those that exist today, using standards like those prevailing today.

However, there are reasons that up-to-date evidence is especially important in today’s policy environment. Up-to-date evidence is critical because it is far more likely than earlier research to meet very high methodological standards. Because of substantial investments by the U.S. Department of Education and others, there has been an outpouring of top-quality, randomized, usually third-party evaluations of programs for all subjects and grade levels, published from 2012 to the present.

The reason this matters in practice is that to satisfy ESSA evidence standards, many educators are using the What Works Clearinghouse (WWC) to identify proven programs. And the What Works Clearinghouse is very slow in reviewing studies, and therefore does not contain many of the latest, and therefore highest-quality studies.

The graph below illustrates the problem. It compares all secondary literacy (grades 6-12) studies reported by the WWC as of fall, 2017, on the orange line. The blue line represents a review of research on the same topic by Baye et al. (2017; see www.bestevidence.org). I think the graph resembles a Chinese dragon, with its jaws wide open and a long tail. The sort of dragon you see in Chinese New Year’s parades.

 

 

What the graph shows is that while the number of studies published up to 2009 were about equal for Baye et al. and WWC, they diverged sharply in 2010 (thus the huge open jaws). Baye et al. reported on 58 studies published in 2010 to 2017. WWC reported on only 6, and none at all from 2016 or 2017.

The same patterns are apparent throughout the WWC. Across every topic and grade level, the WWC has only 7 accepted studies from 2014, 7 from 2015, zero from 2016, and zero from 2017.

It is likely that every one of the Baye et al. studies would meet WWC standards. Yet the WWC has just not gotten to them.

It’s important to note that the What Works Clearinghouse is plenty busy. Recent studies are often included in Quick Reviews, Single Study Reviews, Grant Competition Reports, and Practice Guides. However, an educator going to the WWC for guidance on what works will go to Find What Works and click on one of the 12 topic areas, which will list programs. They then may filter their search and go to intervention reports.

These intervention reports are not integrated with Quick Reviews, Single Study Reviews, Grant Competition Reports, or Practice Guides, so the user has no easy way to find out about more recent evaluations, if they in fact appear anywhere in any of these reports. Even if users did somehow find additional information on a program in one of these supplemental reports, the information may be incomplete. In many cases, the supplemental report only notes whether a study meets WWC standards, but does not provide any information about what the outcome was.

The slow pace of the WWC reviews is problematic for many reasons. In addition to missing out on the strongest and most recent studies, the WWC does not register changes in the evidence base for programs already in its database. New programs may not appear at all, leaving readers to wonder why.

Any website developer knows that if users go to a website and are unable to find what they expect to find, they are unlikely to come back. The WWC is a website, and it cannot expect many users to check back every few months to see if programs that interest them, which they know to exist, have been added lately.

In the context of the ESSA evidence standards, the slow pace of the WWC is particularly disturbing. Although the WWC has chosen not to align itself with ESSA standards, many educators use the WWC as a guide to which programs are likely to meet ESSA standards. Failing to keep the WWC up to date may convince many users seeking ESSA information that there are few programs meeting either WWC or ESSA standards.

Educators need accurate, up-to-date information to make informed choices for their students. I hope the WWC will move quickly to provide its readers with essential, useful data on today’s evidence supporting today’s programs. It’s going to have to catch up with the Chinese dragon, or be left to watch the parade going by.

Evidence and Freedom

In 1776, a small group of American patriots had a vision of a government of, by, and for the people, and they risked their lives to make it so. Their commitment to liberty was not just ideological, it was also pragmatic. They knew that people who were empowered to make their own decisions were more likely to be committed to the implementation of those decisions. The same should apply to education today.

One of the most important aspects of the Every Student Succeeds Act (ESSA) is how it balances evidence with freedom. The Act defines proven programs and mentions evidence 60 times. It encourages use of proven programs throughout. It provides for additional preference points for proposals in seven areas that meet evidence requirements. Yet only in the area of school improvement for the lowest 5% of schools does it require use of proven programs. This is probably a good thing.

Americans, even more than other people, don’t like to be told what to do. If the evidence movement turns into a set of mandates, telling educators which programs they can or cannot implement, it will probably be doomed. Even when evidence for or against given programs is solid and widely replicated, many political forces opposing evidence-based reform would surely come into play if educators felt compelled to use certain programs and avoid others.

Years ago, I had an experience that reinforced my view that teachers respond better to proven practices if they are free to choose them. I was doing a cooperative learning workshop in a large urban district. A surly-looking teacher raised her hand. “Do we have to do this?” she asked. “Of course not” I answered. “These are ideas for you to use or not, as you wish”

“In this district,” said the teacher “if we’re not required to use something, we’re not allowed to do it.”

How can we avoid compulsion? The answer is easy. Federal, state, and local policies need to provide incentives for schools to use certain programs with strong evidence of effectiveness from rigorous experiments, but not mandates to do so. That’s what ESSA will do in several areas. Incentives may mean providing a few points on competitive grant proposals, or modest financial incentives, for schools that adopt proven programs. These incentives should be enough to get educators’ attention, but not enough to force them to pick a given program.

Incentives should cause educators to eagerly volunteer to use proven programs, to raise their hands, not their hackles. They could lead educators to learn more about the proven programs available to them and about the research process itself. This in turn could encourage political leaders to support education R & D, as educators and the public at large begin to clamor for more programs and better research.

Government cannot and should not try to get 3 million teachers in 100,000 schools in 14,000 districts to use any particular set of programs, no matter what their evidence of effectiveness. What it can and should do is set in motion policies that gradually expand the availability, adoption, and spread of proven programs, eventually pushing less effective approaches to improve or disappear. Development and evaluation of promising programs continues in ESSA, in the new Education Innovation Research (EIR), which along with R & D funded by other agencies will continuously add to the set of proven programs ready for adoption. As the number and quality of proven programs grow, educators will become more and more comfortable about using them.

From our nation’s founding, freedom to make informed choices has been an essential foundation stone of our system of governance. So it should be in education policy.

Evidence can inform key decisions for children, and government can encourage and incent adoption of proven programs. However, educators need the freedom to do what is right for their children, guided but not steered by valid and useful research.